* [PATCH V1 vfio 0/9] Introduce a vfio driver over virtio devices
@ 2023-10-17 13:42 ` Yishai Hadas via Virtualization
  0 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-17 13:42 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

This series introduces a vfio driver over virtio devices to support the
legacy interface functionality for VFs.

Background, from the virtio spec [1].
--------------------------------------------------------------------
In some systems, there is a need to support a virtio legacy driver with
a device that does not directly support the legacy interface. In such
scenarios, a group owner device can provide the legacy interface
functionality for the group member devices. The driver of the owner
device can then access the legacy interface of a member device on behalf
of the legacy member device driver.

For example, with the SR-IOV group type, group members (VFs) can not
present the legacy interface in an I/O BAR in BAR0 as expected by the
legacy pci driver. If the legacy driver is running inside a virtual
machine, the hypervisor executing the virtual machine can present a
virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
legacy driver accesses to this I/O BAR and forwards them to the group
owner device (PF) using group administration commands.
--------------------------------------------------------------------
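
To make the quoted flow concrete, below is a minimal illustrative sketch
(not part of this series) of how a trapped legacy I/O BAR write could be
turned into a group administration command. The header layout matches
struct virtio_admin_cmd_hdr added in patch 4; LEGACY_WRITE_OPCODE and
vf_id are placeholders, the actual legacy opcodes are defined in patch 5.

	/* Illustrative only: LEGACY_WRITE_OPCODE and vf_id are placeholders. */
	struct virtio_admin_cmd_hdr hdr = {
		.opcode          = cpu_to_le16(LEGACY_WRITE_OPCODE),
		.group_type      = cpu_to_le16(1),	/* 1 == SR-IOV group type */
		.group_member_id = cpu_to_le64(vf_id),	/* target VF */
	};

	/*
	 * The PF (group owner) driver then queues 'hdr' plus the bytes the
	 * guest wrote on its admin virtqueue and completes the trapped
	 * access once the device returns VIRTIO_ADMIN_STATUS_OK.
	 */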

The first 6 patches are in the virtio area and handle the following:
- Fix the common config map for modern devices, as reported by Michael
  Tsirkin.
- Introduce the admin virtqueue infrastructure.
- Expose the layout of the commands that should be used for
  supporting the legacy access.
- Expose APIs to enable upper layers such as vfio, net, etc.
  to execute admin commands.

The above follows the virtio spec changes that were recently accepted in
this area [1].

The last 3 patches are in the vfio area and handle the following:
- Expose some APIs from vfio/pci to be used by the vfio/virtio driver.
- Introduce a vfio driver over virtio devices to support the legacy
  interface functionality for VFs. 

The series was tested successfully over virtio-net VFs in the host,
while running both modern and legacy drivers in the guest.

[1]
https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c

Changes from V0: https://www.spinics.net/lists/linux-virtualization/msg63802.html

Virtio:
- Fix the common config map size issue that was reported by Michael
  Tsirkin.
- Do not use the vp_dev->vqs[] array in vp_del_vqs(), as asked by
  Michael; instead, skip the AQ specifically.
- Move the admin vq implementation into virtio_pci_modern.c, as asked by
  Michael.
- Rename struct virtio_avq to virtio_pci_admin_vq, with some extra
  corresponding renames.
- Remove the exported symbols virtio_pci_vf_get_pf_dev() and
  virtio_admin_cmd_exec(), as their callers are now local to the module.
- Handle inflight commands as part of the device reset flow.
- Introduce per-admin-command APIs in virtio-pci, as asked by Michael.

Vfio:
- Use EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL for
  vfio_pci_core_setup_barmap() and vfio_pci_iowrite#xxx(), as pointed
  out by Alex.
- Drop the intermediate patch which prepares the commands and calls the
  generic virtio admin command API (i.e. virtio_admin_cmd_exec()).
- Instead, directly call the new per-admin-command APIs exported from
  virtio, based on Michael's request.
- Enable only virtio-net in the pci_device_id table to enforce binding
  only what is supported, as suggested by Alex.
- Add support for byte-wise access (read/write) over the device config
  region, as asked by Alex.
- Consider whether MSI-X is actually enabled/disabled to choose the
  right opcode when issuing a read/write admin command, as mentioned
  by Michael.
- Use VIRTIO_PCI_CONFIG_OFF instead of adding some new defines, as
  suggested by Michael.
- Set the '.close_device' op to vfio_pci_core_close_device(), as pointed
  out by Alex.
- Adapt to the vfio multi-line comment style in a few places.
- Add virtualization@lists.linux-foundation.org to the MAINTAINERS file
  to be CCed for the new driver, as suggested by Jason.

Yishai

Feng Liu (5):
  virtio-pci: Fix common config map for modern device
  virtio: Define feature bit for administration virtqueue
  virtio-pci: Introduce admin virtqueue
  virtio-pci: Introduce admin command sending function
  virtio-pci: Introduce admin commands

Yishai Hadas (4):
  virtio-pci: Introduce APIs to execute legacy IO admin commands
  vfio/pci: Expose vfio_pci_core_setup_barmap()
  vfio/pci: Expose vfio_pci_iowrite/read##size()
  vfio/virtio: Introduce a vfio driver over virtio devices

 MAINTAINERS                            |   7 +
 drivers/vfio/pci/Kconfig               |   2 +
 drivers/vfio/pci/Makefile              |   2 +
 drivers/vfio/pci/vfio_pci_core.c       |  25 ++
 drivers/vfio/pci/vfio_pci_rdwr.c       |  38 +-
 drivers/vfio/pci/virtio/Kconfig        |  15 +
 drivers/vfio/pci/virtio/Makefile       |   4 +
 drivers/vfio/pci/virtio/main.c         | 577 +++++++++++++++++++++++++
 drivers/virtio/virtio.c                |  37 +-
 drivers/virtio/virtio_pci_common.c     |  14 +
 drivers/virtio/virtio_pci_common.h     |  20 +-
 drivers/virtio/virtio_pci_modern.c     | 441 ++++++++++++++++++-
 drivers/virtio/virtio_pci_modern_dev.c |  24 +-
 include/linux/vfio_pci_core.h          |  20 +
 include/linux/virtio.h                 |   8 +
 include/linux/virtio_config.h          |   4 +
 include/linux/virtio_pci_admin.h       |  18 +
 include/linux/virtio_pci_modern.h      |   5 +
 include/uapi/linux/virtio_config.h     |   8 +-
 include/uapi/linux/virtio_pci.h        |  66 +++
 20 files changed, 1295 insertions(+), 40 deletions(-)
 create mode 100644 drivers/vfio/pci/virtio/Kconfig
 create mode 100644 drivers/vfio/pci/virtio/Makefile
 create mode 100644 drivers/vfio/pci/virtio/main.c
 create mode 100644 include/linux/virtio_pci_admin.h

-- 
2.27.0


* [PATCH V1 vfio 1/9] virtio-pci: Fix common config map for modern device
  2023-10-17 13:42 ` Yishai Hadas via Virtualization
@ 2023-10-17 13:42   ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-17 13:42 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

From: Feng Liu <feliu@nvidia.com>

Currently vp_modern_probe() fails to map the part of the common config
space structure that starts at the notify_data offset. Due to this,
accessing those structure elements can result in an error.

Fix it by considering the minimum of the size the device has offered
and the size the driver will access.
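
For clarity, the structure involved looks roughly as below (paraphrased
from include/linux/virtio_pci_modern.h); with the old call the mapping
length was capped at sizeof(struct virtio_pci_common_cfg), so the two
trailing fields stayed outside the ioremapped window:

	struct virtio_pci_modern_common_cfg {
		struct virtio_pci_common_cfg cfg;	/* old mapping ended here */

		__le16 queue_notify_data;	/* read-write */
		__le16 queue_reset;		/* read-write */
	};

	/*
	 * After this patch the request is, roughly:
	 *   required minimum: sizeof(struct virtio_pci_common_cfg)
	 *   mapped length:    min(capability length offered by the device,
	 *                         sizeof(struct virtio_pci_modern_common_cfg))
	 */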

Fixes: ea024594b1dc ("virtio_pci: struct virtio_pci_common_cfg add queue_notify_data")
Fixes: 0cdd450e7051 ("virtio_pci: struct virtio_pci_common_cfg add queue_reset")
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Closes: https://lkml.kernel.org/kvm/20230927172553-mutt-send-email-mst@kernel.org/
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/virtio/virtio_pci_modern_dev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
index aad7d9296e77..7fa70d7c8146 100644
--- a/drivers/virtio/virtio_pci_modern_dev.c
+++ b/drivers/virtio/virtio_pci_modern_dev.c
@@ -290,9 +290,9 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev)
 
 	err = -EINVAL;
 	mdev->common = vp_modern_map_capability(mdev, common,
-				      sizeof(struct virtio_pci_common_cfg), 4,
-				      0, sizeof(struct virtio_pci_common_cfg),
-				      NULL, NULL);
+				sizeof(struct virtio_pci_common_cfg), 4,
+				0, sizeof(struct virtio_pci_modern_common_cfg),
+				NULL, NULL);
 	if (!mdev->common)
 		goto err_map_common;
 	mdev->isr = vp_modern_map_capability(mdev, isr, sizeof(u8), 1,
-- 
2.27.0


* [PATCH V1 vfio 2/9] virtio: Define feature bit for administration virtqueue
  2023-10-17 13:42 ` Yishai Hadas via Virtualization
@ 2023-10-17 13:42   ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-17 13:42 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

From: Feng Liu <feliu@nvidia.com>

Introduce VIRTIO_F_ADMIN_VQ, which is used for administration virtqueue
support.
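
As a usage sketch (mirroring how the following patches consume the bit),
the PCI transport advertises it as a transport feature and later code
gates any admin queue work on the negotiated bit:

	/* In the transport's vp_transport_features() (patch 3): */
	if (features & BIT_ULL(VIRTIO_F_ADMIN_VQ))
		__virtio_set_bit(vdev, VIRTIO_F_ADMIN_VQ);

	/* ...and any admin queue work is gated on the negotiated bit: */
	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
		return 0;	/* no admin virtqueue on this device */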

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 include/uapi/linux/virtio_config.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 2c712c654165..09d694968b14 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -52,7 +52,7 @@
  * rest are per-device feature bits.
  */
 #define VIRTIO_TRANSPORT_F_START	28
-#define VIRTIO_TRANSPORT_F_END		41
+#define VIRTIO_TRANSPORT_F_END		42
 
 #ifndef VIRTIO_CONFIG_NO_LEGACY
 /* Do we get callbacks when the ring is completely used, even if we've
@@ -109,4 +109,10 @@
  * This feature indicates that the driver can reset a queue individually.
  */
 #define VIRTIO_F_RING_RESET		40
+
+/*
+ * This feature indicates that the device supports administration virtqueues.
+ */
+#define VIRTIO_F_ADMIN_VQ		41
+
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
-- 
2.27.0


* [PATCH V1 vfio 3/9] virtio-pci: Introduce admin virtqueue
  2023-10-17 13:42 ` Yishai Hadas via Virtualization
@ 2023-10-17 13:42   ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-17 13:42 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

From: Feng Liu <feliu@nvidia.com>

Introduce support for the admin virtqueue. By negotiating the
VIRTIO_F_ADMIN_VQ feature, the driver detects the capability and creates
one administration virtqueue. Implementing the administration virtqueue
in the generic virtio-pci layer enables multiple types of upper layer
drivers, such as vfio, net and blk, to utilize it.
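
Condensed from this patch, the discovery sequence at probe time is
roughly as follows (assuming VIRTIO_F_ADMIN_VQ was negotiated):

	u16 num   = vp_modern_avq_num(&vp_dev->mdev);	/* admin_queue_num   */
	u16 index = vp_modern_avq_index(&vp_dev->mdev);	/* admin_queue_index */

	if (!num)
		return -EINVAL;	/* feature offered but no admin queue */

	/*
	 * The admin vq lives at 'index', which may be outside the regular
	 * num_queues range, hence the relaxed check added in setup_vq()
	 * (see vp_is_avq()).
	 */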

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/virtio/virtio.c                | 37 ++++++++++++++--
 drivers/virtio/virtio_pci_common.c     |  3 ++
 drivers/virtio/virtio_pci_common.h     | 15 ++++++-
 drivers/virtio/virtio_pci_modern.c     | 61 +++++++++++++++++++++++++-
 drivers/virtio/virtio_pci_modern_dev.c | 18 ++++++++
 include/linux/virtio_config.h          |  4 ++
 include/linux/virtio_pci_modern.h      |  5 +++
 7 files changed, 137 insertions(+), 6 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 3893dc29eb26..f4080692b351 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -302,9 +302,15 @@ static int virtio_dev_probe(struct device *_d)
 	if (err)
 		goto err;
 
+	if (dev->config->create_avq) {
+		err = dev->config->create_avq(dev);
+		if (err)
+			goto err;
+	}
+
 	err = drv->probe(dev);
 	if (err)
-		goto err;
+		goto err_probe;
 
 	/* If probe didn't do it, mark device DRIVER_OK ourselves. */
 	if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
@@ -316,6 +322,10 @@ static int virtio_dev_probe(struct device *_d)
 	virtio_config_enable(dev);
 
 	return 0;
+
+err_probe:
+	if (dev->config->destroy_avq)
+		dev->config->destroy_avq(dev);
 err:
 	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return err;
@@ -331,6 +341,9 @@ static void virtio_dev_remove(struct device *_d)
 
 	drv->remove(dev);
 
+	if (dev->config->destroy_avq)
+		dev->config->destroy_avq(dev);
+
 	/* Driver should have reset device. */
 	WARN_ON_ONCE(dev->config->get_status(dev));
 
@@ -489,13 +502,20 @@ EXPORT_SYMBOL_GPL(unregister_virtio_device);
 int virtio_device_freeze(struct virtio_device *dev)
 {
 	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
+	int ret;
 
 	virtio_config_disable(dev);
 
 	dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
 
-	if (drv && drv->freeze)
-		return drv->freeze(dev);
+	if (drv && drv->freeze) {
+		ret = drv->freeze(dev);
+		if (ret)
+			return ret;
+	}
+
+	if (dev->config->destroy_avq)
+		dev->config->destroy_avq(dev);
 
 	return 0;
 }
@@ -532,10 +552,16 @@ int virtio_device_restore(struct virtio_device *dev)
 	if (ret)
 		goto err;
 
+	if (dev->config->create_avq) {
+		ret = dev->config->create_avq(dev);
+		if (ret)
+			goto err;
+	}
+
 	if (drv->restore) {
 		ret = drv->restore(dev);
 		if (ret)
-			goto err;
+			goto err_restore;
 	}
 
 	/* If restore didn't do it, mark device DRIVER_OK ourselves. */
@@ -546,6 +572,9 @@ int virtio_device_restore(struct virtio_device *dev)
 
 	return 0;
 
+err_restore:
+	if (dev->config->destroy_avq)
+		dev->config->destroy_avq(dev);
 err:
 	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return ret;
diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index c2524a7207cf..6b4766d5abe6 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -236,6 +236,9 @@ void vp_del_vqs(struct virtio_device *vdev)
 	int i;
 
 	list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
+		if (vp_dev->is_avq(vdev, vq->index))
+			continue;
+
 		if (vp_dev->per_vq_vectors) {
 			int v = vp_dev->vqs[vq->index]->msix_vector;
 
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index 4b773bd7c58c..e03af0966a4b 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -41,6 +41,14 @@ struct virtio_pci_vq_info {
 	unsigned int msix_vector;
 };
 
+struct virtio_pci_admin_vq {
+	/* Virtqueue info associated with this admin queue. */
+	struct virtio_pci_vq_info info;
+	/* Name of the admin queue: avq.$index. */
+	char name[10];
+	u16 vq_index;
+};
+
 /* Our device structure */
 struct virtio_pci_device {
 	struct virtio_device vdev;
@@ -58,9 +66,13 @@ struct virtio_pci_device {
 	spinlock_t lock;
 	struct list_head virtqueues;
 
-	/* array of all queues for house-keeping */
+	/* Array of all virtqueues reported in the
+	 * PCI common config num_queues field
+	 */
 	struct virtio_pci_vq_info **vqs;
 
+	struct virtio_pci_admin_vq admin_vq;
+
 	/* MSI-X support */
 	int msix_enabled;
 	int intx_enabled;
@@ -86,6 +98,7 @@ struct virtio_pci_device {
 	void (*del_vq)(struct virtio_pci_vq_info *info);
 
 	u16 (*config_vector)(struct virtio_pci_device *vp_dev, u16 vector);
+	bool (*is_avq)(struct virtio_device *vdev, unsigned int index);
 };
 
 /* Constants for MSI-X */
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index d6bb68ba84e5..01c5ba346471 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -26,6 +26,16 @@ static u64 vp_get_features(struct virtio_device *vdev)
 	return vp_modern_get_features(&vp_dev->mdev);
 }
 
+static bool vp_is_avq(struct virtio_device *vdev, unsigned int index)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return false;
+
+	return index == vp_dev->admin_vq.vq_index;
+}
+
 static void vp_transport_features(struct virtio_device *vdev, u64 features)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -37,6 +47,9 @@ static void vp_transport_features(struct virtio_device *vdev, u64 features)
 
 	if (features & BIT_ULL(VIRTIO_F_RING_RESET))
 		__virtio_set_bit(vdev, VIRTIO_F_RING_RESET);
+
+	if (features & BIT_ULL(VIRTIO_F_ADMIN_VQ))
+		__virtio_set_bit(vdev, VIRTIO_F_ADMIN_VQ);
 }
 
 /* virtio config->finalize_features() implementation */
@@ -317,7 +330,8 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
 	else
 		notify = vp_notify;
 
-	if (index >= vp_modern_get_num_queues(mdev))
+	if (index >= vp_modern_get_num_queues(mdev) &&
+	    !vp_is_avq(&vp_dev->vdev, index))
 		return ERR_PTR(-EINVAL);
 
 	/* Check if queue is either not available or already active. */
@@ -491,6 +505,46 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
 	return true;
 }
 
+static int vp_modern_create_avq(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_admin_vq *avq;
+	struct virtqueue *vq;
+	u16 admin_q_num;
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return 0;
+
+	admin_q_num = vp_modern_avq_num(&vp_dev->mdev);
+	if (!admin_q_num)
+		return -EINVAL;
+
+	avq = &vp_dev->admin_vq;
+	avq->vq_index = vp_modern_avq_index(&vp_dev->mdev);
+	sprintf(avq->name, "avq.%u", avq->vq_index);
+	vq = vp_dev->setup_vq(vp_dev, &vp_dev->admin_vq.info, avq->vq_index, NULL,
+			      avq->name, NULL, VIRTIO_MSI_NO_VECTOR);
+	if (IS_ERR(vq)) {
+		dev_err(&vdev->dev, "failed to setup admin virtqueue, err=%ld",
+			PTR_ERR(vq));
+		return PTR_ERR(vq);
+	}
+
+	vp_dev->admin_vq.info.vq = vq;
+	vp_modern_set_queue_enable(&vp_dev->mdev, avq->info.vq->index, true);
+	return 0;
+}
+
+static void vp_modern_destroy_avq(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return;
+
+	vp_dev->del_vq(&vp_dev->admin_vq.info);
+}
+
 static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
 	.get		= NULL,
 	.set		= NULL,
@@ -509,6 +563,8 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
 	.get_shm_region  = vp_get_shm_region,
 	.disable_vq_and_reset = vp_modern_disable_vq_and_reset,
 	.enable_vq_after_reset = vp_modern_enable_vq_after_reset,
+	.create_avq = vp_modern_create_avq,
+	.destroy_avq = vp_modern_destroy_avq,
 };
 
 static const struct virtio_config_ops virtio_pci_config_ops = {
@@ -529,6 +585,8 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
 	.get_shm_region  = vp_get_shm_region,
 	.disable_vq_and_reset = vp_modern_disable_vq_and_reset,
 	.enable_vq_after_reset = vp_modern_enable_vq_after_reset,
+	.create_avq = vp_modern_create_avq,
+	.destroy_avq = vp_modern_destroy_avq,
 };
 
 /* the PCI probing function */
@@ -552,6 +610,7 @@ int virtio_pci_modern_probe(struct virtio_pci_device *vp_dev)
 	vp_dev->config_vector = vp_config_vector;
 	vp_dev->setup_vq = setup_vq;
 	vp_dev->del_vq = del_vq;
+	vp_dev->is_avq = vp_is_avq;
 	vp_dev->isr = mdev->isr;
 	vp_dev->vdev.id = mdev->id;
 
diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
index 7fa70d7c8146..229a32a4cb68 100644
--- a/drivers/virtio/virtio_pci_modern_dev.c
+++ b/drivers/virtio/virtio_pci_modern_dev.c
@@ -714,6 +714,24 @@ void __iomem *vp_modern_map_vq_notify(struct virtio_pci_modern_device *mdev,
 }
 EXPORT_SYMBOL_GPL(vp_modern_map_vq_notify);
 
+u16 vp_modern_avq_num(struct virtio_pci_modern_device *mdev)
+{
+	struct virtio_pci_modern_common_cfg __iomem *cfg;
+
+	cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;
+	return vp_ioread16(&cfg->admin_queue_num);
+}
+EXPORT_SYMBOL_GPL(vp_modern_avq_num);
+
+u16 vp_modern_avq_index(struct virtio_pci_modern_device *mdev)
+{
+	struct virtio_pci_modern_common_cfg __iomem *cfg;
+
+	cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;
+	return vp_ioread16(&cfg->admin_queue_index);
+}
+EXPORT_SYMBOL_GPL(vp_modern_avq_index);
+
 MODULE_VERSION("0.1");
 MODULE_DESCRIPTION("Modern Virtio PCI Device");
 MODULE_AUTHOR("Jason Wang <jasowang@redhat.com>");
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 2b3438de2c4d..da9b271b54db 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -93,6 +93,8 @@ typedef void vq_callback_t(struct virtqueue *);
  *	Returns 0 on success or error status
  *	If disable_vq_and_reset is set, then enable_vq_after_reset must also be
  *	set.
+ * @create_avq: create admin virtqueue resource.
+ * @destroy_avq: destroy admin virtqueue resource.
  */
 struct virtio_config_ops {
 	void (*get)(struct virtio_device *vdev, unsigned offset,
@@ -120,6 +122,8 @@ struct virtio_config_ops {
 			       struct virtio_shm_region *region, u8 id);
 	int (*disable_vq_and_reset)(struct virtqueue *vq);
 	int (*enable_vq_after_reset)(struct virtqueue *vq);
+	int (*create_avq)(struct virtio_device *vdev);
+	void (*destroy_avq)(struct virtio_device *vdev);
 };
 
 /* If driver didn't advertise the feature, it will never appear. */
diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h
index 067ac1d789bc..0f8737c9ae7d 100644
--- a/include/linux/virtio_pci_modern.h
+++ b/include/linux/virtio_pci_modern.h
@@ -10,6 +10,9 @@ struct virtio_pci_modern_common_cfg {
 
 	__le16 queue_notify_data;	/* read-write */
 	__le16 queue_reset;		/* read-write */
+
+	__le16 admin_queue_index;	/* read-only */
+	__le16 admin_queue_num;		/* read-only */
 };
 
 struct virtio_pci_modern_device {
@@ -121,4 +124,6 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev);
 void vp_modern_remove(struct virtio_pci_modern_device *mdev);
 int vp_modern_get_queue_reset(struct virtio_pci_modern_device *mdev, u16 index);
 void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index);
+u16 vp_modern_avq_num(struct virtio_pci_modern_device *mdev);
+u16 vp_modern_avq_index(struct virtio_pci_modern_device *mdev);
 #endif
-- 
2.27.0


* [PATCH V1 vfio 4/9] virtio-pci: Introduce admin command sending function
  2023-10-17 13:42 ` Yishai Hadas via Virtualization
@ 2023-10-17 13:42   ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-17 13:42 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

From: Feng Liu <feliu@nvidia.com>

Add support for sending admin commands through the admin virtqueue
interface. Abort any inflight admin commands once a device reset
completes.

To enforce the statement below from the specification [1], the admin
queue is activated for upper layer users only after the status has been
set to DRIVER_OK.

[1] The driver MUST NOT send any buffer available notifications to the
device before setting DRIVER_OK.
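
For reference, a caller inside the transport fills in the new
struct virtio_admin_cmd and hands it to vp_modern_admin_cmd_exec(); the
exported per-command wrappers added later in this series funnel into the
same path. A minimal sketch (opcode, vf_id and the buffers are
placeholders):

	struct scatterlist data_sg, result_sg;
	struct virtio_admin_cmd cmd = {
		.opcode          = cpu_to_le16(opcode),
		.group_type      = cpu_to_le16(1),	/* 1 == SR-IOV */
		.group_member_id = cpu_to_le64(vf_id),
		.data_sg         = &data_sg,
		.result_sg       = &result_sg,
	};
	int ret;

	sg_init_one(&data_sg, out_buf, out_len);	/* command payload */
	sg_init_one(&result_sg, in_buf, in_len);	/* command result */

	ret = vp_modern_admin_cmd_exec(vdev, &cmd);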

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/virtio/virtio_pci_common.h |   3 +
 drivers/virtio/virtio_pci_modern.c | 174 +++++++++++++++++++++++++++++
 include/linux/virtio.h             |   8 ++
 include/uapi/linux/virtio_pci.h    |  22 ++++
 4 files changed, 207 insertions(+)

diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index e03af0966a4b..a21b9ba01a60 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -44,9 +44,12 @@ struct virtio_pci_vq_info {
 struct virtio_pci_admin_vq {
 	/* Virtqueue info associated with this admin queue. */
 	struct virtio_pci_vq_info info;
+	struct completion flush_done;
+	refcount_t refcount;
 	/* Name of the admin queue: avq.$index. */
 	char name[10];
 	u16 vq_index;
+	bool abort;
 };
 
 /* Our device structure */
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 01c5ba346471..cc159a8e6c70 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -36,6 +36,58 @@ static bool vp_is_avq(struct virtio_device *vdev, unsigned int index)
 	return index == vp_dev->admin_vq.vq_index;
 }
 
+static bool vp_modern_avq_get(struct virtio_pci_admin_vq *admin_vq)
+{
+	return refcount_inc_not_zero(&admin_vq->refcount);
+}
+
+static void vp_modern_avq_put(struct virtio_pci_admin_vq *admin_vq)
+{
+	if (refcount_dec_and_test(&admin_vq->refcount))
+		complete(&admin_vq->flush_done);
+}
+
+static bool vp_modern_avq_is_abort(const struct virtio_pci_admin_vq *admin_vq)
+{
+	return READ_ONCE(admin_vq->abort);
+}
+
+static void
+vp_modern_avq_set_abort(struct virtio_pci_admin_vq *admin_vq, bool abort)
+{
+	/* Mark the AVQ to abort, so that inflight commands can be aborted. */
+	WRITE_ONCE(admin_vq->abort, abort);
+}
+
+static void vp_modern_avq_activate(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_admin_vq *admin_vq = &vp_dev->admin_vq;
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return;
+
+	init_completion(&admin_vq->flush_done);
+	refcount_set(&admin_vq->refcount, 1);
+	vp_modern_avq_set_abort(admin_vq, false);
+}
+
+static void vp_modern_avq_deactivate(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_admin_vq *admin_vq = &vp_dev->admin_vq;
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return;
+
+	vp_modern_avq_set_abort(admin_vq, true);
+	/* Balance with refcount_set() during vp_modern_avq_activate */
+	vp_modern_avq_put(admin_vq);
+
+	/* Wait for all the inflight admin commands to be aborted */
+	wait_for_completion(&vp_dev->admin_vq.flush_done);
+}
+
 static void vp_transport_features(struct virtio_device *vdev, u64 features)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -172,6 +224,8 @@ static void vp_set_status(struct virtio_device *vdev, u8 status)
 	/* We should never be setting status to 0. */
 	BUG_ON(status == 0);
 	vp_modern_set_status(&vp_dev->mdev, status);
+	if (status & VIRTIO_CONFIG_S_DRIVER_OK)
+		vp_modern_avq_activate(vdev);
 }
 
 static void vp_reset(struct virtio_device *vdev)
@@ -188,6 +242,9 @@ static void vp_reset(struct virtio_device *vdev)
 	 */
 	while (vp_modern_get_status(mdev))
 		msleep(1);
+
+	vp_modern_avq_deactivate(vdev);
+
 	/* Flush pending VQ/configuration callbacks. */
 	vp_synchronize_vectors(vdev);
 }
@@ -505,6 +562,121 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
 	return true;
 }
 
+static int virtqueue_exec_admin_cmd(struct virtio_pci_admin_vq *admin_vq,
+				    struct scatterlist **sgs,
+				    unsigned int out_num,
+				    unsigned int in_num,
+				    void *data,
+				    gfp_t gfp)
+{
+	struct virtqueue *vq;
+	int ret, len;
+
+	if (!vp_modern_avq_get(admin_vq))
+		return -EIO;
+
+	vq = admin_vq->info.vq;
+
+	ret = virtqueue_add_sgs(vq, sgs, out_num, in_num, data, gfp);
+	if (ret < 0)
+		goto out;
+
+	if (unlikely(!virtqueue_kick(vq))) {
+		ret = -EIO;
+		goto out;
+	}
+
+	while (!virtqueue_get_buf(vq, &len) &&
+	       !virtqueue_is_broken(vq) &&
+	       !vp_modern_avq_is_abort(admin_vq))
+		cpu_relax();
+
+	if (vp_modern_avq_is_abort(admin_vq)) {
+		ret = -EIO;
+		goto out;
+	}
+out:
+	vp_modern_avq_put(admin_vq);
+	return ret;
+}
+
+#define VIRTIO_AVQ_SGS_MAX	4
+
+static int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
+				    struct virtio_admin_cmd *cmd)
+{
+	struct scatterlist *sgs[VIRTIO_AVQ_SGS_MAX], hdr, stat;
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_admin_cmd_status *va_status;
+	unsigned int out_num = 0, in_num = 0;
+	struct virtio_admin_cmd_hdr *va_hdr;
+	struct virtqueue *avq;
+	u16 status;
+	int ret;
+
+	avq = virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ) ?
+		vp_dev->admin_vq.info.vq : NULL;
+	if (!avq)
+		return -EOPNOTSUPP;
+
+	va_status = kzalloc(sizeof(*va_status), GFP_KERNEL);
+	if (!va_status)
+		return -ENOMEM;
+
+	va_hdr = kzalloc(sizeof(*va_hdr), GFP_KERNEL);
+	if (!va_hdr) {
+		ret = -ENOMEM;
+		goto err_alloc;
+	}
+
+	va_hdr->opcode = cmd->opcode;
+	va_hdr->group_type = cmd->group_type;
+	va_hdr->group_member_id = cmd->group_member_id;
+
+	/* Add header */
+	sg_init_one(&hdr, va_hdr, sizeof(*va_hdr));
+	sgs[out_num] = &hdr;
+	out_num++;
+
+	if (cmd->data_sg) {
+		sgs[out_num] = cmd->data_sg;
+		out_num++;
+	}
+
+	/* Add return status */
+	sg_init_one(&stat, va_status, sizeof(*va_status));
+	sgs[out_num + in_num] = &stat;
+	in_num++;
+
+	if (cmd->result_sg) {
+		sgs[out_num + in_num] = cmd->result_sg;
+		in_num++;
+	}
+
+	ret = virtqueue_exec_admin_cmd(&vp_dev->admin_vq, sgs,
+				       out_num, in_num,
+				       sgs, GFP_KERNEL);
+	if (ret) {
+		dev_err(&vdev->dev,
+			"Failed to execute command on admin vq: %d\n", ret);
+		goto err_cmd_exec;
+	}
+
+	status = le16_to_cpu(va_status->status);
+	if (status != VIRTIO_ADMIN_STATUS_OK) {
+		dev_err(&vdev->dev,
+			"admin command error: status(%#x) qualifier(%#x)\n",
+			status, le16_to_cpu(va_status->status_qualifier));
+		ret = -status;
+	}
+
+err_cmd_exec:
+	kfree(va_hdr);
+err_alloc:
+	kfree(va_status);
+	return ret;
+}
+
 static int vp_modern_create_avq(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -530,6 +702,7 @@ static int vp_modern_create_avq(struct virtio_device *vdev)
 		return PTR_ERR(vq);
 	}
 
+	refcount_set(&vp_dev->admin_vq.refcount, 0);
 	vp_dev->admin_vq.info.vq = vq;
 	vp_modern_set_queue_enable(&vp_dev->mdev, avq->info.vq->index, true);
 	return 0;
@@ -542,6 +715,7 @@ static void vp_modern_destroy_avq(struct virtio_device *vdev)
 	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
 		return;
 
+	WARN_ON(refcount_read(&vp_dev->admin_vq.refcount));
 	vp_dev->del_vq(&vp_dev->admin_vq.info);
 }
 
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 4cc614a38376..b0201747a263 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -103,6 +103,14 @@ int virtqueue_resize(struct virtqueue *vq, u32 num,
 int virtqueue_reset(struct virtqueue *vq,
 		    void (*recycle)(struct virtqueue *vq, void *buf));
 
+struct virtio_admin_cmd {
+	__le16 opcode;
+	__le16 group_type;
+	__le64 group_member_id;
+	struct scatterlist *data_sg;
+	struct scatterlist *result_sg;
+};
+
 /**
  * struct virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index f703afc7ad31..68eacc9676dc 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -207,4 +207,26 @@ struct virtio_pci_cfg_cap {
 
 #endif /* VIRTIO_PCI_NO_MODERN */
 
+/* Admin command status. */
+#define VIRTIO_ADMIN_STATUS_OK		0
+
+struct __packed virtio_admin_cmd_hdr {
+	__le16 opcode;
+	/*
+	 * 1 - SR-IOV
+	 * 2-65535 - reserved
+	 */
+	__le16 group_type;
+	/* Unused, reserved for future extensions. */
+	__u8 reserved1[12];
+	__le64 group_member_id;
+};
+
+struct __packed virtio_admin_cmd_status {
+	__le16 status;
+	__le16 status_qualifier;
+	/* Unused, reserved for future extensions. */
+	__u8 reserved2[4];
+};
+
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH V1 vfio 5/9] virtio-pci: Introduce admin commands
  2023-10-17 13:42 ` Yishai Hadas via Virtualization
@ 2023-10-17 13:42   ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-17 13:42 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

From: Feng Liu <feliu@nvidia.com>

Introduce the following admin commands:

The "list query" command can be used by the driver to query the
set of admin commands supported by the virtio device.
The "list use" command is used to inform the virtio device which
admin commands the driver will use.
The "legacy common cfg rd/wr" commands are used to read from/write
into the legacy common configuration structure.
The "legacy dev cfg rd/wr" commands are used to read from/write
into the legacy device configuration structure.
The "notify info" command is used to query the notification region
information.
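
As an illustration only (not part of this patch), the legacy write
payload (struct virtio_admin_cmd_legacy_wr_data, defined in the diff
below) could be filled roughly as in the following sketch; the function
name is hypothetical, VIRTIO_PCI_STATUS comes from the existing legacy
section of this header, and the admin-command plumbing from the previous
patch is omitted:

#include <linux/slab.h>
#include <linux/types.h>
#include <linux/virtio_pci.h>

/*
 * Sketch only: build the data payload of a "legacy common cfg write"
 * that sets the one-byte legacy device status register of a member
 * device. Error handling is trimmed.
 */
static struct virtio_admin_cmd_legacy_wr_data *
build_legacy_status_write(u8 status, size_t *payload_len)
{
	struct virtio_admin_cmd_legacy_wr_data *data;

	*payload_len = sizeof(*data) + sizeof(status);
	data = kzalloc(*payload_len, GFP_KERNEL);
	if (!data)
		return NULL;

	data->offset = VIRTIO_PCI_STATUS;	/* legacy device status register */
	data->registers[0] = status;
	return data;
}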

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 include/uapi/linux/virtio_pci.h | 44 +++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index 68eacc9676dc..6e42c211fc08 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -210,6 +210,23 @@ struct virtio_pci_cfg_cap {
 /* Admin command status. */
 #define VIRTIO_ADMIN_STATUS_OK		0
 
+/* Admin command opcode. */
+#define VIRTIO_ADMIN_CMD_LIST_QUERY	0x0
+#define VIRTIO_ADMIN_CMD_LIST_USE	0x1
+
+/* Admin command group type. */
+#define VIRTIO_ADMIN_GROUP_TYPE_SRIOV	0x1
+
+/* Transitional device admin command. */
+#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE	0x2
+#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ		0x3
+#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE		0x4
+#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ		0x5
+#define VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO		0x6
+
+/* Increment MAX_OPCODE to next value when new opcode is added */
+#define VIRTIO_ADMIN_MAX_CMD_OPCODE			0x6
+
 struct __packed virtio_admin_cmd_hdr {
 	__le16 opcode;
 	/*
@@ -229,4 +246,31 @@ struct __packed virtio_admin_cmd_status {
 	__u8 reserved2[4];
 };
 
+struct __packed virtio_admin_cmd_legacy_wr_data {
+	__u8 offset; /* Starting offset of the register(s) to write. */
+	__u8 reserved[7];
+	__u8 registers[];
+};
+
+struct __packed virtio_admin_cmd_legacy_rd_data {
+	__u8 offset; /* Starting offset of the register(s) to read. */
+};
+
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END 0
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_DEV 0x1
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM 0x2
+
+#define VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO 4
+
+struct __packed virtio_admin_cmd_notify_info_data {
+	__u8 flags; /* 0 = end of list, 1 = owner device, 2 = member device */
+	__u8 bar; /* BAR of the member or the owner device */
+	__u8 padding[6];
+	__le64 offset; /* Offset within bar. */
+};
+
+struct virtio_admin_cmd_notify_info_result {
+	struct virtio_admin_cmd_notify_info_data entries[VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO];
+};
+
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
  2023-10-17 13:42 ` Yishai Hadas via Virtualization
@ 2023-10-17 13:42   ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-17 13:42 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

Introduce APIs to execute legacy IO admin commands.

These include: list_query/use, io_legacy_read/write and
io_legacy_notify_info.

These APIs will be used by subsequent patches in this series.
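
For illustration only (not part of this patch), a VF-bound driver could
use these APIs roughly as in the sketch below; the function name is
hypothetical, the notify-area flag choice is arbitrary, and error
handling is condensed:

#include <linux/pci.h>
#include <linux/virtio_pci.h>
#include <linux/virtio_pci_admin.h>

/* Sketch: resolve the notify area and read the legacy device status of a VF. */
static int example_query_vf(struct pci_dev *vf_pdev)
{
	u64 notify_offset;
	u8 notify_bar;
	u8 status;
	int ret;

	ret = virtio_pci_admin_legacy_io_notify_info(vf_pdev,
			VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
			&notify_bar, &notify_offset);
	if (ret)
		return ret;

	ret = virtio_pci_admin_legacy_io_read(vf_pdev,
			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ,
			VIRTIO_PCI_STATUS, sizeof(status), &status);
	if (ret)
		return ret;

	pci_info(vf_pdev, "notify bar %u offset %llu status %#x\n",
		 notify_bar, notify_offset, status);
	return 0;
}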

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/virtio/virtio_pci_common.c |  11 ++
 drivers/virtio/virtio_pci_common.h |   2 +
 drivers/virtio/virtio_pci_modern.c | 206 +++++++++++++++++++++++++++++
 include/linux/virtio_pci_admin.h   |  18 +++
 4 files changed, 237 insertions(+)
 create mode 100644 include/linux/virtio_pci_admin.h

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 6b4766d5abe6..212d68401d2c 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -645,6 +645,17 @@ static struct pci_driver virtio_pci_driver = {
 	.sriov_configure = virtio_pci_sriov_configure,
 };
 
+struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
+{
+	struct virtio_pci_device *pf_vp_dev;
+
+	pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
+	if (IS_ERR(pf_vp_dev))
+		return NULL;
+
+	return &pf_vp_dev->vdev;
+}
+
 module_pci_driver(virtio_pci_driver);
 
 MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index a21b9ba01a60..2785e61ed668 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -155,4 +155,6 @@ static inline void virtio_pci_legacy_remove(struct virtio_pci_device *vp_dev)
 int virtio_pci_modern_probe(struct virtio_pci_device *);
 void virtio_pci_modern_remove(struct virtio_pci_device *);
 
+struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
+
 #endif
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index cc159a8e6c70..00b65e20b2f5 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -719,6 +719,212 @@ static void vp_modern_destroy_avq(struct virtio_device *vdev)
 	vp_dev->del_vq(&vp_dev->admin_vq.info);
 }
 
+/*
+ * virtio_pci_admin_list_query - Provides to driver list of commands
+ * supported for the PCI VF.
+ * @pdev: VF pci_dev
+ * @buf: buffer to hold the returned list
+ * @buf_size: size of the given buffer
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
+{
+	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+	struct virtio_admin_cmd cmd = {};
+	struct scatterlist result_sg;
+
+	if (!virtio_dev)
+		return -ENODEV;
+
+	sg_init_one(&result_sg, buf, buf_size);
+	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
+	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
+	cmd.result_sg = &result_sg;
+
+	return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
+}
+EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
+
+/*
+ * virtio_pci_admin_list_use - Provides to device list of commands
+ * used for the PCI VF.
+ * @pdev: VF pci_dev
+ * @buf: buffer which holds the list
+ * @buf_size: size of the given buffer
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
+{
+	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+	struct virtio_admin_cmd cmd = {};
+	struct scatterlist data_sg;
+
+	if (!virtio_dev)
+		return -ENODEV;
+
+	sg_init_one(&data_sg, buf, buf_size);
+	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
+	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
+	cmd.data_sg = &data_sg;
+
+	return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
+}
+EXPORT_SYMBOL_GPL(virtio_pci_admin_list_use);
+
+/*
+ * virtio_pci_admin_legacy_io_write - Write legacy registers of a member device
+ * @pdev: VF pci_dev
+ * @opcode: op code of the io write command
+ * @offset: starting byte offset within the registers to write to
+ * @size: size of the data to write
+ * @buf: buffer which holds the data
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
+				     u8 offset, u8 size, u8 *buf)
+{
+	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+	struct virtio_admin_cmd_legacy_wr_data *data;
+	struct virtio_admin_cmd cmd = {};
+	struct scatterlist data_sg;
+	int vf_id;
+	int ret;
+
+	if (!virtio_dev)
+		return -ENODEV;
+
+	vf_id = pci_iov_vf_id(pdev);
+	if (vf_id < 0)
+		return vf_id;
+
+	data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	data->offset = offset;
+	memcpy(data->registers, buf, size);
+	sg_init_one(&data_sg, data, sizeof(*data) + size);
+	cmd.opcode = cpu_to_le16(opcode);
+	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
+	cmd.group_member_id = cpu_to_le64(vf_id + 1);
+	cmd.data_sg = &data_sg;
+	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
+
+	kfree(data);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_write);
+
+/*
+ * virtio_pci_admin_legacy_io_read - Read legacy registers of a member device
+ * @pdev: VF pci_dev
+ * @opcode: op code of the io read command
+ * @offset: starting byte offset within the registers to read from
+ * @size: size of the data to be read
+ * @buf: buffer to hold the returned data
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
+				    u8 offset, u8 size, u8 *buf)
+{
+	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+	struct virtio_admin_cmd_legacy_rd_data *data;
+	struct scatterlist data_sg, result_sg;
+	struct virtio_admin_cmd cmd = {};
+	int vf_id;
+	int ret;
+
+	if (!virtio_dev)
+		return -ENODEV;
+
+	vf_id = pci_iov_vf_id(pdev);
+	if (vf_id < 0)
+		return vf_id;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	data->offset = offset;
+	sg_init_one(&data_sg, data, sizeof(*data));
+	sg_init_one(&result_sg, buf, size);
+	cmd.opcode = cpu_to_le16(opcode);
+	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
+	cmd.group_member_id = cpu_to_le64(vf_id + 1);
+	cmd.data_sg = &data_sg;
+	cmd.result_sg = &result_sg;
+	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
+
+	kfree(data);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_read);
+
+/*
+ * virtio_pci_admin_legacy_io_notify_info - Read the queue notification
+ * information for legacy interface
+ * @pdev: VF pci_dev
+ * @req_bar_flags: requested bar flags
+ * @bar: on output the BAR number of the member device
+ * @bar_offset: on output the offset within bar
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
+					   u8 req_bar_flags, u8 *bar,
+					   u64 *bar_offset)
+{
+	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+	struct virtio_admin_cmd_notify_info_result *result;
+	struct virtio_admin_cmd cmd = {};
+	struct scatterlist result_sg;
+	int vf_id;
+	int ret;
+
+	if (!virtio_dev)
+		return -ENODEV;
+
+	vf_id = pci_iov_vf_id(pdev);
+	if (vf_id < 0)
+		return vf_id;
+
+	result = kzalloc(sizeof(*result), GFP_KERNEL);
+	if (!result)
+		return -ENOMEM;
+
+	sg_init_one(&result_sg, result, sizeof(*result));
+	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
+	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
+	cmd.group_member_id = cpu_to_le64(vf_id + 1);
+	cmd.result_sg = &result_sg;
+	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
+	if (!ret) {
+		struct virtio_admin_cmd_notify_info_data *entry;
+		int i;
+
+		ret = -ENOENT;
+		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
+			entry = &result->entries[i];
+			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
+				break;
+			if (entry->flags != req_bar_flags)
+				continue;
+			*bar = entry->bar;
+			*bar_offset = le64_to_cpu(entry->offset);
+			ret = 0;
+			break;
+		}
+	}
+
+	kfree(result);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
+
 static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
 	.get		= NULL,
 	.set		= NULL,
diff --git a/include/linux/virtio_pci_admin.h b/include/linux/virtio_pci_admin.h
new file mode 100644
index 000000000000..cb916a4bc1b1
--- /dev/null
+++ b/include/linux/virtio_pci_admin.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
+#define _LINUX_VIRTIO_PCI_ADMIN_H
+
+#include <linux/types.h>
+#include <linux/pci.h>
+
+int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
+int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
+int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
+				     u8 offset, u8 size, u8 *buf);
+int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
+				    u8 offset, u8 size, u8 *buf);
+int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
+					   u8 req_bar_flags, u8 *bar,
+					   u64 *bar_offset);
+
+#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH V1 vfio 7/9] vfio/pci: Expose vfio_pci_core_setup_barmap()
  2023-10-17 13:42 ` Yishai Hadas via Virtualization
@ 2023-10-17 13:42   ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-17 13:42 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

Expose vfio_pci_core_setup_barmap() to be used by drivers.

This will let drivers mmap a BAR and re-use it from both vfio and the
driver when applicable.

This API will be used in subsequent patches by the upcoming vfio/virtio
driver.
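
A minimal usage sketch (not part of this patch), assuming a variant
driver that embeds a vfio_pci_core_device and an arbitrary BAR index:

#include <linux/io.h>
#include <linux/vfio_pci_core.h>

/* Sketch: map BAR 0 once and read a 32-bit register at its start. */
static int example_peek_bar0(struct vfio_pci_core_device *vdev, u32 *val)
{
	int ret;

	ret = vfio_pci_core_setup_barmap(vdev, 0);
	if (ret)
		return ret;

	/* vdev->barmap[0] now holds the ioremapped BAR, shared with vfio's rdwr path. */
	*val = ioread32(vdev->barmap[0]);
	return 0;
}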

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 25 +++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_rdwr.c | 28 ++--------------------------
 include/linux/vfio_pci_core.h    |  1 +
 3 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1929103ee59a..ebea39836dd9 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -684,6 +684,31 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_disable);
 
+int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	void __iomem *io;
+	int ret;
+
+	if (vdev->barmap[bar])
+		return 0;
+
+	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
+	if (ret)
+		return ret;
+
+	io = pci_iomap(pdev, bar, 0);
+	if (!io) {
+		pci_release_selected_regions(pdev, 1 << bar);
+		return -ENOMEM;
+	}
+
+	vdev->barmap[bar] = io;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);
+
 void vfio_pci_core_close_device(struct vfio_device *core_vdev)
 {
 	struct vfio_pci_core_device *vdev =
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index e27de61ac9fe..6f08b3ecbb89 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -200,30 +200,6 @@ static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 	return done;
 }
 
-static int vfio_pci_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
-{
-	struct pci_dev *pdev = vdev->pdev;
-	int ret;
-	void __iomem *io;
-
-	if (vdev->barmap[bar])
-		return 0;
-
-	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
-	if (ret)
-		return ret;
-
-	io = pci_iomap(pdev, bar, 0);
-	if (!io) {
-		pci_release_selected_regions(pdev, 1 << bar);
-		return -ENOMEM;
-	}
-
-	vdev->barmap[bar] = io;
-
-	return 0;
-}
-
 ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 			size_t count, loff_t *ppos, bool iswrite)
 {
@@ -262,7 +238,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 		}
 		x_end = end;
 	} else {
-		int ret = vfio_pci_setup_barmap(vdev, bar);
+		int ret = vfio_pci_core_setup_barmap(vdev, bar);
 		if (ret) {
 			done = ret;
 			goto out;
@@ -438,7 +414,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
 		return -EINVAL;
 #endif
 
-	ret = vfio_pci_setup_barmap(vdev, bar);
+	ret = vfio_pci_core_setup_barmap(vdev, bar);
 	if (ret)
 		return ret;
 
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 562e8754869d..67ac58e20e1d 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -127,6 +127,7 @@ int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf);
 int vfio_pci_core_enable(struct vfio_pci_core_device *vdev);
 void vfio_pci_core_disable(struct vfio_pci_core_device *vdev);
 void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
+int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
 pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
 						pci_channel_state_t state);
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH V1 vfio 8/9] vfio/pci: Expose vfio_pci_iowrite/read##size()
  2023-10-17 13:42 ` Yishai Hadas via Virtualization
@ 2023-10-17 13:42   ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-17 13:42 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

Expose vfio_pci_iowrite/read##size() to let them be used by drivers.

This functionality is needed to enable direct access to a physical
BAR of the device with the proper locks/checks in place.

Subsequent patches in this series will use this functionality on the
data path when direct access to the BAR is needed.
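
A usage sketch only (not part of this patch), assuming the target BAR
was already mapped, e.g. via vfio_pci_core_setup_barmap() from the
previous patch:

#include <linux/vfio_pci_core.h>

/* Sketch: ring a 16-bit doorbell in a memory BAR through the exported helper. */
static int example_ring_doorbell(struct vfio_pci_core_device *vdev,
				 void __iomem *doorbell, u16 val)
{
	/*
	 * test_mem == true: the helper verifies, under memory_lock, that the
	 * device's memory decode is enabled before touching the MMIO BAR.
	 */
	return vfio_pci_iowrite16(vdev, true, val, doorbell);
}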

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_rdwr.c | 10 ++++++----
 include/linux/vfio_pci_core.h    | 19 +++++++++++++++++++
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 6f08b3ecbb89..817ec9a89123 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -38,7 +38,7 @@
 #define vfio_iowrite8	iowrite8
 
 #define VFIO_IOWRITE(size) \
-static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
+int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
 			bool test_mem, u##size val, void __iomem *io)	\
 {									\
 	if (test_mem) {							\
@@ -55,7 +55,8 @@ static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
 		up_read(&vdev->memory_lock);				\
 									\
 	return 0;							\
-}
+}									\
+EXPORT_SYMBOL_GPL(vfio_pci_iowrite##size);
 
 VFIO_IOWRITE(8)
 VFIO_IOWRITE(16)
@@ -65,7 +66,7 @@ VFIO_IOWRITE(64)
 #endif
 
 #define VFIO_IOREAD(size) \
-static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
+int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
 			bool test_mem, u##size *val, void __iomem *io)	\
 {									\
 	if (test_mem) {							\
@@ -82,7 +83,8 @@ static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
 		up_read(&vdev->memory_lock);				\
 									\
 	return 0;							\
-}
+}									\
+EXPORT_SYMBOL_GPL(vfio_pci_ioread##size);
 
 VFIO_IOREAD(8)
 VFIO_IOREAD(16)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 67ac58e20e1d..22c915317788 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -131,4 +131,23 @@ int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
 pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
 						pci_channel_state_t state);
 
+#define VFIO_IOWRITE_DECLARATION(size) \
+int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
+			bool test_mem, u##size val, void __iomem *io);
+
+VFIO_IOWRITE_DECLARATION(8)
+VFIO_IOWRITE_DECLARATION(16)
+VFIO_IOWRITE_DECLARATION(32)
+#ifdef iowrite64
+VFIO_IOWRITE_DECLARATION(64)
+#endif
+
+#define VFIO_IOREAD_DECLARATION(size) \
+int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
+			bool test_mem, u##size *val, void __iomem *io);
+
+VFIO_IOREAD_DECLARATION(8)
+VFIO_IOREAD_DECLARATION(16)
+VFIO_IOREAD_DECLARATION(32)
+
 #endif /* VFIO_PCI_CORE_H */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-17 13:42 ` Yishai Hadas via Virtualization
@ 2023-10-17 13:42   ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-17 13:42 UTC (permalink / raw)
  To: alex.williamson, mst, jasowang, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, yishaih, maorg

Introduce a vfio driver over virtio devices to support the legacy
interface functionality for VFs.

Background, from the virtio spec [1].
--------------------------------------------------------------------
In some systems, there is a need to support a virtio legacy driver with
a device that does not directly support the legacy interface. In such
scenarios, a group owner device can provide the legacy interface
functionality for the group member devices. The driver of the owner
device can then access the legacy interface of a member device on behalf
of the legacy member device driver.

For example, with the SR-IOV group type, group members (VFs) can not
present the legacy interface in an I/O BAR in BAR0 as expected by the
legacy pci driver. If the legacy driver is running inside a virtual
machine, the hypervisor executing the virtual machine can present a
virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
legacy driver accesses to this I/O BAR and forwards them to the group
owner device (PF) using group administration commands.
--------------------------------------------------------------------

Specifically, this driver adds support for a virtio-net VF to be exposed
as a transitional device to a guest driver and allows the legacy IO BAR
functionality on top.

This allows a VM that uses a legacy virtio-net driver in the guest to
work transparently over a VF whose host-side driver is this new
driver.

The driver can easily be extended to support other types of virtio
devices (e.g. virtio-blk) by adding the type-specific properties in a
few places, as was done for virtio-net.

For now, only the virtio-net use case was tested, so support is
introduced only for that device type.

Practically,
upon probing a VF of a virtio-net device, if its PF supports legacy
access over the virtio admin commands and the VF doesn't have BAR 0, we
set specific 'vfio_device_ops' to be able to simulate in SW a
transitional device with an I/O BAR in BAR 0.

The existence of the simulated I/O BAR is reported later on by
overriding the VFIO_DEVICE_GET_REGION_INFO command, and the device
exposes itself as a transitional device by overriding some properties
when its config space is read.

Once we report the existence of the I/O BAR as BAR 0, a legacy driver
in the guest may use it via read/write calls according to the virtio
specification.

Any read/write towards the control parts of the BAR will be captured by
the new driver and translated into admin commands towards the device.

Any data path read/write access (i.e. virtio driver notifications) will
be forwarded to the physical BAR whose properties were supplied by the
VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO admin command during the
probing/init flow.

With this code in place, a legacy driver in the guest gets the look and
feel of a transitional device with legacy support for both its control
and data path flows.

[1]
https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 MAINTAINERS                      |   7 +
 drivers/vfio/pci/Kconfig         |   2 +
 drivers/vfio/pci/Makefile        |   2 +
 drivers/vfio/pci/virtio/Kconfig  |  15 +
 drivers/vfio/pci/virtio/Makefile |   4 +
 drivers/vfio/pci/virtio/main.c   | 577 +++++++++++++++++++++++++++++++
 6 files changed, 607 insertions(+)
 create mode 100644 drivers/vfio/pci/virtio/Kconfig
 create mode 100644 drivers/vfio/pci/virtio/Makefile
 create mode 100644 drivers/vfio/pci/virtio/main.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7a7bd8bd80e9..680a70063775 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22620,6 +22620,13 @@ L:	kvm@vger.kernel.org
 S:	Maintained
 F:	drivers/vfio/pci/mlx5/
 
+VFIO VIRTIO PCI DRIVER
+M:	Yishai Hadas <yishaih@nvidia.com>
+L:	kvm@vger.kernel.org
+L:	virtualization@lists.linux-foundation.org
+S:	Maintained
+F:	drivers/vfio/pci/virtio
+
 VFIO PCI DEVICE SPECIFIC DRIVERS
 R:	Jason Gunthorpe <jgg@nvidia.com>
 R:	Yishai Hadas <yishaih@nvidia.com>
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 8125e5f37832..18c397df566d 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
 
 source "drivers/vfio/pci/pds/Kconfig"
 
+source "drivers/vfio/pci/virtio/Kconfig"
+
 endmenu
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 45167be462d8..046139a4eca5 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
 obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
 
 obj-$(CONFIG_PDS_VFIO_PCI) += pds/
+
+obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
new file mode 100644
index 000000000000..89eddce8b1bd
--- /dev/null
+++ b/drivers/vfio/pci/virtio/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config VIRTIO_VFIO_PCI
+        tristate "VFIO support for VIRTIO PCI devices"
+        depends on VIRTIO_PCI
+        select VFIO_PCI_CORE
+        help
+          This provides support for exposing VIRTIO VF devices using the VFIO
+          framework that can work with a legacy virtio driver in the guest.
+          Based on the PCIe spec, VFs do not support I/O Space; thus, VF BARs
+          shall not indicate I/O Space.
+          Therefore, this driver emulates an I/O BAR in software to let a VF
+          be seen as a transitional device in the guest and work with a
+          legacy driver.
+
+          If you don't know what to do here, say N.
diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
new file mode 100644
index 000000000000..2039b39fb723
--- /dev/null
+++ b/drivers/vfio/pci/virtio/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
+virtio-vfio-pci-y := main.o
+
diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
new file mode 100644
index 000000000000..3fef4b21f7e6
--- /dev/null
+++ b/drivers/vfio/pci/virtio/main.c
@@ -0,0 +1,577 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/pm_runtime.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/vfio_pci_core.h>
+#include <linux/virtio_pci.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_pci_admin.h>
+
+struct virtiovf_pci_core_device {
+	struct vfio_pci_core_device core_device;
+	u8 bar0_virtual_buf_size;
+	u8 *bar0_virtual_buf;
+	/* synchronize access to the virtual buf */
+	struct mutex bar_mutex;
+	void __iomem *notify_addr;
+	u32 notify_offset;
+	u8 notify_bar;
+	u16 pci_cmd;
+	u16 msix_ctrl;
+};
+
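+/*
+ * Forward a read/write of the emulated legacy I/O BAR to the PF as a legacy
+ * admin command. Offsets below the device-specific config area (whose start
+ * depends on whether MSI-X is enabled) target the legacy common config,
+ * higher offsets the legacy device config. bar0_virtual_buf stages the data
+ * between user space and the admin command.
+ */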
+static int
+virtiovf_issue_legacy_rw_cmd(struct virtiovf_pci_core_device *virtvdev,
+			     loff_t pos, char __user *buf,
+			     size_t count, bool read)
+{
+	bool msix_enabled = virtvdev->msix_ctrl & PCI_MSIX_FLAGS_ENABLE;
+	struct pci_dev *pdev = virtvdev->core_device.pdev;
+	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
+	u16 opcode;
+	int ret;
+
+	mutex_lock(&virtvdev->bar_mutex);
+	if (read) {
+		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
+			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
+			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
+		ret = virtio_pci_admin_legacy_io_read(pdev, opcode, pos, count,
+						      bar0_buf + pos);
+		if (ret)
+			goto out;
+		if (copy_to_user(buf, bar0_buf + pos, count))
+			ret = -EFAULT;
+		goto out;
+	}
+
+	if (copy_from_user(bar0_buf + pos, buf, count)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
+			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
+			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
+	ret = virtio_pci_admin_legacy_io_write(pdev, opcode, pos, count,
+					       bar0_buf + pos);
+out:
+	mutex_unlock(&virtvdev->bar_mutex);
+	return ret;
+}
+
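+/*
+ * Dispatch an access to the emulated I/O BAR 0: the queue notify register is
+ * forwarded directly to the VF's physical notification area (resolved at init
+ * time via the notify_info admin command), while any other offset is
+ * translated into a legacy admin command.
+ */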
+static int
+translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
+			    loff_t pos, char __user *buf,
+			    size_t count, bool read)
+{
+	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
+	u16 queue_notify;
+	int ret;
+
+	if (pos + count > virtvdev->bar0_virtual_buf_size)
+		return -EINVAL;
+
+	switch (pos) {
+	case VIRTIO_PCI_QUEUE_NOTIFY:
+		if (count != sizeof(queue_notify))
+			return -EINVAL;
+		if (read) {
+			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
+						virtvdev->notify_addr);
+			if (ret)
+				return ret;
+			if (copy_to_user(buf, &queue_notify,
+					 sizeof(queue_notify)))
+				return -EFAULT;
+			break;
+		}
+
+		if (copy_from_user(&queue_notify, buf, count))
+			return -EFAULT;
+
+		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
+					 virtvdev->notify_addr);
+		break;
+	default:
+		ret = virtiovf_issue_legacy_rw_cmd(virtvdev, pos, buf, count,
+						   read);
+	}
+
+	return ret ? ret : count;
+}
+
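+/*
+ * Check whether the user access [range1_start, range1_start + count1) overlaps
+ * the emulated register [range2_start, range2_start + count2). On overlap,
+ * return the offset into the user buffer where the overlap begins
+ * (start_offset), the number of overlapping bytes (intersect_count) and,
+ * optionally, the offset into the register itself (register_offset).
+ */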
+static bool range_intersect_range(loff_t range1_start, size_t count1,
+				  loff_t range2_start, size_t count2,
+				  loff_t *start_offset,
+				  size_t *intersect_count,
+				  size_t *register_offset)
+{
+	if (range1_start <= range2_start &&
+	    range1_start + count1 > range2_start) {
+		*start_offset = range2_start - range1_start;
+		*intersect_count = min_t(size_t, count2,
+					 range1_start + count1 - range2_start);
+		if (register_offset)
+			*register_offset = 0;
+		return true;
+	}
+
+	if (range1_start > range2_start &&
+	    range1_start < range2_start + count2) {
+		*start_offset = 0;
+		*intersect_count = min_t(size_t, count1,
+					 range2_start + count2 - range1_start);
+		if (register_offset)
+			*register_offset = range1_start - range2_start;
+		return true;
+	}
+
+	return false;
+}
+
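+/*
+ * Virtualize the parts of the PCI config space that make the VF present
+ * itself as a transitional virtio-net device: the transitional device ID,
+ * revision 0, the virtio-net subsystem ID and an I/O BAR 0, with the
+ * emulated PCI_COMMAND_IO bit reflected back once the user enables it.
+ */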
+static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
+					char __user *buf, size_t count,
+					loff_t *ppos)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	size_t register_offset;
+	loff_t copy_offset;
+	size_t copy_count;
+	__le32 val32;
+	__le16 val16;
+	u8 val8;
+	int ret;
+
+	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
+	if (ret < 0)
+		return ret;
+
+	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
+				  &copy_offset, &copy_count, NULL)) {
+		val16 = cpu_to_le16(0x1000);
+		if (copy_to_user(buf + copy_offset, &val16, copy_count))
+			return -EFAULT;
+	}
+
+	if ((virtvdev->pci_cmd & PCI_COMMAND_IO) &&
+	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
+				  &copy_offset, &copy_count, &register_offset)) {
+		if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
+				   copy_count))
+			return -EFAULT;
+		val16 |= cpu_to_le16(PCI_COMMAND_IO);
+		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
+				 copy_count))
+			return -EFAULT;
+	}
+
+	if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8),
+				  &copy_offset, &copy_count, NULL)) {
+		/* Transitional devices need to have revision 0 */
+		val8 = 0;
+		if (copy_to_user(buf + copy_offset, &val8, copy_count))
+			return -EFAULT;
+	}
+
+	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
+				  &copy_offset, &copy_count, NULL)) {
+		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);
+		if (copy_to_user(buf + copy_offset, &val32, copy_count))
+			return -EFAULT;
+	}
+
+	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
+				  &copy_offset, &copy_count, NULL)) {
+		/*
+		 * Transitional devices use the PCI subsystem device id as
+		 * the virtio device id, just as the legacy driver always did.
+		 */
+		val16 = cpu_to_le16(VIRTIO_ID_NET);
+		if (copy_to_user(buf + copy_offset, &val16, copy_count))
+			return -EFAULT;
+	}
+
+	return count;
+}
+
+static ssize_t
+virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
+		       size_t count, loff_t *ppos)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	struct pci_dev *pdev = virtvdev->core_device.pdev;
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	int ret;
+
+	if (!count)
+		return 0;
+
+	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
+		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
+
+	if (index != VFIO_PCI_BAR0_REGION_INDEX)
+		return vfio_pci_core_read(core_vdev, buf, count, ppos);
+
+	ret = pm_runtime_resume_and_get(&pdev->dev);
+	if (ret) {
+		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
+				     ret);
+		return -EIO;
+	}
+
+	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
+	pm_runtime_put(&pdev->dev);
+	return ret;
+}
+
+static ssize_t
+virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
+			size_t count, loff_t *ppos)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	struct pci_dev *pdev = virtvdev->core_device.pdev;
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	int ret;
+
+	if (!count)
+		return 0;
+
+	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
+		size_t register_offset;
+		loff_t copy_offset;
+		size_t copy_count;
+
+		if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd),
+					  &copy_offset, &copy_count,
+					  &register_offset)) {
+			if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
+					   buf + copy_offset,
+					   copy_count))
+				return -EFAULT;
+		}
+
+		if (range_intersect_range(pos, count, pdev->msix_cap + PCI_MSIX_FLAGS,
+					  sizeof(virtvdev->msix_ctrl),
+					  &copy_offset, &copy_count,
+					  &register_offset)) {
+			if (copy_from_user((void *)&virtvdev->msix_ctrl + register_offset,
+					   buf + copy_offset,
+					   copy_count))
+				return -EFAULT;
+		}
+	}
+
+	if (index != VFIO_PCI_BAR0_REGION_INDEX)
+		return vfio_pci_core_write(core_vdev, buf, count, ppos);
+
+	ret = pm_runtime_resume_and_get(&pdev->dev);
+	if (ret) {
+		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
+		return -EIO;
+	}
+
+	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
+	pm_runtime_put(&pdev->dev);
+	return ret;
+}
+
+static int
+virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
+				   unsigned int cmd, unsigned long arg)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
+	void __user *uarg = (void __user *)arg;
+	struct vfio_region_info info = {};
+
+	if (copy_from_user(&info, uarg, minsz))
+		return -EFAULT;
+
+	if (info.argsz < minsz)
+		return -EINVAL;
+
+	switch (info.index) {
+	case VFIO_PCI_BAR0_REGION_INDEX:
+		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+		info.size = virtvdev->bar0_virtual_buf_size;
+		info.flags = VFIO_REGION_INFO_FLAG_READ |
+			     VFIO_REGION_INFO_FLAG_WRITE;
+		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
+	default:
+		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
+	}
+}
+
+static long
+virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
+			     unsigned long arg)
+{
+	switch (cmd) {
+	case VFIO_DEVICE_GET_REGION_INFO:
+		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
+	default:
+		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
+	}
+}
+
+static int
+virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
+{
+	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
+	int ret;
+
+	/*
+	 * Set up the BAR where the 'notify' area lives so that vfio can use
+	 * it as well. This lets us mmap it only once and reuse it as needed.
+	 */
+	ret = vfio_pci_core_setup_barmap(core_device,
+					 virtvdev->notify_bar);
+	if (ret)
+		return ret;
+
+	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
+			virtvdev->notify_offset;
+	return 0;
+}
+
+static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
+	int ret;
+
+	ret = vfio_pci_core_enable(vdev);
+	if (ret)
+		return ret;
+
+	if (virtvdev->bar0_virtual_buf) {
+		/*
+		 * Upon close_device(), vfio_pci_core_disable() is called and
+		 * tears down all previous mmaps, so the valid life cycle for
+		 * the 'notify' address is per open/close.
+		 */
+		ret = virtiovf_set_notify_addr(virtvdev);
+		if (ret) {
+			vfio_pci_core_disable(vdev);
+			return ret;
+		}
+	}
+
+	vfio_pci_core_finish_enable(vdev);
+	return 0;
+}
+
+static int virtiovf_get_device_config_size(unsigned short device)
+{
+	/* Network card */
+	return offsetofend(struct virtio_net_config, status);
+}
+
+static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
+{
+	u64 offset;
+	int ret;
+	u8 bar;
+
+	ret = virtio_pci_admin_legacy_io_notify_info(virtvdev->core_device.pdev,
+				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
+				&bar, &offset);
+	if (ret)
+		return ret;
+
+	virtvdev->notify_bar = bar;
+	virtvdev->notify_offset = offset;
+	return 0;
+}
+
+static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+	struct pci_dev *pdev;
+	int ret;
+
+	ret = vfio_pci_core_init_dev(core_vdev);
+	if (ret)
+		return ret;
+
+	pdev = virtvdev->core_device.pdev;
+	ret = virtiovf_read_notify_info(virtvdev);
+	if (ret)
+		return ret;
+
+	/* Size the buffer for the layout used when MSI-X is enabled */
+	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
+				virtiovf_get_device_config_size(pdev->device);
+	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
+					     GFP_KERNEL);
+	if (!virtvdev->bar0_virtual_buf)
+		return -ENOMEM;
+	mutex_init(&virtvdev->bar_mutex);
+	return 0;
+}
+
+static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = container_of(
+		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
+
+	kfree(virtvdev->bar0_virtual_buf);
+	vfio_pci_core_release_dev(core_vdev);
+}
+
+static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
+	.name = "virtio-transitional-vfio-pci",
+	.init = virtiovf_pci_init_device,
+	.release = virtiovf_pci_core_release_dev,
+	.open_device = virtiovf_pci_open_device,
+	.close_device = vfio_pci_core_close_device,
+	.ioctl = virtiovf_vfio_pci_core_ioctl,
+	.read = virtiovf_pci_core_read,
+	.write = virtiovf_pci_core_write,
+	.mmap = vfio_pci_core_mmap,
+	.request = vfio_pci_core_request,
+	.match = vfio_pci_core_match,
+	.bind_iommufd = vfio_iommufd_physical_bind,
+	.unbind_iommufd = vfio_iommufd_physical_unbind,
+	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+};
+
+static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
+	.name = "virtio-acc-vfio-pci",
+	.init = vfio_pci_core_init_dev,
+	.release = vfio_pci_core_release_dev,
+	.open_device = virtiovf_pci_open_device,
+	.close_device = vfio_pci_core_close_device,
+	.ioctl = vfio_pci_core_ioctl,
+	.device_feature = vfio_pci_core_ioctl_feature,
+	.read = vfio_pci_core_read,
+	.write = vfio_pci_core_write,
+	.mmap = vfio_pci_core_mmap,
+	.request = vfio_pci_core_request,
+	.match = vfio_pci_core_match,
+	.bind_iommufd = vfio_iommufd_physical_bind,
+	.unbind_iommufd = vfio_iommufd_physical_unbind,
+	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+};
+
+static bool virtiovf_bar0_exists(struct pci_dev *pdev)
+{
+	struct resource *res = pdev->resource;
+
+	return res->flags ? true : false;
+}
+
+#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
+	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
+
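+/*
+ * Query which admin commands the owner device (PF) supports and, if all the
+ * commands needed for legacy access are present, report back the set that
+ * will actually be used.
+ */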
+static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
+{
+	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
+	u8 *buf;
+	int ret;
+
+	buf = kzalloc(buf_size, GFP_KERNEL);
+	if (!buf)
+		return false;
+
+	ret = virtio_pci_admin_list_query(pdev, buf, buf_size);
+	if (ret)
+		goto end;
+
+	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
+		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
+		ret = -EOPNOTSUPP;
+		goto end;
+	}
+
+	/* Confirm the used commands */
+	memset(buf, 0, buf_size);
+	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
+	ret = virtio_pci_admin_list_use(pdev, buf, buf_size);
+end:
+	kfree(buf);
+	return ret ? false : true;
+}
+
+static int virtiovf_pci_probe(struct pci_dev *pdev,
+			      const struct pci_device_id *id)
+{
+	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
+	struct virtiovf_pci_core_device *virtvdev;
+	int ret;
+
+	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
+	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
+		ops = &virtiovf_acc_vfio_pci_tran_ops;
+
+	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
+				     &pdev->dev, ops);
+	if (IS_ERR(virtvdev))
+		return PTR_ERR(virtvdev);
+
+	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
+	ret = vfio_pci_core_register_device(&virtvdev->core_device);
+	if (ret)
+		goto out;
+	return 0;
+out:
+	vfio_put_device(&virtvdev->core_device.vdev);
+	return ret;
+}
+
+static void virtiovf_pci_remove(struct pci_dev *pdev)
+{
+	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
+
+	vfio_pci_core_unregister_device(&virtvdev->core_device);
+	vfio_put_device(&virtvdev->core_device.vdev);
+}
+
+static const struct pci_device_id virtiovf_pci_table[] = {
+	/* Only virtio-net is supported/tested so far */
+	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
+	{}
+};
+
+MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
+
+static struct pci_driver virtiovf_pci_driver = {
+	.name = KBUILD_MODNAME,
+	.id_table = virtiovf_pci_table,
+	.probe = virtiovf_pci_probe,
+	.remove = virtiovf_pci_remove,
+	.err_handler = &vfio_pci_core_err_handlers,
+	.driver_managed_dma = true,
+};
+
+module_pci_driver(virtiovf_pci_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
+MODULE_DESCRIPTION(
+	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-17 13:42   ` Yishai Hadas via Virtualization
@ 2023-10-17 20:24     ` Alex Williamson
  -1 siblings, 0 replies; 100+ messages in thread
From: Alex Williamson @ 2023-10-17 20:24 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, mst, maorg, virtualization, jgg, jiri, leonro

On Tue, 17 Oct 2023 16:42:17 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:
> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> +			      const struct pci_device_id *id)
> +{
> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> +	struct virtiovf_pci_core_device *virtvdev;
> +	int ret;
> +
> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> +		ops = &virtiovf_acc_vfio_pci_tran_ops;


This is still an issue for me, it's a very narrow use case where we
have a modern device and want to enable legacy support.  Implementing an
IO BAR and mangling the device ID seems like it should be an opt-in,
not standard behavior for any compatible device.  Users should
generally expect that the device they see in the host is the device
they see in the guest.  They might even rely on that principle.

We can't use the argument that users wanting the default device should
use vfio-pci rather than virtio-vfio-pci because we've already defined
the algorithm by which libvirt should choose a variant driver for a
device.  libvirt will choose this driver for all virtio-net devices.

This driver effectively has the option to expose two different profiles
for the device, native or transitional.  We've discussed profile
support for variant drivers previously as an equivalent functionality
to mdev types, but the only use case for this currently is out-of-tree.
I think this might be the opportunity to define how device profiles are
exposed and selected in a variant driver.

Jason had previously suggested a devlink interface for this, but I
understand that path had been shot down by devlink developers.  Another
obvious option is sysfs, where we might imagine an optional "profiles"
directory, perhaps under vfio-dev.  Attributes of "available" and
"current" could allow discovery and selection of a profile similar to
mdev types.
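
For illustration only, a rough sketch of how such attributes might be wired
up; every name below (the attribute names, the "transitional" string and the
driver-private flag) is a placeholder rather than anything proposed in this
series, and grouping them under a "profiles" directory is left out:

/* Hypothetical sketch; assumes <linux/device.h> and <linux/sysfs.h> */
static bool profile_transitional;	/* placeholder for driver-private state */

static ssize_t profiles_available_show(struct device *dev,
				       struct device_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "native\ntransitional\n");
}
static DEVICE_ATTR_RO(profiles_available);

static ssize_t profiles_current_show(struct device *dev,
				     struct device_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "%s\n",
			  profile_transitional ? "transitional" : "native");
}

static ssize_t profiles_current_store(struct device *dev,
				      struct device_attribute *attr,
				      const char *buf, size_t count)
{
	/* A real driver would reject changes once the device is opened */
	if (sysfs_streq(buf, "transitional"))
		profile_transitional = true;
	else if (sysfs_streq(buf, "native"))
		profile_transitional = false;
	else
		return -EINVAL;
	return count;
}
static DEVICE_ATTR_RW(profiles_current);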

Is this where we should head with this or are there other options to
confine this transitional behavior?

BTW, what is "acc" in virtiovf_acc_vfio_pci_ops?

> +
> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> +				     &pdev->dev, ops);
> +	if (IS_ERR(virtvdev))
> +		return PTR_ERR(virtvdev);
> +
> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
> +	if (ret)
> +		goto out;
> +	return 0;
> +out:
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +	return ret;
> +}
> +
> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> +
> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +}
> +
> +static const struct pci_device_id virtiovf_pci_table[] = {
> +	/* Only virtio-net is supported/tested so far */
> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
> +	{}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> +
> +static struct pci_driver virtiovf_pci_driver = {
> +	.name = KBUILD_MODNAME,
> +	.id_table = virtiovf_pci_table,
> +	.probe = virtiovf_pci_probe,
> +	.remove = virtiovf_pci_remove,
> +	.err_handler = &vfio_pci_core_err_handlers,
> +	.driver_managed_dma = true,
> +};
> +
> +module_pci_driver(virtiovf_pci_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
> +MODULE_DESCRIPTION(
> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");

Not yet "family" per the device table.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
  2023-10-17 13:42   ` Yishai Hadas via Virtualization
@ 2023-10-17 20:33     ` kernel test robot
  -1 siblings, 0 replies; 100+ messages in thread
From: kernel test robot @ 2023-10-17 20:33 UTC (permalink / raw)
  To: Yishai Hadas, alex.williamson, mst, jasowang, jgg
  Cc: oe-kbuild-all, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, yishaih, maorg

Hi Yishai,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.6-rc6 next-20231017]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yishai-Hadas/virtio-pci-Fix-common-config-map-for-modern-device/20231017-214450
base:   linus/master
patch link:    https://lore.kernel.org/r/20231017134217.82497-7-yishaih%40nvidia.com
patch subject: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
config: alpha-allyesconfig (https://download.01.org/0day-ci/archive/20231018/202310180437.jo2csM6u-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231018/202310180437.jo2csM6u-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310180437.jo2csM6u-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/virtio/virtio_pci_modern.c:731:5: warning: no previous prototype for 'virtio_pci_admin_list_query' [-Wmissing-prototypes]
     731 | int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/virtio/virtio_pci_modern.c:758:5: warning: no previous prototype for 'virtio_pci_admin_list_use' [-Wmissing-prototypes]
     758 | int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/virtio/virtio_pci_modern.c:786:5: warning: no previous prototype for 'virtio_pci_admin_legacy_io_write' [-Wmissing-prototypes]
     786 | int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/virtio/virtio_pci_modern.c:831:5: warning: no previous prototype for 'virtio_pci_admin_legacy_io_read' [-Wmissing-prototypes]
     831 | int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/virtio/virtio_pci_modern.c:877:5: warning: no previous prototype for 'virtio_pci_admin_legacy_io_notify_info' [-Wmissing-prototypes]
     877 | int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
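
These -Wmissing-prototypes warnings are the usual sign that the new exported
helpers are not declared in a header that virtio_pci_modern.c itself includes.
A minimal sketch of the declarations, assuming they belong in the series' new
include/linux/virtio_pci_admin.h (the header main.c already pulls in; guard
name below is assumed), would be:

#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
#define _LINUX_VIRTIO_PCI_ADMIN_H

#include <linux/types.h>
#include <linux/pci.h>

int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
				     u8 offset, u8 size, u8 *buf);
int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
				    u8 offset, u8 size, u8 *buf);
int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
					   u8 req_bar_flags, u8 *bar,
					   u64 *bar_offset);

#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */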


vim +/virtio_pci_admin_list_query +731 drivers/virtio/virtio_pci_modern.c

   721	
   722	/*
   723	 * virtio_pci_admin_list_query - Provides to driver list of commands
   724	 * supported for the PCI VF.
   725	 * @dev: VF pci_dev
   726	 * @buf: buffer to hold the returned list
   727	 * @buf_size: size of the given buffer
   728	 *
   729	 * Returns 0 on success, or negative on failure.
   730	 */
 > 731	int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
   732	{
   733		struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
   734		struct virtio_admin_cmd cmd = {};
   735		struct scatterlist result_sg;
   736	
   737		if (!virtio_dev)
   738			return -ENODEV;
   739	
   740		sg_init_one(&result_sg, buf, buf_size);
   741		cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
   742		cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   743		cmd.result_sg = &result_sg;
   744	
   745		return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
   746	}
   747	EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
   748	
   749	/*
   750	 * virtio_pci_admin_list_use - Provides to device list of commands
   751	 * used for the PCI VF.
   752	 * @dev: VF pci_dev
   753	 * @buf: buffer which holds the list
   754	 * @buf_size: size of the given buffer
   755	 *
   756	 * Returns 0 on success, or negative on failure.
   757	 */
 > 758	int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
   759	{
   760		struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
   761		struct virtio_admin_cmd cmd = {};
   762		struct scatterlist data_sg;
   763	
   764		if (!virtio_dev)
   765			return -ENODEV;
   766	
   767		sg_init_one(&data_sg, buf, buf_size);
   768		cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
   769		cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   770		cmd.data_sg = &data_sg;
   771	
   772		return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
   773	}
   774	EXPORT_SYMBOL_GPL(virtio_pci_admin_list_use);
   775	
   776	/*
   777	 * virtio_pci_admin_legacy_io_write - Write legacy registers of a member device
   778	 * @dev: VF pci_dev
   779	 * @opcode: op code of the io write command
   780	 * @offset: starting byte offset within the registers to write to
   781	 * @size: size of the data to write
   782	 * @buf: buffer which holds the data
   783	 *
   784	 * Returns 0 on success, or negative on failure.
   785	 */
 > 786	int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
   787					     u8 offset, u8 size, u8 *buf)
   788	{
   789		struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
   790		struct virtio_admin_cmd_legacy_wr_data *data;
   791		struct virtio_admin_cmd cmd = {};
   792		struct scatterlist data_sg;
   793		int vf_id;
   794		int ret;
   795	
   796		if (!virtio_dev)
   797			return -ENODEV;
   798	
   799		vf_id = pci_iov_vf_id(pdev);
   800		if (vf_id < 0)
   801			return vf_id;
   802	
   803		data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
   804		if (!data)
   805			return -ENOMEM;
   806	
   807		data->offset = offset;
   808		memcpy(data->registers, buf, size);
   809		sg_init_one(&data_sg, data, sizeof(*data) + size);
   810		cmd.opcode = cpu_to_le16(opcode);
   811		cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   812		cmd.group_member_id = cpu_to_le64(vf_id + 1);
   813		cmd.data_sg = &data_sg;
   814		ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
   815	
   816		kfree(data);
   817		return ret;
   818	}
   819	EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_write);
   820	
   821	/*
   822	 * virtio_pci_admin_legacy_io_read - Read legacy registers of a member device
   823	 * @dev: VF pci_dev
   824	 * @opcode: op code of the io read command
   825	 * @offset: starting byte offset within the registers to read from
   826	 * @size: size of the data to be read
   827	 * @buf: buffer to hold the returned data
   828	 *
   829	 * Returns 0 on success, or negative on failure.
   830	 */
 > 831	int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
   832					    u8 offset, u8 size, u8 *buf)
   833	{
   834		struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
   835		struct virtio_admin_cmd_legacy_rd_data *data;
   836		struct scatterlist data_sg, result_sg;
   837		struct virtio_admin_cmd cmd = {};
   838		int vf_id;
   839		int ret;
   840	
   841		if (!virtio_dev)
   842			return -ENODEV;
   843	
   844		vf_id = pci_iov_vf_id(pdev);
   845		if (vf_id < 0)
   846			return vf_id;
   847	
   848		data = kzalloc(sizeof(*data), GFP_KERNEL);
   849		if (!data)
   850			return -ENOMEM;
   851	
   852		data->offset = offset;
   853		sg_init_one(&data_sg, data, sizeof(*data));
   854		sg_init_one(&result_sg, buf, size);
   855		cmd.opcode = cpu_to_le16(opcode);
   856		cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   857		cmd.group_member_id = cpu_to_le64(vf_id + 1);
   858		cmd.data_sg = &data_sg;
   859		cmd.result_sg = &result_sg;
   860		ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
   861	
   862		kfree(data);
   863		return ret;
   864	}
   865	EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_read);
   866	
   867	/*
   868	 * virtio_pci_admin_legacy_io_notify_info - Read the queue notification
   869	 * information for legacy interface
   870	 * @dev: VF pci_dev
   871	 * @req_bar_flags: requested bar flags
   872	 * @bar: on output the BAR number of the member device
   873	 * @bar_offset: on output the offset within bar
   874	 *
   875	 * Returns 0 on success, or negative on failure.
   876	 */
 > 877	int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
   878						   u8 req_bar_flags, u8 *bar,
   879						   u64 *bar_offset)
   880	{
   881		struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
   882		struct virtio_admin_cmd_notify_info_result *result;
   883		struct virtio_admin_cmd cmd = {};
   884		struct scatterlist result_sg;
   885		int vf_id;
   886		int ret;
   887	
   888		if (!virtio_dev)
   889			return -ENODEV;
   890	
   891		vf_id = pci_iov_vf_id(pdev);
   892		if (vf_id < 0)
   893			return vf_id;
   894	
   895		result = kzalloc(sizeof(*result), GFP_KERNEL);
   896		if (!result)
   897			return -ENOMEM;
   898	
   899		sg_init_one(&result_sg, result, sizeof(*result));
   900		cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
   901		cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   902		cmd.group_member_id = cpu_to_le64(vf_id + 1);
   903		cmd.result_sg = &result_sg;
   904		ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
   905		if (!ret) {
   906			struct virtio_admin_cmd_notify_info_data *entry;
   907			int i;
   908	
   909			ret = -ENOENT;
   910			for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
   911				entry = &result->entries[i];
   912				if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
   913					break;
   914				if (entry->flags != req_bar_flags)
   915					continue;
   916				*bar = entry->bar;
   917				*bar_offset = le64_to_cpu(entry->offset);
   918				ret = 0;
   919				break;
   920			}
   921		}
   922	
   923		kfree(result);
   924		return ret;
   925	}
   926	EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
   927	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
@ 2023-10-17 20:33     ` kernel test robot
  0 siblings, 0 replies; 100+ messages in thread
From: kernel test robot @ 2023-10-17 20:33 UTC (permalink / raw)
  To: Yishai Hadas, alex.williamson, mst, jasowang, jgg
  Cc: kvm, maorg, oe-kbuild-all, virtualization, jiri, leonro

Hi Yishai,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.6-rc6 next-20231017]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yishai-Hadas/virtio-pci-Fix-common-config-map-for-modern-device/20231017-214450
base:   linus/master
patch link:    https://lore.kernel.org/r/20231017134217.82497-7-yishaih%40nvidia.com
patch subject: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
config: alpha-allyesconfig (https://download.01.org/0day-ci/archive/20231018/202310180437.jo2csM6u-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231018/202310180437.jo2csM6u-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310180437.jo2csM6u-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/virtio/virtio_pci_modern.c:731:5: warning: no previous prototype for 'virtio_pci_admin_list_query' [-Wmissing-prototypes]
     731 | int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/virtio/virtio_pci_modern.c:758:5: warning: no previous prototype for 'virtio_pci_admin_list_use' [-Wmissing-prototypes]
     758 | int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/virtio/virtio_pci_modern.c:786:5: warning: no previous prototype for 'virtio_pci_admin_legacy_io_write' [-Wmissing-prototypes]
     786 | int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/virtio/virtio_pci_modern.c:831:5: warning: no previous prototype for 'virtio_pci_admin_legacy_io_read' [-Wmissing-prototypes]
     831 | int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/virtio/virtio_pci_modern.c:877:5: warning: no previous prototype for 'virtio_pci_admin_legacy_io_notify_info' [-Wmissing-prototypes]
     877 | int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
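
These -Wmissing-prototypes warnings are typically resolved by declaring the
exported helpers in a header that drivers/virtio/virtio_pci_modern.c also
includes. A minimal sketch follows; the header name and guard are assumptions,
while the prototypes are copied from the definitions quoted below.

/* e.g. include/linux/virtio_pci_admin.h (name assumed) */
#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
#define _LINUX_VIRTIO_PCI_ADMIN_H

#include <linux/types.h>
#include <linux/pci.h>

int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
				     u8 offset, u8 size, u8 *buf);
int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
				    u8 offset, u8 size, u8 *buf);
int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
					   u8 req_bar_flags, u8 *bar,
					   u64 *bar_offset);

#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */

With such declarations in place, the definitions quoted below would no longer
trigger the warnings.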


vim +/virtio_pci_admin_list_query +731 drivers/virtio/virtio_pci_modern.c

   721	
   722	/*
   723	 * virtio_pci_admin_list_query - Provide the driver with the list of
   724	 * admin commands supported for the PCI VF.
   725	 * @pdev: VF pci_dev
   726	 * @buf: buffer to hold the returned list
   727	 * @buf_size: size of the given buffer
   728	 *
   729	 * Returns 0 on success, or negative on failure.
   730	 */
 > 731	int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
   732	{
   733		struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
   734		struct virtio_admin_cmd cmd = {};
   735		struct scatterlist result_sg;
   736	
   737		if (!virtio_dev)
   738			return -ENODEV;
   739	
   740		sg_init_one(&result_sg, buf, buf_size);
   741		cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
   742		cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   743		cmd.result_sg = &result_sg;
   744	
   745		return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
   746	}
   747	EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
   748	
   749	/*
   750	 * virtio_pci_admin_list_use - Provide the device with the list of
   751	 * admin commands the driver will use for the PCI VF.
   752	 * @pdev: VF pci_dev
   753	 * @buf: buffer which holds the list
   754	 * @buf_size: size of the given buffer
   755	 *
   756	 * Returns 0 on success, or negative on failure.
   757	 */
 > 758	int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
   759	{
   760		struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
   761		struct virtio_admin_cmd cmd = {};
   762		struct scatterlist data_sg;
   763	
   764		if (!virtio_dev)
   765			return -ENODEV;
   766	
   767		sg_init_one(&data_sg, buf, buf_size);
   768		cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
   769		cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   770		cmd.data_sg = &data_sg;
   771	
   772		return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
   773	}
   774	EXPORT_SYMBOL_GPL(virtio_pci_admin_list_use);
   775	
   776	/*
   777	 * virtio_pci_admin_legacy_io_write - Write legacy registers of a member device
   778	 * @pdev: VF pci_dev
   779	 * @opcode: op code of the io write command
   780	 * @offset: starting byte offset within the registers to write to
   781	 * @size: size of the data to write
   782	 * @buf: buffer which holds the data
   783	 *
   784	 * Returns 0 on success, or negative on failure.
   785	 */
 > 786	int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
   787					     u8 offset, u8 size, u8 *buf)
   788	{
   789		struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
   790		struct virtio_admin_cmd_legacy_wr_data *data;
   791		struct virtio_admin_cmd cmd = {};
   792		struct scatterlist data_sg;
   793		int vf_id;
   794		int ret;
   795	
   796		if (!virtio_dev)
   797			return -ENODEV;
   798	
   799		vf_id = pci_iov_vf_id(pdev);
   800		if (vf_id < 0)
   801			return vf_id;
   802	
   803		data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
   804		if (!data)
   805			return -ENOMEM;
   806	
   807		data->offset = offset;
   808		memcpy(data->registers, buf, size);
   809		sg_init_one(&data_sg, data, sizeof(*data) + size);
   810		cmd.opcode = cpu_to_le16(opcode);
   811		cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   812		cmd.group_member_id = cpu_to_le64(vf_id + 1);
   813		cmd.data_sg = &data_sg;
   814		ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
   815	
   816		kfree(data);
   817		return ret;
   818	}
   819	EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_write);
   820	
   821	/*
   822	 * virtio_pci_admin_legacy_io_read - Read legacy registers of a member device
   823	 * @pdev: VF pci_dev
   824	 * @opcode: op code of the io read command
   825	 * @offset: starting byte offset within the registers to read from
   826	 * @size: size of the data to be read
   827	 * @buf: buffer to hold the returned data
   828	 *
   829	 * Returns 0 on success, or negative on failure.
   830	 */
 > 831	int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
   832					    u8 offset, u8 size, u8 *buf)
   833	{
   834		struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
   835		struct virtio_admin_cmd_legacy_rd_data *data;
   836		struct scatterlist data_sg, result_sg;
   837		struct virtio_admin_cmd cmd = {};
   838		int vf_id;
   839		int ret;
   840	
   841		if (!virtio_dev)
   842			return -ENODEV;
   843	
   844		vf_id = pci_iov_vf_id(pdev);
   845		if (vf_id < 0)
   846			return vf_id;
   847	
   848		data = kzalloc(sizeof(*data), GFP_KERNEL);
   849		if (!data)
   850			return -ENOMEM;
   851	
   852		data->offset = offset;
   853		sg_init_one(&data_sg, data, sizeof(*data));
   854		sg_init_one(&result_sg, buf, size);
   855		cmd.opcode = cpu_to_le16(opcode);
   856		cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   857		cmd.group_member_id = cpu_to_le64(vf_id + 1);
   858		cmd.data_sg = &data_sg;
   859		cmd.result_sg = &result_sg;
   860		ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
   861	
   862		kfree(data);
   863		return ret;
   864	}
   865	EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_read);
   866	
   867	/*
   868	 * virtio_pci_admin_legacy_io_notify_info - Read the queue notification
   869	 * information for the legacy interface
   870	 * @pdev: VF pci_dev
   871	 * @req_bar_flags: requested BAR flags
   872	 * @bar: on output, the BAR number of the member device
   873	 * @bar_offset: on output, the offset within the BAR
   874	 *
   875	 * Returns 0 on success, or negative on failure.
   876	 */
 > 877	int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
   878						   u8 req_bar_flags, u8 *bar,
   879						   u64 *bar_offset)
   880	{
   881		struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
   882		struct virtio_admin_cmd_notify_info_result *result;
   883		struct virtio_admin_cmd cmd = {};
   884		struct scatterlist result_sg;
   885		int vf_id;
   886		int ret;
   887	
   888		if (!virtio_dev)
   889			return -ENODEV;
   890	
   891		vf_id = pci_iov_vf_id(pdev);
   892		if (vf_id < 0)
   893			return vf_id;
   894	
   895		result = kzalloc(sizeof(*result), GFP_KERNEL);
   896		if (!result)
   897			return -ENOMEM;
   898	
   899		sg_init_one(&result_sg, result, sizeof(*result));
   900		cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
   901		cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   902		cmd.group_member_id = cpu_to_le64(vf_id + 1);
   903		cmd.result_sg = &result_sg;
   904		ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
   905		if (!ret) {
   906			struct virtio_admin_cmd_notify_info_data *entry;
   907			int i;
   908	
   909			ret = -ENOENT;
   910			for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
   911				entry = &result->entries[i];
   912				if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
   913					break;
   914				if (entry->flags != req_bar_flags)
   915					continue;
   916				*bar = entry->bar;
   917				*bar_offset = le64_to_cpu(entry->offset);
   918				ret = 0;
   919				break;
   920			}
   921		}
   922	
   923		kfree(result);
   924		return ret;
   925	}
   926	EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
   927	
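
For context, the list_query/list_use pair above is meant to be issued back to
back at setup time: the driver first reads which admin commands the device
supports and then tells the device which of them it intends to use. A minimal,
hypothetical caller sketch (the buffer size and error handling are
assumptions):

static int example_negotiate_admin_cmds(struct pci_dev *pdev)
{
	u8 buf[64];	/* size assumed; holds the returned command list */
	int ret;

	ret = virtio_pci_admin_list_query(pdev, buf, sizeof(buf));
	if (ret)
		return ret;

	/* A real caller may clear bits for commands it will not issue
	 * before handing the list back to the device.
	 */
	return virtio_pci_admin_list_use(pdev, buf, sizeof(buf));
}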

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-17 20:24     ` Alex Williamson
@ 2023-10-18  9:01       ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-18  9:01 UTC (permalink / raw)
  To: Alex Williamson, Jason Gunthorpe
  Cc: mst, jasowang, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg

On 17/10/2023 23:24, Alex Williamson wrote:
> On Tue, 17 Oct 2023 16:42:17 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
>> +static int virtiovf_pci_probe(struct pci_dev *pdev,
>> +			      const struct pci_device_id *id)
>> +{
>> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
>> +	struct virtiovf_pci_core_device *virtvdev;
>> +	int ret;
>> +
>> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
>> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
>> +		ops = &virtiovf_acc_vfio_pci_tran_ops;
>
> This is still an issue for me, it's a very narrow use case where we
> have a modern device and want to enable legacy support.  Implementing an
> IO BAR and mangling the device ID seems like it should be an opt-in,
> not standard behavior for any compatible device.  Users should
> generally expect that the device they see in the host is the device
> they see in the guest.  They might even rely on that principle.

"Users" here mainly refers to cloud operators.

We may assume, I believe, that they will be fine with seeing a 
transitional device in the guest as they would like to get the legacy IO 
support for their system.

However, we can still consider supplying a configuration knob in the 
device layer (e.g. on the DPU side) to let a cloud operator turn off 
the legacy capability.

In that case, upon probe() of the vfio-virtio driver, we'll just pick up 
the default vfio-pci 'ops', and in the guest we may have the same device 
ID as in the host.

With that approach we may not require a host-side control (i.e. sysfs, 
etc.), but stay with a device-side control based on its user manual.

In the end, we don't expect any functional issue or any compatibility 
problem with the new driver; both modern and legacy drivers can work in 
the guest.

Can that work for you?

>
> We can't use the argument that users wanting the default device should
> use vfio-pci rather than virtio-vfio-pci because we've already defined
> the algorithm by which libvirt should choose a variant driver for a
> device.  libvirt will choose this driver for all virtio-net devices.
>
> This driver effectively has the option to expose two different profiles
> for the device, native or transitional.  We've discussed profile
> support for variant drivers previously as an equivalent functionality
> to mdev types, but the only use case for this currently is out-of-tree.
> I think this might be the opportunity to define how device profiles are
> exposed and selected in a variant driver.
>
> Jason had previously suggested a devlink interface for this, but I
> understand that path had been shot down by devlink developers.  Another
> obvious option is sysfs, where we might imagine an optional "profiles"
> directory, perhaps under vfio-dev.  Attributes of "available" and
> "current" could allow discovery and selection of a profile similar to
> mdev types.

Referring to the sysfs option:

Do you expect the sysfs data to affect the libvirt decision? Might that 
require changes in libvirt?

In addition, might that be too late, given that the sysfs entry will only 
be created once libvirt binds the driver? Or do we have in mind some other 
option to control that?

Jason,
Can you please comment here as well?

> Is this where we should head with this or are there other options to
> confine this transitional behavior?
>
> BTW, what is "acc" in virtiovf_acc_vfio_pci_ops?

"acc" is just a short-cut to "access", see also here[1] a similar usage.

[1] 
https://elixir.bootlin.com/linux/v6.6-rc6/source/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c#L1380

>
>> +
>> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
>> +				     &pdev->dev, ops);
>> +	if (IS_ERR(virtvdev))
>> +		return PTR_ERR(virtvdev);
>> +
>> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
>> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
>> +	if (ret)
>> +		goto out;
>> +	return 0;
>> +out:
>> +	vfio_put_device(&virtvdev->core_device.vdev);
>> +	return ret;
>> +}
>> +
>> +static void virtiovf_pci_remove(struct pci_dev *pdev)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
>> +
>> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
>> +	vfio_put_device(&virtvdev->core_device.vdev);
>> +}
>> +
>> +static const struct pci_device_id virtiovf_pci_table[] = {
>> +	/* Only virtio-net is supported/tested so far */
>> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
>> +	{}
>> +};
>> +
>> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
>> +
>> +static struct pci_driver virtiovf_pci_driver = {
>> +	.name = KBUILD_MODNAME,
>> +	.id_table = virtiovf_pci_table,
>> +	.probe = virtiovf_pci_probe,
>> +	.remove = virtiovf_pci_remove,
>> +	.err_handler = &vfio_pci_core_err_handlers,
>> +	.driver_managed_dma = true,
>> +};
>> +
>> +module_pci_driver(virtiovf_pci_driver);
>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
>> +MODULE_DESCRIPTION(
>> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");
> Not yet "family" per the device table.  Thanks,

Right.

How about dropping the word "family" and saying instead ".. for VIRTIO 
devices", as we have in the Kconfig in that patch [1]?

[1] "This provides support for exposing VIRTIO VF devices .."

Yishai

> Alex
>


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-18  9:01       ` Yishai Hadas via Virtualization
@ 2023-10-18 12:51         ` Alex Williamson
  -1 siblings, 0 replies; 100+ messages in thread
From: Alex Williamson @ 2023-10-18 12:51 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: kvm, mst, maorg, virtualization, Jason Gunthorpe, jiri, leonro

On Wed, 18 Oct 2023 12:01:57 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> On 17/10/2023 23:24, Alex Williamson wrote:
> > On Tue, 17 Oct 2023 16:42:17 +0300
> > Yishai Hadas <yishaih@nvidia.com> wrote:  
> >> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> >> +			      const struct pci_device_id *id)
> >> +{
> >> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> >> +	struct virtiovf_pci_core_device *virtvdev;
> >> +	int ret;
> >> +
> >> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> >> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> >> +		ops = &virtiovf_acc_vfio_pci_tran_ops;  
> >
> > This is still an issue for me, it's a very narrow use case where we
> > have a modern device and want to enable legacy support.  Implementing an
> > IO BAR and mangling the device ID seems like it should be an opt-in,
> > not standard behavior for any compatible device.  Users should
> > generally expect that the device they see in the host is the device
> > they see in the guest.  They might even rely on that principle.  
> 
> Users here mainly refer to cloud operators.
> 
> We may assume, I believe, that they will be fine with seeing a 
> transitional device in the guest as they would like to get the legacy IO 
> support for their system.
> 
> However, we can still consider supplying a configuration knob in the 
> device layer (e.g. in the DPU side) to let a cloud operator turning off 
> the legacy capability.

This is a driver that implements to the virtio standard, so I don't see
how we can assume that the current use case is the only use case we'll
ever see.  Therefore we cannot assume this will only be consumed by a
specific cloud operator making use of NVIDIA hardware.  Other vendors
may implement this spec for other environments.  We might even see an
implementation of a virtual virtio-net device with SR-IOV.

> In that case upon probe() of the vfio-virtio driver, we'll just pick-up 
> the default vfio-pci 'ops' and in the guest we may have the same device 
> ID as of in the host.
> 
> With that approach we may not require a HOST side control (i.e. sysfs, 
> etc.), but stay with a s device control based on its user manual.
> 
> At the end, we don't expect any functional issue nor any compatible 
> problem with the new driver, both modern and legacy drivers can work in 
> the guest.
> 
> Can that work for you ?

This is not being proposed as an NVIDIA-specific driver; we can't make
such claims relative to all foreseeable implementations of virtio-net.

> > We can't use the argument that users wanting the default device should
> > use vfio-pci rather than virtio-vfio-pci because we've already defined
> > the algorithm by which libvirt should choose a variant driver for a
> > device.  libvirt will choose this driver for all virtio-net devices.
> >
> > This driver effectively has the option to expose two different profiles
> > for the device, native or transitional.  We've discussed profile
> > support for variant drivers previously as an equivalent functionality
> > to mdev types, but the only use case for this currently is out-of-tree.
> > I think this might be the opportunity to define how device profiles are
> > exposed and selected in a variant driver.
> >
> > Jason had previously suggested a devlink interface for this, but I
> > understand that path had been shot down by devlink developers.  Another
> > obvious option is sysfs, where we might imagine an optional "profiles"
> > directory, perhaps under vfio-dev.  Attributes of "available" and
> > "current" could allow discovery and selection of a profile similar to
> > mdev types.  
> 
> Referring to the sysfs option,
> 
> Do you expect the sysfs data to effect the libvirt decision ? may that 
> require changes in libvirt ?

We don't have such changes in libvirt for mdev, other than the ability
of the nodedev information to return available type information.
Generally the mdev type is configured outside of libvirt, which falls
into the same sort of configuration as necessary to enable migration on
mlx5-vfio-pci.

It's possible we could allow a default profile which would be used if
the open_device callback is used without setting a profile, but we need
to be careful of vGPU use cases where profiles consume resources and a
default selection may affect other devices.

> In addition,
> May that be too late as the sysfs entry will be created upon driver 
> binding by libvirt or that we have in mind some other option to control 
> with that ?

No different than mlx5-vfio-pci, there's a necessary point between
binding the driver and using the device where configuration is needed.

> Jason,
> Can you please comment here as well ?
> 
> > Is this where we should head with this or are there other options to
> > confine this transitional behavior?
> >
> > BTW, what is "acc" in virtiovf_acc_vfio_pci_ops?  
> 
> "acc" is just a short-cut to "access", see also here[1] a similar usage.
> 
> [1] 
> https://elixir.bootlin.com/linux/v6.6-rc6/source/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c#L1380

Per the Kconfig:

	  This provides generic PCI support for HiSilicon ACC devices
	  using the VFIO framework.

Therefore I understood acc in this use case to be a formal reference to
the controller name.

> >> +
> >> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> >> +				     &pdev->dev, ops);
> >> +	if (IS_ERR(virtvdev))
> >> +		return PTR_ERR(virtvdev);
> >> +
> >> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> >> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
> >> +	if (ret)
> >> +		goto out;
> >> +	return 0;
> >> +out:
> >> +	vfio_put_device(&virtvdev->core_device.vdev);
> >> +	return ret;
> >> +}
> >> +
> >> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> >> +
> >> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> >> +	vfio_put_device(&virtvdev->core_device.vdev);
> >> +}
> >> +
> >> +static const struct pci_device_id virtiovf_pci_table[] = {
> >> +	/* Only virtio-net is supported/tested so far */
> >> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
> >> +	{}
> >> +};
> >> +
> >> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> >> +
> >> +static struct pci_driver virtiovf_pci_driver = {
> >> +	.name = KBUILD_MODNAME,
> >> +	.id_table = virtiovf_pci_table,
> >> +	.probe = virtiovf_pci_probe,
> >> +	.remove = virtiovf_pci_remove,
> >> +	.err_handler = &vfio_pci_core_err_handlers,
> >> +	.driver_managed_dma = true,
> >> +};
> >> +
> >> +module_pci_driver(virtiovf_pci_driver);
> >> +
> >> +MODULE_LICENSE("GPL");
> >> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
> >> +MODULE_DESCRIPTION(
> >> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");  
> > Not yet "family" per the device table.  Thanks,  
> 
> Right
> 
> How about dropping the word "family" and say instead ".. for VIRTIO 
> devices" as we have in the Kconfig in that patch [1] ?
> 
> [1] "This provides support for exposing VIRTIO VF devices .."

Are we realistically extending this beyond virtio-net?  Maybe all the
descriptions should be limited to what is actually supported as
proposed rather than aspirational goals.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-18 12:51         ` Alex Williamson
@ 2023-10-18 13:06           ` Parav Pandit via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Parav Pandit @ 2023-10-18 13:06 UTC (permalink / raw)
  To: Alex Williamson, Yishai Hadas
  Cc: Jason Gunthorpe, mst, jasowang, kvm, virtualization, Feng Liu,
	Jiri Pirko, kevin.tian, joao.m.martins, si-wei.liu,
	Leon Romanovsky, Maor Gottlieb


> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, October 18, 2023 6:22 PM

> Are we realistically extending this beyond virtio-net?  Maybe all the descriptions
> should be limited to what is actually supported as proposed rather than
> aspirational goals.  Thanks,
Virtio-blk would be the second user of it.
The series didn't cover testing of virtio-blk, mainly due to time limitations, which is why Yishai only included the net device ID.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-17 20:24     ` Alex Williamson
  (?)
  (?)
@ 2023-10-18 16:33     ` Jason Gunthorpe
  2023-10-18 18:29         ` Alex Williamson
  -1 siblings, 1 reply; 100+ messages in thread
From: Jason Gunthorpe @ 2023-10-18 16:33 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yishai Hadas, mst, jasowang, kvm, virtualization, parav, feliu,
	jiri, kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg

On Tue, Oct 17, 2023 at 02:24:48PM -0600, Alex Williamson wrote:
> On Tue, 17 Oct 2023 16:42:17 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
> > +static int virtiovf_pci_probe(struct pci_dev *pdev,
> > +			      const struct pci_device_id *id)
> > +{
> > +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> > +	struct virtiovf_pci_core_device *virtvdev;
> > +	int ret;
> > +
> > +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> > +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> > +		ops = &virtiovf_acc_vfio_pci_tran_ops;
> 
> This is still an issue for me, it's a very narrow use case where we
> have a modern device and want to enable legacy support.  Implementing an
> IO BAR and mangling the device ID seems like it should be an opt-in,
> not standard behavior for any compatible device.  Users should
> generally expect that the device they see in the host is the device
> they see in the guest.  They might even rely on that principle.

I think this should be configured when the VF is provisioned. If the
user does not want legacy IO bar support then the VFIO VF function
should not advertise the capability, and they won't get driver
support.

I think that is a very reasonable way to approach this - it is how we
approached similar problems for mlx5. The provisioning interface is
what "profiles" the VF, regardless of if VFIO is driving it or not.

> We can't use the argument that users wanting the default device should
> use vfio-pci rather than virtio-vfio-pci because we've already defined
> the algorithm by which libvirt should choose a variant driver for a
> device.  libvirt will choose this driver for all virtio-net devices.

Well, we can if the use case is niche. I think profiling a virtio VF
to support legacy IO bar emulation and then not wanting to use it is
a niche case.

The same argument is going to come up with live migration. This same driver
will still bind and enable live migration if the virtio function is
profiled to support it. If you don't want that in your system then
don't profile the VF for migration support.

> This driver effectively has the option to expose two different profiles
> for the device, native or transitional.  We've discussed profile
> support for variant drivers previously as an equivalent functionality
> to mdev types, but the only use case for this currently is out-of-tree.
> I think this might be the opportunity to define how device profiles are
> exposed and selected in a variant driver.

Honestly, I've been trying to keep this out of VFIO...

The function is profiled when it is created, by whatever created
it. As in the other thread we have a vast amount of variation in what
is required to provision the function in the first place. "Legacy IO
BAR emulation support" is just one thing. virtio-net needs to be
hooked up to real network and get a MAC, virtio-blk needs to be hooked
up to real storage and get a media. At a minimum. This is big and
complicated.

It may not even be the x86 running VFIO that is doing this
provisioning, the PCI function may come pre-provisioned from a DPU.

It feels better to keep that all in one place, in whatever external
thing is preparing the function before giving it to VFIO. VFIO is
concerned with operating a prepared function.

When we get to SIOV it should not be VFIO that is
provisioning/creating functions. The owning driver should be doing
this and routing the function to VFIO (e.g. with an aux device or
otherwise).

This gets back to the qemu thread on the grace patch where we need to
ask how the libvirt world sees this, given there is no good way to
generically handle all scenarios without a userspace driver to operate
elements.

> Jason had previously suggested a devlink interface for this, but I
> understand that path had been shot down by devlink developers.  

I think we got some things supported, but supporting all things was shot
down.

> Another obvious option is sysfs, where we might imagine an optional
> "profiles" directory, perhaps under vfio-dev.  Attributes of
> "available" and "current" could allow discovery and selection of a
> profile similar to mdev types.

IMHO it is a far too complex problem for sysfs.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-18 16:33     ` Jason Gunthorpe
@ 2023-10-18 18:29         ` Alex Williamson
  0 siblings, 0 replies; 100+ messages in thread
From: Alex Williamson @ 2023-10-18 18:29 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, mst, maorg, virtualization, jiri, leonro

On Wed, 18 Oct 2023 13:33:33 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Oct 17, 2023 at 02:24:48PM -0600, Alex Williamson wrote:
> > On Tue, 17 Oct 2023 16:42:17 +0300
> > Yishai Hadas <yishaih@nvidia.com> wrote:  
> > > +static int virtiovf_pci_probe(struct pci_dev *pdev,
> > > +			      const struct pci_device_id *id)
> > > +{
> > > +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> > > +	struct virtiovf_pci_core_device *virtvdev;
> > > +	int ret;
> > > +
> > > +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> > > +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> > > +		ops = &virtiovf_acc_vfio_pci_tran_ops;  
> > 
> > This is still an issue for me, it's a very narrow use case where we
> > have a modern device and want to enable legacy support.  Implementing an
> > IO BAR and mangling the device ID seems like it should be an opt-in,
> > not standard behavior for any compatible device.  Users should
> > generally expect that the device they see in the host is the device
> > they see in the guest.  They might even rely on that principle.  
> 
> I think this should be configured when the VF is provisioned. If the
> user does not want legacy IO bar support then the VFIO VF function
> should not advertise the capability, and they won't get driver
> support.
> 
> I think that is a very reasonable way to approach this - it is how we
> approached similar problems for mlx5. The provisioning interface is
> what "profiles" the VF, regardless of if VFIO is driving it or not.

It seems like a huge assumption that every device is going to allow
this degree of specification in provisioning VFs.  mlx5 is a
vendor-specific driver; it can make such assumptions in design philosophy.

> > We can't use the argument that users wanting the default device should
> > use vfio-pci rather than virtio-vfio-pci because we've already defined
> > the algorithm by which libvirt should choose a variant driver for a
> > device.  libvirt will choose this driver for all virtio-net devices.  
> 
> Well, we can if the use case is niche. I think profiling a virtio VF
> to support legacy IO bar emulation and then not wanting to use it is
> a niche case.
> 
> The same argument is going come with live migration. This same driver
> will still bind and enable live migration if the virtio function is
> profiled to support it. If you don't want that in your system then
> don't profile the VF for migration support.

What in the virtio or SR-IOV spec requires a vendor to make this
configurable?

> > This driver effectively has the option to expose two different profiles
> > for the device, native or transitional.  We've discussed profile
> > support for variant drivers previously as an equivalent functionality
> > to mdev types, but the only use case for this currently is out-of-tree.
> > I think this might be the opportunity to define how device profiles are
> > exposed and selected in a variant driver.  
> 
> Honestly, I've been trying to keep this out of VFIO...
> 
> The function is profiled when it is created, by whatever created
> it. As in the other thread we have a vast amount of variation in what
> is required to provision the function in the first place. "Legacy IO
> BAR emulation support" is just one thing. virtio-net needs to be
> hooked up to real network and get a MAC, virtio-blk needs to be hooked
> up to real storage and get a media. At a minimum. This is big and
> complicated.
> 
> It may not even be the x86 running VFIO that is doing this
> provisioning, the PCI function may come pre-provisioned from a DPU.
> 
> It feels better to keep that all in one place, in whatever external
> thing is preparing the function before giving it to VFIO. VFIO is
> concerned with operating a prepared function.
> 
> When we get to SIOV it should not be VFIO that is
> provisioning/creating functions. The owning driver should be doing
> this and routing the function to VFIO (eg with an aux device or
> otherwise)
> 
> This gets back to the qemu thread on the grace patch where we need to
> ask how does the libvirt world see this, given there is no good way to
> generically handle all scenarios without a userspace driver to operate
> elements.

So nothing here is really "all in one place", it may be in the
provisioning of the VF, outside of the scope of the host OS, it might
be a collection of scripts or operators with device or interface
specific tooling to configure the device.  Sometimes this configuration
will be before the device is probed by the vfio-pci variant driver,
sometimes in between probing and opening the device.

I don't see why it becomes out of scope if the variant driver itself
provides some means for selecting a device profile.  We have evidence
both from mdev vGPUs and here (imo) that we can expect to see that
behavior, so why wouldn't we want to attempt some basic shared
interface for variant drivers to implement for selecting such a profile
rather than add to this hodgepodge?

> > Jason had previously suggested a devlink interface for this, but I
> > understand that path had been shot down by devlink developers.    
> 
> I think we go some things support but supporting all things was shot
> down.
> 
> > Another obvious option is sysfs, where we might imagine an optional
> > "profiles" directory, perhaps under vfio-dev.  Attributes of
> > "available" and "current" could allow discovery and selection of a
> > profile similar to mdev types.  
> 
> IMHO it is a far too complex problem for sysfs.

Isn't it then just like devlink, not a silver bullet, but useful for
some configuration?  AIUI, devlink shot down a means to list available
profiles for a device and a means to select one of those profiles.
There are a variety of attributes in sysfs which perform this sort of
behavior.  Specifying a specific profile in sysfs can be difficult, and
I'm not proposing sysfs profile support as a mandatory feature, but I'm
also not a fan of the vendor specific sysfs approach that out of tree
drivers have taken.

The mdev type interface is certainly not perfect, but from it we've
been able to develop mdevctl to allow persistent and complex
configurations of mdev devices.  I'd like to see the ability to do
something like that with variant drivers that offer multiple profiles
without always depending on vendor specific interfaces.  Thanks,

Alex
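
To make the "profiles" idea above a bit more concrete, the kind of interface
being discussed could look roughly like the sketch below. This is purely
illustrative and not code from this series; every name, the profile strings,
and the attribute layout are assumptions.

#include <linux/device.h>
#include <linux/sysfs.h>

static ssize_t profile_available_show(struct device *dev,
				      struct device_attribute *attr, char *buf)
{
	/* A real driver would list the profiles it can actually offer. */
	return sysfs_emit(buf, "native transitional\n");
}

static ssize_t profile_current_show(struct device *dev,
				    struct device_attribute *attr, char *buf)
{
	/* A real driver would report the profile currently selected. */
	return sysfs_emit(buf, "native\n");
}

static ssize_t profile_current_store(struct device *dev,
				     struct device_attribute *attr,
				     const char *buf, size_t count)
{
	/* A real driver would validate the requested profile and only
	 * allow switching while the device is not opened.
	 */
	return count;
}

/* "current" collides with the kernel's current macro, so the sysfs names
 * are set explicitly instead of using the DEVICE_ATTR_*() helpers.
 */
static struct device_attribute dev_attr_profile_available = {
	.attr = { .name = "available", .mode = 0444 },
	.show = profile_available_show,
};

static struct device_attribute dev_attr_profile_current = {
	.attr = { .name = "current", .mode = 0644 },
	.show = profile_current_show,
	.store = profile_current_store,
};

static struct attribute *example_profile_attrs[] = {
	&dev_attr_profile_available.attr,
	&dev_attr_profile_current.attr,
	NULL,
};

static const struct attribute_group example_profiles_group = {
	.name = "profiles",	/* shows up as a "profiles/" directory */
	.attrs = example_profile_attrs,
};

Registered against the device (for example via device_add_group()), this would
surface .../profiles/available and .../profiles/current entries, which is close
to the discovery/selection split described above.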


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-10-18 18:29         ` Alex Williamson
  0 siblings, 0 replies; 100+ messages in thread
From: Alex Williamson @ 2023-10-18 18:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yishai Hadas, mst, jasowang, kvm, virtualization, parav, feliu,
	jiri, kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg

On Wed, 18 Oct 2023 13:33:33 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Oct 17, 2023 at 02:24:48PM -0600, Alex Williamson wrote:
> > On Tue, 17 Oct 2023 16:42:17 +0300
> > Yishai Hadas <yishaih@nvidia.com> wrote:  
> > > +static int virtiovf_pci_probe(struct pci_dev *pdev,
> > > +			      const struct pci_device_id *id)
> > > +{
> > > +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> > > +	struct virtiovf_pci_core_device *virtvdev;
> > > +	int ret;
> > > +
> > > +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> > > +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> > > +		ops = &virtiovf_acc_vfio_pci_tran_ops;  
> > 
> > This is still an issue for me, it's a very narrow use case where we
> > have a modern device and want to enable legacy support.  Implementing an
> > IO BAR and mangling the device ID seems like it should be an opt-in,
> > not standard behavior for any compatible device.  Users should
> > generally expect that the device they see in the host is the device
> > they see in the guest.  They might even rely on that principle.  
> 
> I think this should be configured when the VF is provisioned. If the
> user does not want legacy IO bar support then the VFIO VF function
> should not advertise the capability, and they won't get driver
> support.
> 
> I think that is a very reasonable way to approach this - it is how we
> approached similar problems for mlx5. The provisioning interface is
> what "profiles" the VF, regardless of if VFIO is driving it or not.

It seems like a huge assumption that every device is going to allow
this degree of specification in provisioning VFs.  mlx5 is a vendor
specific driver, it can make such assumptions in design philosophy.

> > We can't use the argument that users wanting the default device should
> > use vfio-pci rather than virtio-vfio-pci because we've already defined
> > the algorithm by which libvirt should choose a variant driver for a
> > device.  libvirt will choose this driver for all virtio-net devices.  
> 
> Well, we can if the use case is niche. I think profiling a virtio VF
> to support legacy IO bar emulation and then not wanting to use it is
> a niche case.
> 
> The same argument is going come with live migration. This same driver
> will still bind and enable live migration if the virtio function is
> profiled to support it. If you don't want that in your system then
> don't profile the VF for migration support.

What in the virtio or SR-IOV spec requires a vendor to make this
configurable?

> > This driver effectively has the option to expose two different profiles
> > for the device, native or transitional.  We've discussed profile
> > support for variant drivers previously as an equivalent functionality
> > to mdev types, but the only use case for this currently is out-of-tree.
> > I think this might be the opportunity to define how device profiles are
> > exposed and selected in a variant driver.  
> 
> Honestly, I've been trying to keep this out of VFIO...
> 
> The function is profiled when it is created, by whatever created
> it. As in the other thread we have a vast amount of variation in what
> is required to provision the function in the first place. "Legacy IO
> BAR emulation support" is just one thing. virtio-net needs to be
> hooked up to real network and get a MAC, virtio-blk needs to be hooked
> up to real storage and get a media. At a minimum. This is big and
> complicated.
> 
> It may not even be the x86 running VFIO that is doing this
> provisioning, the PCI function may come pre-provisioned from a DPU.
> 
> It feels better to keep that all in one place, in whatever external
> thing is preparing the function before giving it to VFIO. VFIO is
> concerned with operating a prepared function.
> 
> When we get to SIOV it should not be VFIO that is
> provisioning/creating functions. The owning driver should be doing
> this and routing the function to VFIO (eg with an aux device or
> otherwise)
> 
> This gets back to the qemu thread on the grace patch where we need to
> ask how does the libvirt world see this, given there is no good way to
> generically handle all scenarios without a userspace driver to operate
> elements.

So nothing here is really "all in one place", it may be in the
provisioning of the VF, outside of the scope of the host OS, it might
be a collection of scripts or operators with device or interface
specific tooling to configure the device.  Sometimes this configuration
will be before the device is probed by the vfio-pci variant driver,
sometimes in between probing and opening the device.

I don't see why it becomes out of scope if the variant driver itself
provides some means for selecting a device profile.  We have evidence
both from mdev vGPUs and here (imo) that we can expect to see that
behavior, so why wouldn't we want to attempt some basic shared
interface for variant drivers to implement for selecting such a profile
rather than add to this hodgepodge 

> > Jason had previously suggested a devlink interface for this, but I
> > understand that path had been shot down by devlink developers.    
> 
> I think we go some things support but supporting all things was shot
> down.
> 
> > Another obvious option is sysfs, where we might imagine an optional
> > "profiles" directory, perhaps under vfio-dev.  Attributes of
> > "available" and "current" could allow discovery and selection of a
> > profile similar to mdev types.  
> 
> IMHO it is a far too complex problem for sysfs.

Isn't it then just like devlink, not a silver bullet, but useful for
some configuration?  AIUI, devlink shot down a means to list available
profiles for a device and a means to select one of those profiles.
There are a variety of attributes in sysfs which perform this sort of
behavior.  Specifying a specific profile in sysfs can be difficult, and
I'm not proposing sysfs profile support as a mandatory feature, but I'm
also not a fan of the vendor specific sysfs approach that out of tree
drivers have taken.
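
For illustration only, here is a minimal sketch of what such optional
"available"/"current" attributes could look like on the variant driver
side. The profile table, the attribute names and the lack of locking are
all assumptions made up for this sketch; none of it is taken from the
posted series:

#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/sysfs.h>

/* Hypothetical profile table; a real driver would derive this from what
 * the device was actually provisioned to support.
 */
static const char * const demo_profiles[] = { "native", "transitional" };
static int demo_profile_idx;	/* needs proper locking in real code */

static ssize_t available_profiles_show(struct device *dev,
				       struct device_attribute *attr,
				       char *buf)
{
	ssize_t len = 0;
	int i;

	for (i = 0; i < ARRAY_SIZE(demo_profiles); i++)
		len += sysfs_emit_at(buf, len, "%s\n", demo_profiles[i]);
	return len;
}
static DEVICE_ATTR_RO(available_profiles);

static ssize_t current_profile_show(struct device *dev,
				    struct device_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "%s\n", demo_profiles[demo_profile_idx]);
}

static ssize_t current_profile_store(struct device *dev,
				     struct device_attribute *attr,
				     const char *buf, size_t count)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(demo_profiles); i++) {
		if (sysfs_streq(buf, demo_profiles[i])) {
			demo_profile_idx = i;	/* applied before open() */
			return count;
		}
	}
	return -EINVAL;
}
static DEVICE_ATTR_RW(current_profile);

/* Both attributes would sit in an attribute_group registered under the
 * vfio-dev node; registration is omitted here.
 */

From userspace this would then look much like mdev types: read
"available_profiles", then write one of the names into "current_profile"
before the device is opened.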

The mdev type interface is certainly not perfect, but from it we've
been able to develop mdevctl to allow persistent and complex
configurations of mdev devices.  I'd like to see the ability to do
something like that with variant drivers that offer multiple profiles
without always depending on vendor specific interfaces.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-18 18:29         ` Alex Williamson
  (?)
@ 2023-10-18 19:28         ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2023-10-18 19:28 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yishai Hadas, mst, jasowang, kvm, virtualization, parav, feliu,
	jiri, kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg

On Wed, Oct 18, 2023 at 12:29:25PM -0600, Alex Williamson wrote:

> > I think this should be configured when the VF is provisioned. If the
> > user does not want legacy IO bar support then the VFIO VF function
> > should not advertise the capability, and they won't get driver
> > support.
> > 
> > I think that is a very reasonable way to approach this - it is how we
> > approached similar problems for mlx5. The provisioning interface is
> > what "profiles" the VF, regardless of if VFIO is driving it or not.
> 
> It seems like a huge assumption that every device is going to allow
> this degree of specification in provisioning VFs.  mlx5 is a vendor
> specific driver, it can make such assumptions in design philosophy.

I don't think it is a huge assumption.  Some degree of configuration
is already mandatory just to get basic functionality, and it isn't
like virtio can really be a full fixed HW implementation on the
control plane.

So the assumption is that some device SW, which already must exist and
already must be configurable, just gains 1 more bit of
configuration. It does not seem like a big assumption to me at all.

Regardless, if we set an architecture/philosophy from the kernel side
vendors will align to it.

> > The same argument is going to come with live migration. This same driver
> > will still bind and enable live migration if the virtio function is
> > profiled to support it. If you don't want that in your system then
> > don't profile the VF for migration support.
> 
> What in the virtio or SR-IOV spec requires a vendor to make this
> configurable?

The same part that describes how to make live migration work :)

> So nothing here is really "all in one place"; it may be in the
> provisioning of the VF, outside of the scope of the host OS, or it might
> be a collection of scripts or operators with device or interface
> specific tooling to configure the device.  Sometimes this configuration
> will happen before the device is probed by the vfio-pci variant driver,
> sometimes in between probing and opening the device.

We don't have any in-tree examples of configuration between probing and
opening - I'd like to keep it that way.

> I don't see why it becomes out of scope if the variant driver itself
> provides some means for selecting a device profile.  We have evidence
> both from mdev vGPUs and here (imo) that we can expect to see that
> behavior, so why wouldn't we want to attempt some basic shared
> interface for variant drivers to implement for selecting such a profile
> rather than add to this hodgepodge?

The GPU profiling approach is an artifact of the mdev sysfs. I do not
expect to actually do this in tree. The function should be profiled
before it reaches VFIO, not after. This is often necessary anyhow
because in almost all cases a function can also be bound to a regular
kernel driver.

Consistently following this approach prevents future problems where we
end up with different ways to profile/provision functions depending on
what driver is attached (vfio/in-kernel). That would be a mess.

> > > Another obvious option is sysfs, where we might imagine an optional
> > > "profiles" directory, perhaps under vfio-dev.  Attributes of
> > > "available" and "current" could allow discovery and selection of a
> > > profile similar to mdev types.  
> > 
> > IMHO it is a far too complex problem for sysfs.
> 
> Isn't it then just like devlink, not a silver bullet, but useful for
> some configuration? 

Yes, but that accepts the architecture that configuration and
provisioning should happen on the VFIO side at all, which I think is
not a good direction.

> AIUI, devlink shot down a means to list available
> profiles for a device and a means to select one of those profiles.

And other things, yes.

> There are a variety of attributes in sysfs which perform this sort of
> behavior.  Specifying a specific profile in sysfs can be difficult, and
> I'm not proposing sysfs profile support as a mandatory feature, but I'm
> also not a fan of the vendor specific sysfs approach that out of tree
> drivers have taken.

It is my belief we are going to have to build some good general
infrastructure to support SIOV. The actions to spawn, provision and
activate a SIOV function should be handled by generic infrastructure of
some kind. We have already been through a precursor to all this with
mlx5's devlink infrastructure for SFs (which are basically SIOV
functions), so we have pretty deep experience now.

mdev mushed all those steps into VFIO, but they belong in different
layers. SIOV devices are not going to be exclusively consumed by VFIO.

If we have such a layer then it would be possible to also configure
VFIO "through the back door" of the provisioning layer in the kernel.

I think that is the closest we can get to some kind of generic API
here. The trouble is that it will not actually be generic because
provisioning is not generic or standardized. It doesn't eliminate the
need for having a user space driver component that actually
understands exactly what to do in order to fully provision something.

I don't know what to say about that from a libvirt perspective. Like
how does that world imagine provisioning network and storage
functions? All I know is at the openshift level it is done with
operators (aka user space drivers).

> The mdev type interface is certainly not perfect, but from it we've
> been able to develop mdevctl to allow persistent and complex
> configurations of mdev devices.  I'd like to see the ability to do
> something like that with variant drivers that offer multiple profiles
> without always depending on vendor specific interfaces.

I think profiles are too narrow an abstraction to be that broadly
useful beyond simple device types. Given the device variety we already
have I don't know if there is an alternative to a user space driver to
manage provisioning. Indeed that is how we see our actual deployments
already.

IOW I'm worried we would invest a lot of effort in VFIO profiling for
little return.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
  2023-10-17 13:42   ` Yishai Hadas via Virtualization
@ 2023-10-22  1:14     ` kernel test robot
  -1 siblings, 0 replies; 100+ messages in thread
From: kernel test robot @ 2023-10-22  1:14 UTC (permalink / raw)
  To: Yishai Hadas, alex.williamson, mst, jasowang, jgg
  Cc: kvm, maorg, llvm, virtualization, jiri, oe-kbuild-all, leonro

Hi Yishai,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.6-rc6]
[cannot apply to next-20231020]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yishai-Hadas/virtio-pci-Fix-common-config-map-for-modern-device/20231017-214450
base:   linus/master
patch link:    https://lore.kernel.org/r/20231017134217.82497-7-yishaih%40nvidia.com
patch subject: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
config: x86_64-rhel-8.3-rust (https://download.01.org/0day-ci/archive/20231022/202310220842.ADAIiZsO-lkp@intel.com/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231022/202310220842.ADAIiZsO-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310220842.ADAIiZsO-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/virtio/virtio_pci_modern.c:731:5: warning: no previous prototype for function 'virtio_pci_admin_list_query' [-Wmissing-prototypes]
   int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
       ^
   drivers/virtio/virtio_pci_modern.c:731:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
   ^
   static 
>> drivers/virtio/virtio_pci_modern.c:758:5: warning: no previous prototype for function 'virtio_pci_admin_list_use' [-Wmissing-prototypes]
   int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
       ^
   drivers/virtio/virtio_pci_modern.c:758:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
   ^
   static 
>> drivers/virtio/virtio_pci_modern.c:786:5: warning: no previous prototype for function 'virtio_pci_admin_legacy_io_write' [-Wmissing-prototypes]
   int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
       ^
   drivers/virtio/virtio_pci_modern.c:786:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
   ^
   static 
>> drivers/virtio/virtio_pci_modern.c:831:5: warning: no previous prototype for function 'virtio_pci_admin_legacy_io_read' [-Wmissing-prototypes]
   int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
       ^
   drivers/virtio/virtio_pci_modern.c:831:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
   ^
   static 
>> drivers/virtio/virtio_pci_modern.c:877:5: warning: no previous prototype for function 'virtio_pci_admin_legacy_io_notify_info' [-Wmissing-prototypes]
   int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
       ^
   drivers/virtio/virtio_pci_modern.c:877:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
   ^
   static 
   5 warnings generated.


vim +/virtio_pci_admin_list_query +731 drivers/virtio/virtio_pci_modern.c

   721	
   722	/*
   723	 * virtio_pci_admin_list_query - Provides to driver list of commands
   724	 * supported for the PCI VF.
   725	 * @dev: VF pci_dev
   726	 * @buf: buffer to hold the returned list
   727	 * @buf_size: size of the given buffer
   728	 *
   729	 * Returns 0 on success, or negative on failure.
   730	 */
 > 731	int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
   732	{
   733		struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
   734		struct virtio_admin_cmd cmd = {};
   735		struct scatterlist result_sg;
   736	
   737		if (!virtio_dev)
   738			return -ENODEV;
   739	
   740		sg_init_one(&result_sg, buf, buf_size);
   741		cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
   742		cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   743		cmd.result_sg = &result_sg;
   744	
   745		return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
   746	}
   747	EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
   748	
   749	/*
   750	 * virtio_pci_admin_list_use - Provides to device list of commands
   751	 * used for the PCI VF.
   752	 * @dev: VF pci_dev
   753	 * @buf: buffer which holds the list
   754	 * @buf_size: size of the given buffer
   755	 *
   756	 * Returns 0 on success, or negative on failure.
   757	 */
 > 758	int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
   759	{
   760		struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
   761		struct virtio_admin_cmd cmd = {};
   762		struct scatterlist data_sg;
   763	
   764		if (!virtio_dev)
   765			return -ENODEV;
   766	
   767		sg_init_one(&data_sg, buf, buf_size);
   768		cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
   769		cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   770		cmd.data_sg = &data_sg;
   771	
   772		return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
   773	}
   774	EXPORT_SYMBOL_GPL(virtio_pci_admin_list_use);
   775	
   776	/*
   777	 * virtio_pci_admin_legacy_io_write - Write legacy registers of a member device
   778	 * @dev: VF pci_dev
   779	 * @opcode: op code of the io write command
   780	 * @offset: starting byte offset within the registers to write to
   781	 * @size: size of the data to write
   782	 * @buf: buffer which holds the data
   783	 *
   784	 * Returns 0 on success, or negative on failure.
   785	 */
 > 786	int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
   787					     u8 offset, u8 size, u8 *buf)
   788	{
   789		struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
   790		struct virtio_admin_cmd_legacy_wr_data *data;
   791		struct virtio_admin_cmd cmd = {};
   792		struct scatterlist data_sg;
   793		int vf_id;
   794		int ret;
   795	
   796		if (!virtio_dev)
   797			return -ENODEV;
   798	
   799		vf_id = pci_iov_vf_id(pdev);
   800		if (vf_id < 0)
   801			return vf_id;
   802	
   803		data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
   804		if (!data)
   805			return -ENOMEM;
   806	
   807		data->offset = offset;
   808		memcpy(data->registers, buf, size);
   809		sg_init_one(&data_sg, data, sizeof(*data) + size);
   810		cmd.opcode = cpu_to_le16(opcode);
   811		cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   812		cmd.group_member_id = cpu_to_le64(vf_id + 1);
   813		cmd.data_sg = &data_sg;
   814		ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
   815	
   816		kfree(data);
   817		return ret;
   818	}
   819	EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_write);
   820	
   821	/*
   822	 * virtio_pci_admin_legacy_io_read - Read legacy registers of a member device
   823	 * @dev: VF pci_dev
   824	 * @opcode: op code of the io read command
   825	 * @offset: starting byte offset within the registers to read from
   826	 * @size: size of the data to be read
   827	 * @buf: buffer to hold the returned data
   828	 *
   829	 * Returns 0 on success, or negative on failure.
   830	 */
 > 831	int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
   832					    u8 offset, u8 size, u8 *buf)
   833	{
   834		struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
   835		struct virtio_admin_cmd_legacy_rd_data *data;
   836		struct scatterlist data_sg, result_sg;
   837		struct virtio_admin_cmd cmd = {};
   838		int vf_id;
   839		int ret;
   840	
   841		if (!virtio_dev)
   842			return -ENODEV;
   843	
   844		vf_id = pci_iov_vf_id(pdev);
   845		if (vf_id < 0)
   846			return vf_id;
   847	
   848		data = kzalloc(sizeof(*data), GFP_KERNEL);
   849		if (!data)
   850			return -ENOMEM;
   851	
   852		data->offset = offset;
   853		sg_init_one(&data_sg, data, sizeof(*data));
   854		sg_init_one(&result_sg, buf, size);
   855		cmd.opcode = cpu_to_le16(opcode);
   856		cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   857		cmd.group_member_id = cpu_to_le64(vf_id + 1);
   858		cmd.data_sg = &data_sg;
   859		cmd.result_sg = &result_sg;
   860		ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
   861	
   862		kfree(data);
   863		return ret;
   864	}
   865	EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_read);
   866	
   867	/*
   868	 * virtio_pci_admin_legacy_io_notify_info - Read the queue notification
   869	 * information for legacy interface
   870	 * @dev: VF pci_dev
   871	 * @req_bar_flags: requested bar flags
   872	 * @bar: on output the BAR number of the member device
   873	 * @bar_offset: on output the offset within bar
   874	 *
   875	 * Returns 0 on success, or negative on failure.
   876	 */
 > 877	int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
   878						   u8 req_bar_flags, u8 *bar,
   879						   u64 *bar_offset)
   880	{
   881		struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
   882		struct virtio_admin_cmd_notify_info_result *result;
   883		struct virtio_admin_cmd cmd = {};
   884		struct scatterlist result_sg;
   885		int vf_id;
   886		int ret;
   887	
   888		if (!virtio_dev)
   889			return -ENODEV;
   890	
   891		vf_id = pci_iov_vf_id(pdev);
   892		if (vf_id < 0)
   893			return vf_id;
   894	
   895		result = kzalloc(sizeof(*result), GFP_KERNEL);
   896		if (!result)
   897			return -ENOMEM;
   898	
   899		sg_init_one(&result_sg, result, sizeof(*result));
   900		cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
   901		cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   902		cmd.group_member_id = cpu_to_le64(vf_id + 1);
   903		cmd.result_sg = &result_sg;
   904		ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
   905		if (!ret) {
   906			struct virtio_admin_cmd_notify_info_data *entry;
   907			int i;
   908	
   909			ret = -ENOENT;
   910			for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
   911				entry = &result->entries[i];
   912				if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
   913					break;
   914				if (entry->flags != req_bar_flags)
   915					continue;
   916				*bar = entry->bar;
   917				*bar_offset = le64_to_cpu(entry->offset);
   918				ret = 0;
   919				break;
   920			}
   921		}
   922	
   923		kfree(result);
   924		return ret;
   925	}
   926	EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
   927	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 100+ messages in thread
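
A hedged sketch of how a caller such as the vfio/virtio variant driver
might drive the exports quoted above. The helper name, the 8-byte
command-list buffer and the error handling are assumptions made for
illustration; the real usage is in patch 9 of the series:

#include <linux/pci.h>
#include <linux/slab.h>
#include <linux/virtio_pci_admin.h>

/* Illustrative only: negotiate the admin command list for a VF and ask
 * where legacy queue notifications should be written.
 */
static int demo_init_legacy_access(struct pci_dev *vf_pdev, u8 req_bar_flags)
{
	u64 notify_offset;
	u8 notify_bar;
	u8 *cmd_list;
	int ret;

	/* Buffer for the supported-commands bitmap; 8 bytes is an assumption */
	cmd_list = kzalloc(8, GFP_KERNEL);
	if (!cmd_list)
		return -ENOMEM;

	/* Which admin commands does the owner (PF) support for its VFs? */
	ret = virtio_pci_admin_list_query(vf_pdev, cmd_list, 8);
	if (ret)
		goto out;

	/* Tell the device which of those commands the driver will use
	 * (simplified: a real driver would mask out what it cannot handle)
	 */
	ret = virtio_pci_admin_list_use(vf_pdev, cmd_list, 8);
	if (ret)
		goto out;

	/* Where should legacy queue notifications land for this member? */
	ret = virtio_pci_admin_legacy_io_notify_info(vf_pdev, req_bar_flags,
						     &notify_bar,
						     &notify_offset);
	if (ret)
		goto out;

	pci_info(vf_pdev, "legacy notify via BAR %d offset %#llx\n",
		 notify_bar, notify_offset);
out:
	kfree(cmd_list);
	return ret;
}

Trapped legacy register accesses would then be forwarded through
virtio_pci_admin_legacy_io_read()/virtio_pci_admin_legacy_io_write(),
with the opcode chosen according to whether MSI-X is enabled, as the
cover letter describes.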

* Re: [PATCH V1 vfio 0/9] Introduce a vfio driver over virtio devices
  2023-10-17 13:42 ` Yishai Hadas via Virtualization
@ 2023-10-22  8:20   ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-22  8:20 UTC (permalink / raw)
  To: alex.williamson, mst, jgg
  Cc: kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, maorg, jasowang

On 17/10/2023 16:42, Yishai Hadas wrote:
> This series introduce a vfio driver over virtio devices to support the
> legacy interface functionality for VFs.
>
> Background, from the virtio spec [1].
> --------------------------------------------------------------------
> In some systems, there is a need to support a virtio legacy driver with
> a device that does not directly support the legacy interface. In such
> scenarios, a group owner device can provide the legacy interface
> functionality for the group member devices. The driver of the owner
> device can then access the legacy interface of a member device on behalf
> of the legacy member device driver.
>
> For example, with the SR-IOV group type, group members (VFs) can not
> present the legacy interface in an I/O BAR in BAR0 as expected by the
> legacy pci driver. If the legacy driver is running inside a virtual
> machine, the hypervisor executing the virtual machine can present a
> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> legacy driver accesses to this I/O BAR and forwards them to the group
> owner device (PF) using group administration commands.
> --------------------------------------------------------------------
>
> The first 6 patches are in the virtio area and handle the below:
> - Fix common config map for modern device as was reported by Michael Tsirkin.
> - Introduce the admin virtqueue infrastcture.
> - Expose the layout of the commands that should be used for
>    supporting the legacy access.
> - Expose APIs to enable upper layers as of vfio, net, etc
>    to execute admin commands.
>
> The above follows the virtio spec that was lastly accepted in that area
> [1].
>
> The last 3 patches are in the vfio area and handle the below:
> - Expose some APIs from vfio/pci to be used by the vfio/virtio driver.
> - Introduce a vfio driver over virtio devices to support the legacy
>    interface functionality for VFs.
>
> The series was tested successfully over virtio-net VFs in the host,
> while running in the guest both modern and legacy drivers.
>
> [1]
> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
>
> Changes from V0: https://www.spinics.net/lists/linux-virtualization/msg63802.html
>
> Virtio:
> - Fix the common config map size issue that was reported by Michael
>    Tsirkin.
> - Do not use vp_dev->vqs[] array upon vp_del_vqs() as was asked by
>    Michael, instead skip the AQ specifically.
> - Move admin vq implementation into virtio_pci_modern.c as was asked by
>    Michael.
> - Rename structure virtio_avq to virtio_pci_admin_vq and some extra
>    corresponding renames.
> - Remove exported symbols virtio_pci_vf_get_pf_dev(),
>    virtio_admin_cmd_exec() as now callers are local to the module.
> - Handle inflight commands as part of the device reset flow.
> - Introduce APIs per admin command in virtio-pci as was asked by Michael.
>
> Vfio:
> - Change to use EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL for
>    vfio_pci_core_setup_barmap() and vfio_pci_iowrite#xxx() as pointed by
>    Alex.
> - Drop the intermediate patch which prepares the commands and calls the
>    generic virtio admin command API (i.e. virtio_admin_cmd_exec()).
> - Instead, call directly to the new APIs per admin command that are
>    exported from Virtio - based on Michael's request.
> - Enable only virtio-net as part of the pci_device_id table to enforce
>    upon binding only what is supported as suggested by Alex.
> - Add support for byte-wise access (read/write) over the device config
>    region as was asked by Alex.
> - Consider whether MSIX is practically enabled/disabled to choose the
>    right opcode upon issuing read/write admin command, as mentioned
>    by Michael.
> - Move to use VIRTIO_PCI_CONFIG_OFF instead of adding some new defines
>    as was suggested by Michael.
> - Set the '.close_device' op to vfio_pci_core_close_device() as was
>    pointed by Alex.
> - Adapt to Vfio multi-line comment style in a few places.
> - Add virtualization@lists.linux-foundation.org in the MAINTAINERS file
>    to be CCed for the new driver as was suggested by Jason.
>
> Yishai
>
> Feng Liu (5):
>    virtio-pci: Fix common config map for modern device
>    virtio: Define feature bit for administration virtqueue
>    virtio-pci: Introduce admin virtqueue
>    virtio-pci: Introduce admin command sending function
>    virtio-pci: Introduce admin commands
>
> Yishai Hadas (4):
>    virtio-pci: Introduce APIs to execute legacy IO admin commands
>    vfio/pci: Expose vfio_pci_core_setup_barmap()
>    vfio/pci: Expose vfio_pci_iowrite/read##size()
>    vfio/virtio: Introduce a vfio driver over virtio devices
>
>   MAINTAINERS                            |   7 +
>   drivers/vfio/pci/Kconfig               |   2 +
>   drivers/vfio/pci/Makefile              |   2 +
>   drivers/vfio/pci/vfio_pci_core.c       |  25 ++
>   drivers/vfio/pci/vfio_pci_rdwr.c       |  38 +-
>   drivers/vfio/pci/virtio/Kconfig        |  15 +
>   drivers/vfio/pci/virtio/Makefile       |   4 +
>   drivers/vfio/pci/virtio/main.c         | 577 +++++++++++++++++++++++++
>   drivers/virtio/virtio.c                |  37 +-
>   drivers/virtio/virtio_pci_common.c     |  14 +
>   drivers/virtio/virtio_pci_common.h     |  20 +-
>   drivers/virtio/virtio_pci_modern.c     | 441 ++++++++++++++++++-
>   drivers/virtio/virtio_pci_modern_dev.c |  24 +-
>   include/linux/vfio_pci_core.h          |  20 +
>   include/linux/virtio.h                 |   8 +
>   include/linux/virtio_config.h          |   4 +
>   include/linux/virtio_pci_admin.h       |  18 +
>   include/linux/virtio_pci_modern.h      |   5 +
>   include/uapi/linux/virtio_config.h     |   8 +-
>   include/uapi/linux/virtio_pci.h        |  66 +++
>   20 files changed, 1295 insertions(+), 40 deletions(-)
>   create mode 100644 drivers/vfio/pci/virtio/Kconfig
>   create mode 100644 drivers/vfio/pci/virtio/Makefile
>   create mode 100644 drivers/vfio/pci/virtio/main.c
>   create mode 100644 include/linux/virtio_pci_admin.h
>
Hi Michael,

Did you have the chance to review the virtio part of that series?

IMO, we addressed all your notes on V0; I would be happy to get your
feedback on V1 before sending V2.

In my TO-DO list for V2, I have for now the below minor items.
Virtio:
Patch #6: Fix a krobot note; it needs to include the header file in the
C file that exports the symbols (a rough sketch of the fix follows below).
Vfio:
Patch #9: Rename the 'ops' variable to drop the 'acc' and potentially do
some renaming in the module description with regards to 'family'.
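
The fix is expected to be roughly along these lines (sketch only,
assuming the declarations live in include/linux/virtio_pci_admin.h as
the diffstat suggests, with the prototypes already shown by the robot
report):

/* include/linux/virtio_pci_admin.h carries the declarations, e.g.: */
int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);

/* drivers/virtio/virtio_pci_modern.c: include it where the symbols are
 * defined and exported, which silences -Wmissing-prototypes:
 */
#include <linux/virtio_pci_admin.h>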

Alex,
Are you fine with leaving the provisioning of the VF, including the
control of its transitional capability, in the device's hands, as was
suggested by Jason?
Any specific recommendation, following the discussion in the ML, for the
'family' note?

Once I have the above feedback I'll prepare and send V2.

Yishai


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 0/9] Introduce a vfio driver over virtio devices
  2023-10-22  8:20   ` Yishai Hadas via Virtualization
@ 2023-10-22  9:12     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael S. Tsirkin @ 2023-10-22  9:12 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: alex.williamson, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg, jasowang

On Sun, Oct 22, 2023 at 11:20:31AM +0300, Yishai Hadas wrote:
> On 17/10/2023 16:42, Yishai Hadas wrote:
> > This series introduce a vfio driver over virtio devices to support the
> > legacy interface functionality for VFs.
> > 
> > Background, from the virtio spec [1].
> > --------------------------------------------------------------------
> > In some systems, there is a need to support a virtio legacy driver with
> > a device that does not directly support the legacy interface. In such
> > scenarios, a group owner device can provide the legacy interface
> > functionality for the group member devices. The driver of the owner
> > device can then access the legacy interface of a member device on behalf
> > of the legacy member device driver.
> > 
> > For example, with the SR-IOV group type, group members (VFs) can not
> > present the legacy interface in an I/O BAR in BAR0 as expected by the
> > legacy pci driver. If the legacy driver is running inside a virtual
> > machine, the hypervisor executing the virtual machine can present a
> > virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> > legacy driver accesses to this I/O BAR and forwards them to the group
> > owner device (PF) using group administration commands.
> > --------------------------------------------------------------------
> > 
> > The first 6 patches are in the virtio area and handle the below:
> > - Fix common config map for modern device as was reported by Michael Tsirkin.
> > - Introduce the admin virtqueue infrastcture.
> > - Expose the layout of the commands that should be used for
> >    supporting the legacy access.
> > - Expose APIs to enable upper layers as of vfio, net, etc
> >    to execute admin commands.
> > 
> > The above follows the virtio spec that was lastly accepted in that area
> > [1].
> > 
> > The last 3 patches are in the vfio area and handle the below:
> > - Expose some APIs from vfio/pci to be used by the vfio/virtio driver.
> > - Introduce a vfio driver over virtio devices to support the legacy
> >    interface functionality for VFs.
> > 
> > The series was tested successfully over virtio-net VFs in the host,
> > while running in the guest both modern and legacy drivers.
> > 
> > [1]
> > https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> > 
> > Changes from V0: https://www.spinics.net/lists/linux-virtualization/msg63802.html
> > 
> > Virtio:
> > - Fix the common config map size issue that was reported by Michael
> >    Tsirkin.
> > - Do not use vp_dev->vqs[] array upon vp_del_vqs() as was asked by
> >    Michael, instead skip the AQ specifically.
> > - Move admin vq implementation into virtio_pci_modern.c as was asked by
> >    Michael.
> > - Rename structure virtio_avq to virtio_pci_admin_vq and some extra
> >    corresponding renames.
> > - Remove exported symbols virtio_pci_vf_get_pf_dev(),
> >    virtio_admin_cmd_exec() as now callers are local to the module.
> > - Handle inflight commands as part of the device reset flow.
> > - Introduce APIs per admin command in virtio-pci as was asked by Michael.
> > 
> > Vfio:
> > - Change to use EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL for
> >    vfio_pci_core_setup_barmap() and vfio_pci_iowrite#xxx() as pointed by
> >    Alex.
> > - Drop the intermediate patch which prepares the commands and calls the
> >    generic virtio admin command API (i.e. virtio_admin_cmd_exec()).
> > - Instead, call directly to the new APIs per admin command that are
> >    exported from Virtio - based on Michael's request.
> > - Enable only virtio-net as part of the pci_device_id table to enforce
> >    upon binding only what is supported as suggested by Alex.
> > - Add support for byte-wise access (read/write) over the device config
> >    region as was asked by Alex.
> > - Consider whether MSIX is practically enabled/disabled to choose the
> >    right opcode upon issuing read/write admin command, as mentioned
> >    by Michael.
> > - Move to use VIRTIO_PCI_CONFIG_OFF instead of adding some new defines
> >    as was suggested by Michael.
> > - Set the '.close_device' op to vfio_pci_core_close_device() as was
> >    pointed by Alex.
> > - Adapt to Vfio multi-line comment style in a few places.
> > - Add virtualization@lists.linux-foundation.org in the MAINTAINERS file
> >    to be CCed for the new driver as was suggested by Jason.
> > 
> > Yishai
> > 
> > Feng Liu (5):
> >    virtio-pci: Fix common config map for modern device
> >    virtio: Define feature bit for administration virtqueue
> >    virtio-pci: Introduce admin virtqueue
> >    virtio-pci: Introduce admin command sending function
> >    virtio-pci: Introduce admin commands
> > 
> > Yishai Hadas (4):
> >    virtio-pci: Introduce APIs to execute legacy IO admin commands
> >    vfio/pci: Expose vfio_pci_core_setup_barmap()
> >    vfio/pci: Expose vfio_pci_iowrite/read##size()
> >    vfio/virtio: Introduce a vfio driver over virtio devices
> > 
> >   MAINTAINERS                            |   7 +
> >   drivers/vfio/pci/Kconfig               |   2 +
> >   drivers/vfio/pci/Makefile              |   2 +
> >   drivers/vfio/pci/vfio_pci_core.c       |  25 ++
> >   drivers/vfio/pci/vfio_pci_rdwr.c       |  38 +-
> >   drivers/vfio/pci/virtio/Kconfig        |  15 +
> >   drivers/vfio/pci/virtio/Makefile       |   4 +
> >   drivers/vfio/pci/virtio/main.c         | 577 +++++++++++++++++++++++++
> >   drivers/virtio/virtio.c                |  37 +-
> >   drivers/virtio/virtio_pci_common.c     |  14 +
> >   drivers/virtio/virtio_pci_common.h     |  20 +-
> >   drivers/virtio/virtio_pci_modern.c     | 441 ++++++++++++++++++-
> >   drivers/virtio/virtio_pci_modern_dev.c |  24 +-
> >   include/linux/vfio_pci_core.h          |  20 +
> >   include/linux/virtio.h                 |   8 +
> >   include/linux/virtio_config.h          |   4 +
> >   include/linux/virtio_pci_admin.h       |  18 +
> >   include/linux/virtio_pci_modern.h      |   5 +
> >   include/uapi/linux/virtio_config.h     |   8 +-
> >   include/uapi/linux/virtio_pci.h        |  66 +++
> >   20 files changed, 1295 insertions(+), 40 deletions(-)
> >   create mode 100644 drivers/vfio/pci/virtio/Kconfig
> >   create mode 100644 drivers/vfio/pci/virtio/Makefile
> >   create mode 100644 drivers/vfio/pci/virtio/main.c
> >   create mode 100644 include/linux/virtio_pci_admin.h
> > 
> Hi Michael,
> 
> Did you have the chance to review the virtio part of that series?

Not yet, will take a couple more days.

> IMO, we addressed all your notes on V0; I would be happy to get your
> feedback on V1 before sending V2.
> 
> In my TO-DO list for V2, I have for now the below minor items.
> Virtio:
> Patch #6: Fix a krobot note; it needs to include the header file in the
> C file that exports the symbols.
> Vfio:
> Patch #9: Rename the 'ops' variable to drop the 'acc' and potentially do
> some renaming in the module description with regards to 'family'.
> 
> Alex,
> Are you fine with leaving the provisioning of the VF, including the
> control of its transitional capability, in the device's hands, as was
> suggested by Jason?
> Any specific recommendation, following the discussion in the ML, for the
> 'family' note?
> 
> Once I have the above feedback I'll prepare and send V2.
> 
> Yishai


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 0/9] Introduce a vfio driver over virtio devices
  2023-10-22  8:20   ` Yishai Hadas via Virtualization
@ 2023-10-23 15:33     ` Alex Williamson
  -1 siblings, 0 replies; 100+ messages in thread
From: Alex Williamson @ 2023-10-23 15:33 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: mst, jgg, kvm, virtualization, parav, feliu, jiri, kevin.tian,
	joao.m.martins, si-wei.liu, leonro, maorg, jasowang

On Sun, 22 Oct 2023 11:20:31 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> On 17/10/2023 16:42, Yishai Hadas wrote:
> > This series introduce a vfio driver over virtio devices to support the
> > legacy interface functionality for VFs.
> >
> > Background, from the virtio spec [1].
> > --------------------------------------------------------------------
> > In some systems, there is a need to support a virtio legacy driver with
> > a device that does not directly support the legacy interface. In such
> > scenarios, a group owner device can provide the legacy interface
> > functionality for the group member devices. The driver of the owner
> > device can then access the legacy interface of a member device on behalf
> > of the legacy member device driver.
> >
> > For example, with the SR-IOV group type, group members (VFs) can not
> > present the legacy interface in an I/O BAR in BAR0 as expected by the
> > legacy pci driver. If the legacy driver is running inside a virtual
> > machine, the hypervisor executing the virtual machine can present a
> > virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> > legacy driver accesses to this I/O BAR and forwards them to the group
> > owner device (PF) using group administration commands.
> > --------------------------------------------------------------------
> >
> > The first 6 patches are in the virtio area and handle the below:
> > - Fix common config map for modern device as was reported by Michael Tsirkin.
> > - Introduce the admin virtqueue infrastcture.
> > - Expose the layout of the commands that should be used for
> >    supporting the legacy access.
> > - Expose APIs to enable upper layers as of vfio, net, etc
> >    to execute admin commands.
> >
> > The above follows the virtio spec that was lastly accepted in that area
> > [1].
> >
> > The last 3 patches are in the vfio area and handle the below:
> > - Expose some APIs from vfio/pci to be used by the vfio/virtio driver.
> > - Introduce a vfio driver over virtio devices to support the legacy
> >    interface functionality for VFs.
> >
> > The series was tested successfully over virtio-net VFs in the host,
> > while running in the guest both modern and legacy drivers.
> >
> > [1]
> > https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> >
> > Changes from V0: https://www.spinics.net/lists/linux-virtualization/msg63802.html
> >
> > Virtio:
> > - Fix the common config map size issue that was reported by Michael
> >    Tsirkin.
> > - Do not use vp_dev->vqs[] array upon vp_del_vqs() as was asked by
> >    Michael, instead skip the AQ specifically.
> > - Move admin vq implementation into virtio_pci_modern.c as was asked by
> >    Michael.
> > - Rename structure virtio_avq to virtio_pci_admin_vq and some extra
> >    corresponding renames.
> > - Remove exported symbols virtio_pci_vf_get_pf_dev(),
> >    virtio_admin_cmd_exec() as now callers are local to the module.
> > - Handle inflight commands as part of the device reset flow.
> > - Introduce APIs per admin command in virtio-pci as was asked by Michael.
> >
> > Vfio:
> > - Change to use EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL for
> >    vfio_pci_core_setup_barmap() and vfio_pci_iowrite#xxx() as pointed by
> >    Alex.
> > - Drop the intermediate patch which prepares the commands and calls the
> >    generic virtio admin command API (i.e. virtio_admin_cmd_exec()).
> > - Instead, call directly to the new APIs per admin command that are
> >    exported from Virtio - based on Michael's request.
> > - Enable only virtio-net as part of the pci_device_id table to enforce
> >    upon binding only what is supported as suggested by Alex.
> > - Add support for byte-wise access (read/write) over the device config
> >    region as was asked by Alex.
> > - Consider whether MSIX is practically enabled/disabled to choose the
> >    right opcode upon issuing read/write admin command, as mentioned
> >    by Michael.
> > - Move to use VIRTIO_PCI_CONFIG_OFF instead of adding some new defines
> >    as was suggested by Michael.
> > - Set the '.close_device' op to vfio_pci_core_close_device() as was
> >    pointed by Alex.
> > - Adapt to Vfio multi-line comment style in a few places.
> > - Add virtualization@lists.linux-foundation.org in the MAINTAINERS file
> >    to be CCed for the new driver as was suggested by Jason.
> >
> > Yishai
> >
> > Feng Liu (5):
> >    virtio-pci: Fix common config map for modern device
> >    virtio: Define feature bit for administration virtqueue
> >    virtio-pci: Introduce admin virtqueue
> >    virtio-pci: Introduce admin command sending function
> >    virtio-pci: Introduce admin commands
> >
> > Yishai Hadas (4):
> >    virtio-pci: Introduce APIs to execute legacy IO admin commands
> >    vfio/pci: Expose vfio_pci_core_setup_barmap()
> >    vfio/pci: Expose vfio_pci_iowrite/read##size()
> >    vfio/virtio: Introduce a vfio driver over virtio devices
> >
> >   MAINTAINERS                            |   7 +
> >   drivers/vfio/pci/Kconfig               |   2 +
> >   drivers/vfio/pci/Makefile              |   2 +
> >   drivers/vfio/pci/vfio_pci_core.c       |  25 ++
> >   drivers/vfio/pci/vfio_pci_rdwr.c       |  38 +-
> >   drivers/vfio/pci/virtio/Kconfig        |  15 +
> >   drivers/vfio/pci/virtio/Makefile       |   4 +
> >   drivers/vfio/pci/virtio/main.c         | 577 +++++++++++++++++++++++++
> >   drivers/virtio/virtio.c                |  37 +-
> >   drivers/virtio/virtio_pci_common.c     |  14 +
> >   drivers/virtio/virtio_pci_common.h     |  20 +-
> >   drivers/virtio/virtio_pci_modern.c     | 441 ++++++++++++++++++-
> >   drivers/virtio/virtio_pci_modern_dev.c |  24 +-
> >   include/linux/vfio_pci_core.h          |  20 +
> >   include/linux/virtio.h                 |   8 +
> >   include/linux/virtio_config.h          |   4 +
> >   include/linux/virtio_pci_admin.h       |  18 +
> >   include/linux/virtio_pci_modern.h      |   5 +
> >   include/uapi/linux/virtio_config.h     |   8 +-
> >   include/uapi/linux/virtio_pci.h        |  66 +++
> >   20 files changed, 1295 insertions(+), 40 deletions(-)
> >   create mode 100644 drivers/vfio/pci/virtio/Kconfig
> >   create mode 100644 drivers/vfio/pci/virtio/Makefile
> >   create mode 100644 drivers/vfio/pci/virtio/main.c
> >   create mode 100644 include/linux/virtio_pci_admin.h
> >  
> Hi Michael,
> 
> Did you have the chance to review the virtio part of that series ?
> 
> IMO, we addressed all your notes on V0, I would be happy to get your 
> feedback on V1 before sending V2.
> 
> In my TO-DO list for V2, have for now the below minor items.
> Virtio:
> Patch #6: Fix a krobot note where it needs to include the H file as part 
> of the export symbols C file.
> Vfio:
> #patch #9: Rename the 'ops' variable to drop the 'acc' and potentially 
> some rename in the description of the module with regards to 'family'.
> 
> Alex,
> Are you fine to leave the provisioning of the VF including the control 
> of its transitional capability in the device hands as was suggested by 
> Jason ?

If this is the standard we're going to follow, ie. profiling of a
device is expected to occur prior to the probe of the vfio-pci variant
driver, then we should get the out-of-tree NVIDIA vGPU driver on board
with this too.

> Any specific recommendation following the discussion in the ML, for the 
> 'family' note ?

It's not super important, it's just overly broad vs what's actually
implemented.  Limiting the description to virtio-net for the current
implementation is fine.

> Once I'll have the above feedback I may prepare and send V2.

I'll try to take a more thorough look, but also note my comments to
Ankit relative to config space emulation.  This driver correctly
implements the flags for the IO Port BAR, but does not support sizing
of the BAR through config space, which I think is a shortcoming
relative to that implemented by vfio-pci.  QEMU doesn't rely on this,
but we don't know there aren't other userspaces that depend on this
behavior.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 0/9] Introduce a vfio driver over virtio devices
  2023-10-23 15:33     ` Alex Williamson
  (?)
@ 2023-10-23 15:42     ` Jason Gunthorpe
  2023-10-23 16:09         ` Alex Williamson
  2023-10-25  8:34         ` Tian, Kevin
  -1 siblings, 2 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2023-10-23 15:42 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yishai Hadas, mst, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg, jasowang

On Mon, Oct 23, 2023 at 09:33:23AM -0600, Alex Williamson wrote:

> > Alex,
> > Are you fine to leave the provisioning of the VF including the control 
> > of its transitional capability in the device hands as was suggested by 
> > Jason ?
> 
> If this is the standard we're going to follow, ie. profiling of a
> device is expected to occur prior to the probe of the vfio-pci variant
> driver, then we should get the out-of-tree NVIDIA vGPU driver on board
> with this too.

Those GPU drivers are using mdev not vfio-pci..

mdev doesn't have a way in its uapi to configure the mdev before it is
created.

I'm hopeful that the SIOV work will develop something better because
we clearly need it for the general use cases of SIOV beyond VFIO.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 0/9] Introduce a vfio driver over virtio devices
  2023-10-23 15:42     ` Jason Gunthorpe
@ 2023-10-23 16:09         ` Alex Williamson
  2023-10-25  8:34         ` Tian, Kevin
  1 sibling, 0 replies; 100+ messages in thread
From: Alex Williamson @ 2023-10-23 16:09 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, mst, maorg, virtualization, jiri, leonro

On Mon, 23 Oct 2023 12:42:57 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Mon, Oct 23, 2023 at 09:33:23AM -0600, Alex Williamson wrote:
> 
> > > Alex,
> > > Are you fine to leave the provisioning of the VF including the control 
> > > of its transitional capability in the device hands as was suggested by 
> > > Jason ?  
> > 
> > If this is the standard we're going to follow, ie. profiling of a
> > device is expected to occur prior to the probe of the vfio-pci variant
> > driver, then we should get the out-of-tree NVIDIA vGPU driver on board
> > with this too.  
> 
> Those GPU drivers are using mdev not vfio-pci..

The SR-IOV mdev vGPUs rely on the IOMMU backing device support which
was removed from upstream.  They only exist in the mdev form on
downstreams which have retained this interface for compatibility and
continuity.  I'm not aware of any other means by which the SR-IOV RID
can be used in the mdev model; therefore, only the pre-SR-IOV GPUs
should continue to use the mdev interface.

> mdev doesn't have a way in its uapi to configure the mdev before it is
> created.

Of course.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 0/9] Introduce a vfio driver over virtio devices
  2023-10-23 16:09         ` Alex Williamson
  (?)
@ 2023-10-23 16:20         ` Jason Gunthorpe
  2023-10-23 16:45             ` Alex Williamson
  -1 siblings, 1 reply; 100+ messages in thread
From: Jason Gunthorpe @ 2023-10-23 16:20 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yishai Hadas, mst, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg, jasowang

On Mon, Oct 23, 2023 at 10:09:13AM -0600, Alex Williamson wrote:
> On Mon, 23 Oct 2023 12:42:57 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Mon, Oct 23, 2023 at 09:33:23AM -0600, Alex Williamson wrote:
> > 
> > > > Alex,
> > > > Are you fine to leave the provisioning of the VF including the control 
> > > > of its transitional capability in the device hands as was suggested by 
> > > > Jason ?  
> > > 
> > > If this is the standard we're going to follow, ie. profiling of a
> > > device is expected to occur prior to the probe of the vfio-pci variant
> > > driver, then we should get the out-of-tree NVIDIA vGPU driver on board
> > > with this too.  
> > 
> > Those GPU drivers are using mdev not vfio-pci..
> 
> The SR-IOV mdev vGPUs rely on the IOMMU backing device support which
> was removed from upstream.  

It wasn't, but it changed forms.

mdev is a sysfs framework for managing lifecycle with GUIDs only.

The thing using mdev can call vfio_register_emulated_iommu_dev() or
vfio_register_group_dev(). 

It doesn't matter to the mdev stuff.

The thing using mdev is responsible to get the struct device to pass
to vfio_register_group_dev()

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 0/9] Introduce a vfio driver over virtio devices
  2023-10-23 16:20         ` Jason Gunthorpe
@ 2023-10-23 16:45             ` Alex Williamson
  0 siblings, 0 replies; 100+ messages in thread
From: Alex Williamson @ 2023-10-23 16:45 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, mst, maorg, virtualization, jiri, leonro

On Mon, 23 Oct 2023 13:20:43 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Mon, Oct 23, 2023 at 10:09:13AM -0600, Alex Williamson wrote:
> > On Mon, 23 Oct 2023 12:42:57 -0300
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >   
> > > On Mon, Oct 23, 2023 at 09:33:23AM -0600, Alex Williamson wrote:
> > >   
> > > > > Alex,
> > > > > Are you fine to leave the provisioning of the VF including the control 
> > > > > of its transitional capability in the device hands as was suggested by 
> > > > > Jason ?    
> > > > 
> > > > If this is the standard we're going to follow, ie. profiling of a
> > > > device is expected to occur prior to the probe of the vfio-pci variant
> > > > driver, then we should get the out-of-tree NVIDIA vGPU driver on board
> > > > with this too.    
> > > 
> > > Those GPU drivers are using mdev not vfio-pci..  
> > 
> > The SR-IOV mdev vGPUs rely on the IOMMU backing device support which
> > was removed from upstream.    
> 
> It wasn't, but it changed forms.
> 
> mdev is a sysfs framework for managing lifecycle with GUIDs only.
> 
> The thing using mdev can call vfio_register_emulated_iommu_dev() or
> vfio_register_group_dev(). 
> 
> It doesn't matter to the mdev stuff.
> 
> The thing using mdev is responsible to get the struct device to pass
> to vfio_register_group_dev()

Are we describing what can be done (possibly limited to out-of-tree
drivers) or what should be done and would be accepted upstream?

I'm under the impression that mdev has been redefined to be more
narrowly focused for emulated IOMMU devices and that devices based
around a PCI VF should be making use of a vfio-pci variant driver.

Are you suggesting it's the vendor's choice based on whether they want
the mdev lifecycle support?

We've defined certain aspects of the vfio-mdev interface as only
available for emulated IOMMU devices, ex. page pinning.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 0/9] Introduce a vfio driver over virtio devices
  2023-10-23 16:45             ` Alex Williamson
  (?)
@ 2023-10-23 17:27             ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2023-10-23 17:27 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yishai Hadas, mst, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg, jasowang

On Mon, Oct 23, 2023 at 10:45:48AM -0600, Alex Williamson wrote:
> On Mon, 23 Oct 2023 13:20:43 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Mon, Oct 23, 2023 at 10:09:13AM -0600, Alex Williamson wrote:
> > > On Mon, 23 Oct 2023 12:42:57 -0300
> > > Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >   
> > > > On Mon, Oct 23, 2023 at 09:33:23AM -0600, Alex Williamson wrote:
> > > >   
> > > > > > Alex,
> > > > > > Are you fine to leave the provisioning of the VF including the control 
> > > > > > of its transitional capability in the device hands as was suggested by 
> > > > > > Jason ?    
> > > > > 
> > > > > If this is the standard we're going to follow, ie. profiling of a
> > > > > device is expected to occur prior to the probe of the vfio-pci variant
> > > > > driver, then we should get the out-of-tree NVIDIA vGPU driver on board
> > > > > with this too.    
> > > > 
> > > > Those GPU drivers are using mdev not vfio-pci..  
> > > 
> > > The SR-IOV mdev vGPUs rely on the IOMMU backing device support which
> > > was removed from upstream.    
> > 
> > It wasn't, but it changed forms.
> > 
> > mdev is a sysfs framework for managing lifecycle with GUIDs only.
> > 
> > The thing using mdev can call vfio_register_emulated_iommu_dev() or
> > vfio_register_group_dev(). 
> > 
> > It doesn't matter to the mdev stuff.
> > 
> > The thing using mdev is responsible to get the struct device to pass
> > to vfio_register_group_dev()
> 
> Are we describing what can be done (possibly limited to out-of-tree
> drivers) or what should be done and would be accepted upstream?

Beyond disliking mdev, I'm not really set on how we should try to
set up an extensively mediated PCI SRIOV driver. There is quite a lot
of similarity to SIOV, so the right answer may be to put SIOV
and this special mediated SRIOV case on the same, new, infrastructure.

SIOV can't use variant vfio PCI drivers.

mdev guid lifecycle is really ugly and quite limited anyhow.

So I've been thinking we need something else.

> I'm under the impression that mdev has been redefined to be more
> narrowly focused for emulated IOMMU devices and that devices based
> around a PCI VF should be making use of a vfio-pci variant driver.

I've been viewing mdev as legacy, just let it die off with the S390
drivers and Intel GPU as the only users, ever.

When we solve the SIOV issue we should come with something that can
absorb what S390/GPU need too.

At the end of the day we need an API to create /dev/vfioXX on demand,
to configure them before creating them, and then destroy them. It
doesn't matter at all how the driver that owns vfioXX operates, it
will call the right iommufd APIs for RID/PASID/access/etc to do
whatever its thing is.

It would be wonderful if we could get to the point where the new
interface can also create/destroy SRIOV vfios directly too.

> Are you suggesting it's the vendor's choice based on whether they want
> the mdev lifecycle support?

So, in tree I would like to discourage new mdev drivers. Out of tree,
I don't care; the APIs exist, and if people want to build things with
them then they get the usual out-of-tree caveat.

> We've defined certain aspects of the vfio-mdev interface as only
> available for emulated IOMMU devices, ex. page pinning.  Thanks,

Did we?

iommufd made it up to the driver to decide what to do, and a driver
can certainly create a concurrent iommufd_access and iommufd_device if
it wants.

AFAICT the container stuff doesn't check, drivers can do both
concurrently?

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-17 13:42   ` Yishai Hadas via Virtualization
@ 2023-10-24 19:57     ` Alex Williamson
  -1 siblings, 0 replies; 100+ messages in thread
From: Alex Williamson @ 2023-10-24 19:57 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, mst, maorg, virtualization, jgg, jiri, leonro

On Tue, 17 Oct 2023 16:42:17 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> Introduce a vfio driver over virtio devices to support the legacy
> interface functionality for VFs.
> 
> Background, from the virtio spec [1].
> --------------------------------------------------------------------
> In some systems, there is a need to support a virtio legacy driver with
> a device that does not directly support the legacy interface. In such
> scenarios, a group owner device can provide the legacy interface
> functionality for the group member devices. The driver of the owner
> device can then access the legacy interface of a member device on behalf
> of the legacy member device driver.
> 
> For example, with the SR-IOV group type, group members (VFs) can not
> present the legacy interface in an I/O BAR in BAR0 as expected by the
> legacy pci driver. If the legacy driver is running inside a virtual
> machine, the hypervisor executing the virtual machine can present a
> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> legacy driver accesses to this I/O BAR and forwards them to the group
> owner device (PF) using group administration commands.
> --------------------------------------------------------------------
> 
> Specifically, this driver adds support for a virtio-net VF to be exposed
> as a transitional device to a guest driver and allows the legacy IO BAR
> functionality on top.
> 
> This allows a VM which uses a legacy virtio-net driver in the guest to
> work transparently over a VF which its driver in the host is that new
> driver.
> 
> The driver can be extended easily to support some other types of virtio
> devices (e.g virtio-blk), by adding in a few places the specific type
> properties as was done for virtio-net.
> 
> For now, only the virtio-net use case was tested and as such we introduce
> the support only for such a device.
> 
> Practically,
> Upon probing a VF for a virtio-net device, in case its PF supports
> legacy access over the virtio admin commands and the VF doesn't have BAR
> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> transitional device with I/O BAR in BAR 0.
> 
> The existence of the simulated I/O bar is reported later on by
> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> exposes itself as a transitional device by overwriting some properties
> upon reading its config space.
> 
> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> guest may use it via read/write calls according to the virtio
> specification.
> 
> Any read/write towards the control parts of the BAR will be captured by
> the new driver and will be translated into admin commands towards the
> device.
> 
> Any data path read/write access (i.e. virtio driver notifications) will
> be forwarded to the physical BAR which its properties were supplied by
> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
> probing/init flow.
> 
> With that code in place a legacy driver in the guest has the look and
> feel as if having a transitional device with legacy support for both its
> control and data path flows.
> 
> [1]
> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  MAINTAINERS                      |   7 +
>  drivers/vfio/pci/Kconfig         |   2 +
>  drivers/vfio/pci/Makefile        |   2 +
>  drivers/vfio/pci/virtio/Kconfig  |  15 +
>  drivers/vfio/pci/virtio/Makefile |   4 +
>  drivers/vfio/pci/virtio/main.c   | 577 +++++++++++++++++++++++++++++++
>  6 files changed, 607 insertions(+)
>  create mode 100644 drivers/vfio/pci/virtio/Kconfig
>  create mode 100644 drivers/vfio/pci/virtio/Makefile
>  create mode 100644 drivers/vfio/pci/virtio/main.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7a7bd8bd80e9..680a70063775 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22620,6 +22620,13 @@ L:	kvm@vger.kernel.org
>  S:	Maintained
>  F:	drivers/vfio/pci/mlx5/
>  
> +VFIO VIRTIO PCI DRIVER
> +M:	Yishai Hadas <yishaih@nvidia.com>
> +L:	kvm@vger.kernel.org
> +L:	virtualization@lists.linux-foundation.org
> +S:	Maintained
> +F:	drivers/vfio/pci/virtio
> +
>  VFIO PCI DEVICE SPECIFIC DRIVERS
>  R:	Jason Gunthorpe <jgg@nvidia.com>
>  R:	Yishai Hadas <yishaih@nvidia.com>
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 8125e5f37832..18c397df566d 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>  
>  source "drivers/vfio/pci/pds/Kconfig"
>  
> +source "drivers/vfio/pci/virtio/Kconfig"
> +
>  endmenu
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 45167be462d8..046139a4eca5 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>  obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>  
>  obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> +
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> new file mode 100644
> index 000000000000..89eddce8b1bd
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Kconfig
> @@ -0,0 +1,15 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config VIRTIO_VFIO_PCI
> +        tristate "VFIO support for VIRTIO PCI devices"
> +        depends on VIRTIO_PCI
> +        select VFIO_PCI_CORE
> +        help
> +          This provides support for exposing VIRTIO VF devices using the VFIO
> +          framework that can work with a legacy virtio driver in the guest.
> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> +          not indicate I/O Space.
> +          As of that this driver emulated I/O BAR in software to let a VF be
> +          seen as a transitional device in the guest and let it work with
> +          a legacy driver.

This description is a little bit subtle about the hard requirements on the
device.  Reading this, one might think that this should work for any
SR-IOV VF virtio device, when in reality it only supports virtio-net
currently and places a number of additional requirements on the device
(e.g. legacy access and MSI-X support).

> +
> +          If you don't know what to do here, say N.
> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
> new file mode 100644
> index 000000000000..2039b39fb723
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
> +virtio-vfio-pci-y := main.o
> +
> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
> new file mode 100644
> index 000000000000..3fef4b21f7e6
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/main.c
> @@ -0,0 +1,577 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/pci.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +#include <linux/vfio.h>
> +#include <linux/vfio_pci_core.h>
> +#include <linux/virtio_pci.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_pci_admin.h>
> +
> +struct virtiovf_pci_core_device {
> +	struct vfio_pci_core_device core_device;
> +	u8 bar0_virtual_buf_size;
> +	u8 *bar0_virtual_buf;
> +	/* synchronize access to the virtual buf */
> +	struct mutex bar_mutex;
> +	void __iomem *notify_addr;
> +	u32 notify_offset;
> +	u8 notify_bar;

Push the above u8 to the end of the structure for better packing.

> +	u16 pci_cmd;
> +	u16 msix_ctrl;
> +};
> +
> +static int
> +virtiovf_issue_legacy_rw_cmd(struct virtiovf_pci_core_device *virtvdev,
> +			     loff_t pos, char __user *buf,
> +			     size_t count, bool read)
> +{
> +	bool msix_enabled = virtvdev->msix_ctrl & PCI_MSIX_FLAGS_ENABLE;
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
> +	u16 opcode;
> +	int ret;
> +
> +	mutex_lock(&virtvdev->bar_mutex);
> +	if (read) {
> +		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
> +		ret = virtio_pci_admin_legacy_io_read(pdev, opcode, pos, count,
> +						      bar0_buf + pos);
> +		if (ret)
> +			goto out;
> +		if (copy_to_user(buf, bar0_buf + pos, count))
> +			ret = -EFAULT;
> +		goto out;
> +	}

TBH, I think the symmetry of read vs write would be more apparent if
this were an else branch.
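
For illustration, a minimal sketch of that if/else form, reusing the locals
and admin helpers the function already has (a sketch only, not a drop-in
replacement):

	mutex_lock(&virtvdev->bar_mutex);
	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
		 (read ? VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
			 VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) :
		 (read ? VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ :
			 VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE);
	if (read) {
		/* Device -> bounce buffer -> user */
		ret = virtio_pci_admin_legacy_io_read(pdev, opcode, pos, count,
						      bar0_buf + pos);
		if (!ret && copy_to_user(buf, bar0_buf + pos, count))
			ret = -EFAULT;
	} else {
		/* User -> bounce buffer -> device */
		if (copy_from_user(bar0_buf + pos, buf, count))
			ret = -EFAULT;
		else
			ret = virtio_pci_admin_legacy_io_write(pdev, opcode, pos,
							       count, bar0_buf + pos);
	}
	mutex_unlock(&virtvdev->bar_mutex);
	return ret;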

> +
> +	if (copy_from_user(bar0_buf + pos, buf, count)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
> +	ret = virtio_pci_admin_legacy_io_write(pdev, opcode, pos, count,
> +					       bar0_buf + pos);
> +out:
> +	mutex_unlock(&virtvdev->bar_mutex);
> +	return ret;
> +}
> +
> +static int
> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
> +			    loff_t pos, char __user *buf,
> +			    size_t count, bool read)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	u16 queue_notify;
> +	int ret;
> +
> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
> +		return -EINVAL;
> +
> +	switch (pos) {
> +	case VIRTIO_PCI_QUEUE_NOTIFY:
> +		if (count != sizeof(queue_notify))
> +			return -EINVAL;
> +		if (read) {
> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
> +						virtvdev->notify_addr);
> +			if (ret)
> +				return ret;
> +			if (copy_to_user(buf, &queue_notify,
> +					 sizeof(queue_notify)))
> +				return -EFAULT;
> +			break;
> +		}

Same.

> +
> +		if (copy_from_user(&queue_notify, buf, count))
> +			return -EFAULT;
> +
> +		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
> +					 virtvdev->notify_addr);
> +		break;
> +	default:
> +		ret = virtiovf_issue_legacy_rw_cmd(virtvdev, pos, buf, count,
> +						   read);
> +	}
> +
> +	return ret ? ret : count;
> +}
> +
> +static bool range_intersect_range(loff_t range1_start, size_t count1,
> +				  loff_t range2_start, size_t count2,
> +				  loff_t *start_offset,
> +				  size_t *intersect_count,
> +				  size_t *register_offset)
> +{
> +	if (range1_start <= range2_start &&
> +	    range1_start + count1 > range2_start) {
> +		*start_offset = range2_start - range1_start;
> +		*intersect_count = min_t(size_t, count2,
> +					 range1_start + count1 - range2_start);
> +		if (register_offset)
> +			*register_offset = 0;
> +		return true;
> +	}
> +
> +	if (range1_start > range2_start &&
> +	    range1_start < range2_start + count2) {
> +		*start_offset = range1_start;
> +		*intersect_count = min_t(size_t, count1,
> +					 range2_start + count2 - range1_start);
> +		if (register_offset)
> +			*register_offset = range1_start - range2_start;
> +		return true;
> +	}

Seems like we're missing a case, and some documentation.

The first test requires range1 to fully enclose range2 and provides the
offset of range2 within range1 and the length of the intersection.

The second test requires range1 to start from a non-zero offset within
range2 and returns the absolute offset of range1 and the length of the
intersection.

The register offset is then the non-zero offset of range1 into range2.  So
does the caller use the zero value in the previous test to know range2
exists within range1?

We miss the cases where range1_start is <= range2_start and range1
terminates within range2.  I suppose we'll see below how this is used,
but it seems asymmetric and incomplete.
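
For illustration only, a sketch of a symmetric variant along those lines,
where the first output is always the offset into the user buffer and
register_offset is always the offset into the emulated register (the
parameter names here are placeholders, not the patch's):

static bool range_intersect_range(loff_t access_start, size_t access_count,
				  loff_t reg_start, size_t reg_count,
				  loff_t *buf_offset,
				  size_t *intersect_count,
				  size_t *register_offset)
{
	loff_t access_end = access_start + access_count;
	loff_t reg_end = reg_start + reg_count;
	loff_t start = max(access_start, reg_start);
	loff_t end = min(access_end, reg_end);

	if (start >= end)
		return false;

	/* Offset of the overlap within the user access */
	*buf_offset = start - access_start;
	/* Number of overlapping bytes */
	*intersect_count = end - start;
	/* Offset of the overlap within the emulated register */
	if (register_offset)
		*register_offset = start - reg_start;
	return true;
}

The caller would then always apply *buf_offset to the user buffer and
*register_offset to the local value, which also covers the case where
range1 starts before range2 and terminates inside it.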

> +
> +	return false;
> +}
> +
> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> +					char __user *buf, size_t count,
> +					loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	size_t register_offset;
> +	loff_t copy_offset;
> +	size_t copy_count;
> +	__le32 val32;
> +	__le16 val16;
> +	u8 val8;
> +	int ret;
> +
> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> +				  &copy_offset, &copy_count, NULL)) {

If a user does 'setpci -s x:00.0 2.b' (range1 <= range2, but terminates
within range2) they'll not enter this branch and see 41 rather than 00.

If a user does 'setpci -s x:00.0 3.b' (range1 > range2, range 1
contained within range 2), the above function returns a copy_offset of
range1_start (ie. 3).  But that offset is applied to the buffer, which
is out of bounds.  The function needs to have returned an offset of 1
and it should have applied to the val16 address.

I don't think this works like it's intended.
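
As a worked example of the intended semantics (hypothetical values,
following the symmetric helper sketched earlier): for a 1-byte read at
config offset 3 over the 2-byte device ID register at offset 2, the
overlap is 1 byte, the buffer offset is 0 and the register offset is 1,
so the copy would be:

	/* copy_offset == 0, register_offset == 1, copy_count == 1 */
	if (copy_to_user(buf + copy_offset,
			 (void *)&val16 + register_offset, copy_count))
		return -EFAULT;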


> +		val16 = cpu_to_le16(0x1000);

Please #define this somewhere rather than hiding a magic value here.
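
For instance (the macro name is only a suggestion):

/* PCI device ID of a transitional (legacy capable) virtio-net device */
#define VIRTIO_TRANS_ID_NET 0x1000

with the read handler then using cpu_to_le16(VIRTIO_TRANS_ID_NET) instead
of the bare constant.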

> +		if (copy_to_user(buf + copy_offset, &val16, copy_count))
> +			return -EFAULT;
> +	}
> +
> +	if ((virtvdev->pci_cmd & PCI_COMMAND_IO) &&
> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
> +				  &copy_offset, &copy_count, &register_offset)) {
> +		if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
> +				   copy_count))
> +			return -EFAULT;
> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
> +				 copy_count))
> +			return -EFAULT;
> +	}
> +
> +	if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8),
> +				  &copy_offset, &copy_count, NULL)) {
> +		/* Transional needs to have revision 0 */
> +		val8 = 0;
> +		if (copy_to_user(buf + copy_offset, &val8, copy_count))
> +			return -EFAULT;
> +	}
> +
> +	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
> +				  &copy_offset, &copy_count, NULL)) {
> +		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);

I'd still like to see the remainder of the BAR follow the semantics
vfio-pci does.  I think this requires a __le32 bar0 field on the
virtvdev struct to store writes and the read here would mask the lower
bits up to the BAR size and OR in the IO indicator bit.
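
For illustration, a sketch of that approach, assuming a new
'__le32 pci_base_addr_0' field on virtvdev that latches guest writes to
BAR0 and an emulated BAR size rounded up to a power of two (the field
name and rounding are assumptions, not part of the patch):

	u32 bar_mask = ~(roundup_pow_of_two(virtvdev->bar0_virtual_buf_size) - 1);

	val32 = cpu_to_le32((le32_to_cpu(virtvdev->pci_base_addr_0) & bar_mask) |
			    PCI_BASE_ADDRESS_SPACE_IO);

The config write path would then store the guest's BAR0 writes into that
field, mirroring the BAR sizing semantics vfio-pci provides.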


> +		if (copy_to_user(buf + copy_offset, &val32, copy_count))
> +			return -EFAULT;
> +	}
> +
> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> +				  &copy_offset, &copy_count, NULL)) {
> +		/*
> +		 * Transitional devices use the PCI subsystem device id as
> +		 * virtio device id, same as legacy driver always did.

Where did we require the subsystem vendor ID to be 0x1af4?  This
subsystem device ID really only makes sense given that subsystem
vendor ID, right?  Otherwise I don't see that non-transitional devices,
such as the VF, have a hard requirement per the spec for the subsystem
vendor ID.

Do we want to make this only probe the correct subsystem vendor ID or do
we want to emulate the subsystem vendor ID as well?  I don't see that this
is correct without one of those options.
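
For the emulation option, a sketch of what the config read handler could
additionally do (this is only one of the two options raised above):

	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_VENDOR_ID,
				  sizeof(val16), &copy_offset,
				  &copy_count, NULL)) {
		/* Report the virtio (Red Hat) subsystem vendor ID so the
		 * emulated subsystem device ID below is meaningful.
		 */
		val16 = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET);
		if (copy_to_user(buf + copy_offset, &val16, copy_count))
			return -EFAULT;
	}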

> +		 */
> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
> +		if (copy_to_user(buf + copy_offset, &val16, copy_count))
> +			return -EFAULT;
> +	}
> +
> +	return count;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
> +		       size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
> +				     ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> +			size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> +		size_t register_offset;
> +		loff_t copy_offset;
> +		size_t copy_count;
> +
> +		if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd),
> +					  &copy_offset, &copy_count,
> +					  &register_offset)) {
> +			if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
> +					   buf + copy_offset,
> +					   copy_count))
> +				return -EFAULT;
> +		}
> +
> +		if (range_intersect_range(pos, count, pdev->msix_cap + PCI_MSIX_FLAGS,
> +					  sizeof(virtvdev->msix_ctrl),
> +					  &copy_offset, &copy_count,
> +					  &register_offset)) {
> +			if (copy_from_user((void *)&virtvdev->msix_ctrl + register_offset,
> +					   buf + copy_offset,
> +					   copy_count))
> +				return -EFAULT;
> +		}

MSI-X is set up via ioctl, so you're relying on a userspace that writes
through the control register bit even though it doesn't do anything.
Why not use vfio_pci_core_device.irq_type to track if MSI-X mode is
enabled?
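
For illustration, a sketch of that check, assuming the core irq_type field
is current whenever the legacy IO path runs:

	bool msix_enabled =
		virtvdev->core_device.irq_type == VFIO_PCI_MSIX_IRQ_INDEX;

which would make the msix_ctrl shadow written above unnecessary.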

> +	}
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static int
> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
> +				   unsigned int cmd, unsigned long arg)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
> +	void __user *uarg = (void __user *)arg;
> +	struct vfio_region_info info = {};
> +
> +	if (copy_from_user(&info, uarg, minsz))
> +		return -EFAULT;
> +
> +	if (info.argsz < minsz)
> +		return -EINVAL;
> +
> +	switch (info.index) {
> +	case VFIO_PCI_BAR0_REGION_INDEX:
> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
> +		info.size = virtvdev->bar0_virtual_buf_size;
> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
> +			     VFIO_REGION_INFO_FLAG_WRITE;
> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static long
> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> +			     unsigned long arg)
> +{
> +	switch (cmd) {
> +	case VFIO_DEVICE_GET_REGION_INFO:
> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static int
> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	int ret;
> +
> +	/*
> +	 * Setup the BAR where the 'notify' exists to be used by vfio as well
> +	 * This will let us mmap it only once and use it when needed.
> +	 */
> +	ret = vfio_pci_core_setup_barmap(core_device,
> +					 virtvdev->notify_bar);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
> +			virtvdev->notify_offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
> +	int ret;
> +
> +	ret = vfio_pci_core_enable(vdev);
> +	if (ret)
> +		return ret;
> +
> +	if (virtvdev->bar0_virtual_buf) {
> +		/*
> +		 * Upon close_device() the vfio_pci_core_disable() is called
> +		 * and will close all the previous mmaps, so it seems that the
> +		 * valid life cycle for the 'notify' addr is per open/close.
> +		 */
> +		ret = virtiovf_set_notify_addr(virtvdev);
> +		if (ret) {
> +			vfio_pci_core_disable(vdev);
> +			return ret;
> +		}
> +	}
> +
> +	vfio_pci_core_finish_enable(vdev);
> +	return 0;
> +}
> +
> +static int virtiovf_get_device_config_size(unsigned short device)
> +{
> +	/* Network card */
> +	return offsetofend(struct virtio_net_config, status);
> +}
> +
> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	u64 offset;
> +	int ret;
> +	u8 bar;
> +
> +	ret = virtio_pci_admin_legacy_io_notify_info(virtvdev->core_device.pdev,
> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
> +				&bar, &offset);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_bar = bar;
> +	virtvdev->notify_offset = offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev;
> +	int ret;
> +
> +	ret = vfio_pci_core_init_dev(core_vdev);
> +	if (ret)
> +		return ret;
> +
> +	pdev = virtvdev->core_device.pdev;
> +	ret = virtiovf_read_notify_info(virtvdev);
> +	if (ret)
> +		return ret;
> +
> +	/* Being ready with a buffer that supports MSIX */
> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
> +				virtiovf_get_device_config_size(pdev->device);
> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
> +					     GFP_KERNEL);
> +	if (!virtvdev->bar0_virtual_buf)
> +		return -ENOMEM;
> +	mutex_init(&virtvdev->bar_mutex);
> +	return 0;
> +}
> +
> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +
> +	kfree(virtvdev->bar0_virtual_buf);
> +	vfio_pci_core_release_dev(core_vdev);
> +}
> +
> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
> +	.name = "virtio-transitional-vfio-pci",
> +	.init = virtiovf_pci_init_device,
> +	.release = virtiovf_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = vfio_pci_core_close_device,
> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
> +	.read = virtiovf_pci_core_read,
> +	.write = virtiovf_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};
> +
> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
> +	.name = "virtio-acc-vfio-pci",
> +	.init = vfio_pci_core_init_dev,
> +	.release = vfio_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = vfio_pci_core_close_device,
> +	.ioctl = vfio_pci_core_ioctl,
> +	.device_feature = vfio_pci_core_ioctl_feature,
> +	.read = vfio_pci_core_read,
> +	.write = vfio_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};
> +
> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
> +{
> +	struct resource *res = pdev->resource;
> +
> +	return res->flags ? true : false;
> +}
> +
> +#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
> +	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> +
> +static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
> +{
> +	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
> +	u8 *buf;
> +	int ret;
> +
> +	buf = kzalloc(buf_size, GFP_KERNEL);
> +	if (!buf)
> +		return false;
> +
> +	ret = virtio_pci_admin_list_query(pdev, buf, buf_size);
> +	if (ret)
> +		goto end;
> +
> +	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
> +		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
> +		ret = -EOPNOTSUPP;
> +		goto end;
> +	}
> +
> +	/* Confirm the used commands */
> +	memset(buf, 0, buf_size);
> +	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
> +	ret = virtio_pci_admin_list_use(pdev, buf, buf_size);
> +end:
> +	kfree(buf);
> +	return ret ? false : true;
> +}
> +
> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> +			      const struct pci_device_id *id)
> +{
> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> +	struct virtiovf_pci_core_device *virtvdev;
> +	int ret;
> +
> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)


All but the last test here are fairly evident requirements of the
driver.  Why do we require a device that supports MSI-X?

Thanks,
Alex


> +		ops = &virtiovf_acc_vfio_pci_tran_ops;
> +
> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> +				     &pdev->dev, ops);
> +	if (IS_ERR(virtvdev))
> +		return PTR_ERR(virtvdev);
> +
> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
> +	if (ret)
> +		goto out;
> +	return 0;
> +out:
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +	return ret;
> +}
> +
> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> +
> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +}
> +
> +static const struct pci_device_id virtiovf_pci_table[] = {
> +	/* Only virtio-net is supported/tested so far */
> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
> +	{}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> +
> +static struct pci_driver virtiovf_pci_driver = {
> +	.name = KBUILD_MODNAME,
> +	.id_table = virtiovf_pci_table,
> +	.probe = virtiovf_pci_probe,
> +	.remove = virtiovf_pci_remove,
> +	.err_handler = &vfio_pci_core_err_handlers,
> +	.driver_managed_dma = true,
> +};
> +
> +module_pci_driver(virtiovf_pci_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
> +MODULE_DESCRIPTION(
> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-10-24 19:57     ` Alex Williamson
  0 siblings, 0 replies; 100+ messages in thread
From: Alex Williamson @ 2023-10-24 19:57 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: mst, jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg

On Tue, 17 Oct 2023 16:42:17 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> Introduce a vfio driver over virtio devices to support the legacy
> interface functionality for VFs.
> 
> Background, from the virtio spec [1].
> --------------------------------------------------------------------
> In some systems, there is a need to support a virtio legacy driver with
> a device that does not directly support the legacy interface. In such
> scenarios, a group owner device can provide the legacy interface
> functionality for the group member devices. The driver of the owner
> device can then access the legacy interface of a member device on behalf
> of the legacy member device driver.
> 
> For example, with the SR-IOV group type, group members (VFs) can not
> present the legacy interface in an I/O BAR in BAR0 as expected by the
> legacy pci driver. If the legacy driver is running inside a virtual
> machine, the hypervisor executing the virtual machine can present a
> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> legacy driver accesses to this I/O BAR and forwards them to the group
> owner device (PF) using group administration commands.
> --------------------------------------------------------------------
> 
> Specifically, this driver adds support for a virtio-net VF to be exposed
> as a transitional device to a guest driver and allows the legacy IO BAR
> functionality on top.
> 
> This allows a VM which uses a legacy virtio-net driver in the guest to
> work transparently over a VF whose host-side driver is this new driver.
> 
> The driver can be extended easily to support some other types of virtio
> devices (e.g virtio-blk), by adding in a few places the specific type
> properties as was done for virtio-net.
> 
> For now, only the virtio-net use case was tested and as such we introduce
> the support only for such a device.
> 
> Practically,
> Upon probing a VF for a virtio-net device, in case its PF supports
> legacy access over the virtio admin commands and the VF doesn't have BAR
> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> transitional device with I/O BAR in BAR 0.
> 
> The existence of the simulated I/O bar is reported later on by
> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> exposes itself as a transitional device by overwriting some properties
> upon reading its config space.
> 
> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> guest may use it via read/write calls according to the virtio
> specification.
> 
> Any read/write towards the control parts of the BAR will be captured by
> the new driver and will be translated into admin commands towards the
> device.
> 
> Any data path read/write access (i.e. virtio driver notifications) will
> be forwarded to the physical BAR whose properties were supplied by
> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
> probing/init flow.
> 
> With that code in place a legacy driver in the guest has the look and
> feel as if having a transitional device with legacy support for both its
> control and data path flows.
> 
> [1]
> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  MAINTAINERS                      |   7 +
>  drivers/vfio/pci/Kconfig         |   2 +
>  drivers/vfio/pci/Makefile        |   2 +
>  drivers/vfio/pci/virtio/Kconfig  |  15 +
>  drivers/vfio/pci/virtio/Makefile |   4 +
>  drivers/vfio/pci/virtio/main.c   | 577 +++++++++++++++++++++++++++++++
>  6 files changed, 607 insertions(+)
>  create mode 100644 drivers/vfio/pci/virtio/Kconfig
>  create mode 100644 drivers/vfio/pci/virtio/Makefile
>  create mode 100644 drivers/vfio/pci/virtio/main.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7a7bd8bd80e9..680a70063775 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22620,6 +22620,13 @@ L:	kvm@vger.kernel.org
>  S:	Maintained
>  F:	drivers/vfio/pci/mlx5/
>  
> +VFIO VIRTIO PCI DRIVER
> +M:	Yishai Hadas <yishaih@nvidia.com>
> +L:	kvm@vger.kernel.org
> +L:	virtualization@lists.linux-foundation.org
> +S:	Maintained
> +F:	drivers/vfio/pci/virtio
> +
>  VFIO PCI DEVICE SPECIFIC DRIVERS
>  R:	Jason Gunthorpe <jgg@nvidia.com>
>  R:	Yishai Hadas <yishaih@nvidia.com>
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 8125e5f37832..18c397df566d 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>  
>  source "drivers/vfio/pci/pds/Kconfig"
>  
> +source "drivers/vfio/pci/virtio/Kconfig"
> +
>  endmenu
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 45167be462d8..046139a4eca5 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>  obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>  
>  obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> +
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> new file mode 100644
> index 000000000000..89eddce8b1bd
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Kconfig
> @@ -0,0 +1,15 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config VIRTIO_VFIO_PCI
> +        tristate "VFIO support for VIRTIO PCI devices"
> +        depends on VIRTIO_PCI
> +        select VFIO_PCI_CORE
> +        help
> +          This provides support for exposing VIRTIO VF devices using the VFIO
> +          framework that can work with a legacy virtio driver in the guest.
> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> +          not indicate I/O Space.
> +          As of that this driver emulated I/O BAR in software to let a VF be
> +          seen as a transitional device in the guest and let it work with
> +          a legacy driver.

This description is a little bit subtle to the hard requirements on the
device.  Reading this, one might think that this should work for any
SR-IOV VF virtio device, when in reality it only supports virtio-net
currently and places a number of additional requirements on the device
(ex. legacy access and MSI-X support).
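Something that spells out the requirements would be clearer, e.g. (just a
sketch of possible help text):

          This provides support for exposing VIRTIO-net VF devices which
          support legacy IO access, using the VFIO framework that can work
          with a legacy virtio driver in the guest.
          Based on the PCIe spec, VFs do not support I/O Space, so the
          driver emulates the I/O BAR in software to let a VF be seen as a
          transitional device and work with a legacy driver.
          Note that the device must also support MSI-X and the legacy
          access admin commands through its parent PF.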

> +
> +          If you don't know what to do here, say N.
> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
> new file mode 100644
> index 000000000000..2039b39fb723
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
> +virtio-vfio-pci-y := main.o
> +
> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
> new file mode 100644
> index 000000000000..3fef4b21f7e6
> --- /dev/null
> +++ b/drivers/vfio/pci/virtio/main.c
> @@ -0,0 +1,577 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/pci.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +#include <linux/vfio.h>
> +#include <linux/vfio_pci_core.h>
> +#include <linux/virtio_pci.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_pci_admin.h>
> +
> +struct virtiovf_pci_core_device {
> +	struct vfio_pci_core_device core_device;
> +	u8 bar0_virtual_buf_size;
> +	u8 *bar0_virtual_buf;
> +	/* synchronize access to the virtual buf */
> +	struct mutex bar_mutex;
> +	void __iomem *notify_addr;
> +	u32 notify_offset;
> +	u8 notify_bar;

Push the above u8 to the end of the structure for better packing.

> +	u16 pci_cmd;
> +	u16 msix_ctrl;
> +};
> +
> +static int
> +virtiovf_issue_legacy_rw_cmd(struct virtiovf_pci_core_device *virtvdev,
> +			     loff_t pos, char __user *buf,
> +			     size_t count, bool read)
> +{
> +	bool msix_enabled = virtvdev->msix_ctrl & PCI_MSIX_FLAGS_ENABLE;
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
> +	u16 opcode;
> +	int ret;
> +
> +	mutex_lock(&virtvdev->bar_mutex);
> +	if (read) {
> +		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
> +		ret = virtio_pci_admin_legacy_io_read(pdev, opcode, pos, count,
> +						      bar0_buf + pos);
> +		if (ret)
> +			goto out;
> +		if (copy_to_user(buf, bar0_buf + pos, count))
> +			ret = -EFAULT;
> +		goto out;
> +	}

TBH, I think the symmetry of read vs write would be more apparent if
this were an else branch.
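Roughly, an untested sketch of the same body between the lock/unlock:

	if (read) {
		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
		ret = virtio_pci_admin_legacy_io_read(pdev, opcode, pos, count,
						      bar0_buf + pos);
		if (!ret && copy_to_user(buf, bar0_buf + pos, count))
			ret = -EFAULT;
	} else {
		if (copy_from_user(bar0_buf + pos, buf, count)) {
			ret = -EFAULT;
			goto out;
		}
		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
		ret = virtio_pci_admin_legacy_io_write(pdev, opcode, pos, count,
						       bar0_buf + pos);
	}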

> +
> +	if (copy_from_user(bar0_buf + pos, buf, count)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
> +	ret = virtio_pci_admin_legacy_io_write(pdev, opcode, pos, count,
> +					       bar0_buf + pos);
> +out:
> +	mutex_unlock(&virtvdev->bar_mutex);
> +	return ret;
> +}
> +
> +static int
> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
> +			    loff_t pos, char __user *buf,
> +			    size_t count, bool read)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	u16 queue_notify;
> +	int ret;
> +
> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
> +		return -EINVAL;
> +
> +	switch (pos) {
> +	case VIRTIO_PCI_QUEUE_NOTIFY:
> +		if (count != sizeof(queue_notify))
> +			return -EINVAL;
> +		if (read) {
> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
> +						virtvdev->notify_addr);
> +			if (ret)
> +				return ret;
> +			if (copy_to_user(buf, &queue_notify,
> +					 sizeof(queue_notify)))
> +				return -EFAULT;
> +			break;
> +		}

Same.

> +
> +		if (copy_from_user(&queue_notify, buf, count))
> +			return -EFAULT;
> +
> +		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
> +					 virtvdev->notify_addr);
> +		break;
> +	default:
> +		ret = virtiovf_issue_legacy_rw_cmd(virtvdev, pos, buf, count,
> +						   read);
> +	}
> +
> +	return ret ? ret : count;
> +}
> +
> +static bool range_intersect_range(loff_t range1_start, size_t count1,
> +				  loff_t range2_start, size_t count2,
> +				  loff_t *start_offset,
> +				  size_t *intersect_count,
> +				  size_t *register_offset)
> +{
> +	if (range1_start <= range2_start &&
> +	    range1_start + count1 > range2_start) {
> +		*start_offset = range2_start - range1_start;
> +		*intersect_count = min_t(size_t, count2,
> +					 range1_start + count1 - range2_start);
> +		if (register_offset)
> +			*register_offset = 0;
> +		return true;
> +	}
> +
> +	if (range1_start > range2_start &&
> +	    range1_start < range2_start + count2) {
> +		*start_offset = range1_start;
> +		*intersect_count = min_t(size_t, count1,
> +					 range2_start + count2 - range1_start);
> +		if (register_offset)
> +			*register_offset = range1_start - range2_start;
> +		return true;
> +	}

Seems like we're missing a case, and some documentation.

The first test requires range1 to fully enclose range2 and provides the
offset of range2 within range1 and the length of the intersection.

The second test requires range1 to start from a non-zero offset within
range2 and returns the absolute offset of range1 and the length of the
intersection.

The register offset is then non-zero offset of range1 into range2.  So
does the caller use the zero value in the previous test to know range2
exists within range1?

We miss the cases where range1_start is <= range2_start and range1
terminates within range2.  I suppose we'll see below how this is used,
but it seems asymmetric and incomplete.
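For reference, a fully symmetric version might look roughly like this
(untested sketch; start_offset stays relative to range1/the user buffer,
register_offset relative to range2/the emulated register):

static bool range_intersect_range(loff_t range1_start, size_t count1,
				  loff_t range2_start, size_t count2,
				  loff_t *start_offset,
				  size_t *intersect_count,
				  size_t *register_offset)
{
	loff_t start = max(range1_start, range2_start);
	loff_t end = min(range1_start + (loff_t)count1,
			 range2_start + (loff_t)count2);

	if (start >= end)
		return false;

	/* Offset of the intersection within the user access (range1) */
	*start_offset = start - range1_start;
	/* Length of the overlap */
	*intersect_count = end - start;
	/* Offset of the intersection within the emulated register (range2) */
	if (register_offset)
		*register_offset = start - range2_start;
	return true;
}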

> +
> +	return false;
> +}
> +
> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> +					char __user *buf, size_t count,
> +					loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	size_t register_offset;
> +	loff_t copy_offset;
> +	size_t copy_count;
> +	__le32 val32;
> +	__le16 val16;
> +	u8 val8;
> +	int ret;
> +
> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> +				  &copy_offset, &copy_count, NULL)) {

If a user does 'setpci -s x:00.0 2.b' (range1 <= range2, but terminates
within range2) they'll not enter this branch and see 41 rather than 00.

If a user does 'setpci -s x:00.0 3.b' (range1 > range2, range 1
contained within range 2), the above function returns a copy_offset of
range1_start (ie. 3).  But that offset is applied to the buffer, which
is out of bounds.  The function needs to have returned an offset of 1
and it should have applied to the val16 address.

I don't think this works like it's intended.
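i.e. for the emulated fields the register_offset needs to be applied to the
local value, something like (sketch):

	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
				  &copy_offset, &copy_count, &register_offset)) {
		val16 = cpu_to_le16(0x1000);
		if (copy_to_user(buf + copy_offset,
				 (void *)&val16 + register_offset, copy_count))
			return -EFAULT;
	}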


> +		val16 = cpu_to_le16(0x1000);

Please #define this somewhere rather than hiding a magic value here.
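e.g. (the name here is only a suggestion):

/* Transitional virtio-net PCI device ID per the virtio spec */
#define VIRTIO_TRANS_ID_NET 0x1000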

> +		if (copy_to_user(buf + copy_offset, &val16, copy_count))
> +			return -EFAULT;
> +	}
> +
> +	if ((virtvdev->pci_cmd & PCI_COMMAND_IO) &&
> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
> +				  &copy_offset, &copy_count, &register_offset)) {
> +		if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
> +				   copy_count))
> +			return -EFAULT;
> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
> +				 copy_count))
> +			return -EFAULT;
> +	}
> +
> +	if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8),
> +				  &copy_offset, &copy_count, NULL)) {
> +		/* Transitional needs to have revision 0 */
> +		val8 = 0;
> +		if (copy_to_user(buf + copy_offset, &val8, copy_count))
> +			return -EFAULT;
> +	}
> +
> +	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
> +				  &copy_offset, &copy_count, NULL)) {
> +		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);

I'd still like to see the remainder of the BAR follow the semantics
vfio-pci does.  I think this requires a __le32 bar0 field on the
virtvdev struct to store writes and the read here would mask the lower
bits up to the BAR size and OR in the IO indicator bit.
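For example, with a __le32 pci_base_addr_0 field (hypothetical name) on the
virtvdev struct storing the user's BAR0 writes, and assuming the emulated
BAR size is a power of two, the read side could roughly do:

	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
				  &copy_offset, &copy_count, &register_offset)) {
		u32 bar_mask = ~(virtvdev->bar0_virtual_buf_size - 1);
		u32 pci_base_addr_0 = le32_to_cpu(virtvdev->pci_base_addr_0);

		/* Mask to the BAR size and OR in the I/O space indicator */
		val32 = cpu_to_le32((pci_base_addr_0 & bar_mask) |
				    PCI_BASE_ADDRESS_SPACE_IO);
		if (copy_to_user(buf + copy_offset,
				 (void *)&val32 + register_offset, copy_count))
			return -EFAULT;
	}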


> +		if (copy_to_user(buf + copy_offset, &val32, copy_count))
> +			return -EFAULT;
> +	}
> +
> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> +				  &copy_offset, &copy_count, NULL)) {
> +		/*
> +		 * Transitional devices use the PCI subsystem device id as
> +		 * virtio device id, same as legacy driver always did.

Where did we require the subsystem vendor ID to be 0x1af4?  This
subsystem device ID really only makes sense given that subsystem
vendor ID, right?  Otherwise I don't see that non-transitional devices,
such as the VF, have a hard requirement per the spec for the subsystem
vendor ID.

Do we want to make this only probe the correct subsystem vendor ID or do
we want to emulate the subsystem vendor ID as well?  I don't see this is
correct without one of those options.
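Emulating it would only take one more branch here, e.g. (sketch):

	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_VENDOR_ID,
				  sizeof(val16), &copy_offset, &copy_count,
				  NULL)) {
		val16 = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET);
		if (copy_to_user(buf + copy_offset, &val16, copy_count))
			return -EFAULT;
	}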

> +		 */
> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
> +		if (copy_to_user(buf + copy_offset, &val16, copy_count))
> +			return -EFAULT;
> +	}
> +
> +	return count;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
> +		       size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
> +				     ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static ssize_t
> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> +			size_t count, loff_t *ppos)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +	int ret;
> +
> +	if (!count)
> +		return 0;
> +
> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> +		size_t register_offset;
> +		loff_t copy_offset;
> +		size_t copy_count;
> +
> +		if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd),
> +					  &copy_offset, &copy_count,
> +					  &register_offset)) {
> +			if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
> +					   buf + copy_offset,
> +					   copy_count))
> +				return -EFAULT;
> +		}
> +
> +		if (range_intersect_range(pos, count, pdev->msix_cap + PCI_MSIX_FLAGS,
> +					  sizeof(virtvdev->msix_ctrl),
> +					  &copy_offset, &copy_count,
> +					  &register_offset)) {
> +			if (copy_from_user((void *)&virtvdev->msix_ctrl + register_offset,
> +					   buf + copy_offset,
> +					   copy_count))
> +				return -EFAULT;
> +		}

MSI-X is setup via ioctl, so you're relying on a userspace that writes
through the control register bit even though it doesn't do anything.
Why not use vfio_pci_core_device.irq_type to track if MSI-X mode is
enabled?

> +	}
> +
> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret) {
> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
> +		return -EIO;
> +	}
> +
> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
> +	pm_runtime_put(&pdev->dev);
> +	return ret;
> +}
> +
> +static int
> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
> +				   unsigned int cmd, unsigned long arg)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
> +	void __user *uarg = (void __user *)arg;
> +	struct vfio_region_info info = {};
> +
> +	if (copy_from_user(&info, uarg, minsz))
> +		return -EFAULT;
> +
> +	if (info.argsz < minsz)
> +		return -EINVAL;
> +
> +	switch (info.index) {
> +	case VFIO_PCI_BAR0_REGION_INDEX:
> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
> +		info.size = virtvdev->bar0_virtual_buf_size;
> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
> +			     VFIO_REGION_INFO_FLAG_WRITE;
> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static long
> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> +			     unsigned long arg)
> +{
> +	switch (cmd) {
> +	case VFIO_DEVICE_GET_REGION_INFO:
> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
> +	default:
> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +	}
> +}
> +
> +static int
> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> +	int ret;
> +
> +	/*
> +	 * Setup the BAR where the 'notify' exists to be used by vfio as well
> +	 * This will let us mmap it only once and use it when needed.
> +	 */
> +	ret = vfio_pci_core_setup_barmap(core_device,
> +					 virtvdev->notify_bar);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
> +			virtvdev->notify_offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
> +	int ret;
> +
> +	ret = vfio_pci_core_enable(vdev);
> +	if (ret)
> +		return ret;
> +
> +	if (virtvdev->bar0_virtual_buf) {
> +		/*
> +		 * Upon close_device() the vfio_pci_core_disable() is called
> +		 * and will close all the previous mmaps, so it seems that the
> +		 * valid life cycle for the 'notify' addr is per open/close.
> +		 */
> +		ret = virtiovf_set_notify_addr(virtvdev);
> +		if (ret) {
> +			vfio_pci_core_disable(vdev);
> +			return ret;
> +		}
> +	}
> +
> +	vfio_pci_core_finish_enable(vdev);
> +	return 0;
> +}
> +
> +static int virtiovf_get_device_config_size(unsigned short device)
> +{
> +	/* Network card */
> +	return offsetofend(struct virtio_net_config, status);
> +}
> +
> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
> +{
> +	u64 offset;
> +	int ret;
> +	u8 bar;
> +
> +	ret = virtio_pci_admin_legacy_io_notify_info(virtvdev->core_device.pdev,
> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
> +				&bar, &offset);
> +	if (ret)
> +		return ret;
> +
> +	virtvdev->notify_bar = bar;
> +	virtvdev->notify_offset = offset;
> +	return 0;
> +}
> +
> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +	struct pci_dev *pdev;
> +	int ret;
> +
> +	ret = vfio_pci_core_init_dev(core_vdev);
> +	if (ret)
> +		return ret;
> +
> +	pdev = virtvdev->core_device.pdev;
> +	ret = virtiovf_read_notify_info(virtvdev);
> +	if (ret)
> +		return ret;
> +
> +	/* Being ready with a buffer that supports MSIX */
> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
> +				virtiovf_get_device_config_size(pdev->device);
> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
> +					     GFP_KERNEL);
> +	if (!virtvdev->bar0_virtual_buf)
> +		return -ENOMEM;
> +	mutex_init(&virtvdev->bar_mutex);
> +	return 0;
> +}
> +
> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> +
> +	kfree(virtvdev->bar0_virtual_buf);
> +	vfio_pci_core_release_dev(core_vdev);
> +}
> +
> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
> +	.name = "virtio-transitional-vfio-pci",
> +	.init = virtiovf_pci_init_device,
> +	.release = virtiovf_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = vfio_pci_core_close_device,
> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
> +	.read = virtiovf_pci_core_read,
> +	.write = virtiovf_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};
> +
> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
> +	.name = "virtio-acc-vfio-pci",
> +	.init = vfio_pci_core_init_dev,
> +	.release = vfio_pci_core_release_dev,
> +	.open_device = virtiovf_pci_open_device,
> +	.close_device = vfio_pci_core_close_device,
> +	.ioctl = vfio_pci_core_ioctl,
> +	.device_feature = vfio_pci_core_ioctl_feature,
> +	.read = vfio_pci_core_read,
> +	.write = vfio_pci_core_write,
> +	.mmap = vfio_pci_core_mmap,
> +	.request = vfio_pci_core_request,
> +	.match = vfio_pci_core_match,
> +	.bind_iommufd = vfio_iommufd_physical_bind,
> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> +};
> +
> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
> +{
> +	struct resource *res = pdev->resource;
> +
> +	return res->flags ? true : false;
> +}
> +
> +#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
> +	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> +
> +static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
> +{
> +	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
> +	u8 *buf;
> +	int ret;
> +
> +	buf = kzalloc(buf_size, GFP_KERNEL);
> +	if (!buf)
> +		return false;
> +
> +	ret = virtio_pci_admin_list_query(pdev, buf, buf_size);
> +	if (ret)
> +		goto end;
> +
> +	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
> +		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
> +		ret = -EOPNOTSUPP;
> +		goto end;
> +	}
> +
> +	/* Confirm the used commands */
> +	memset(buf, 0, buf_size);
> +	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
> +	ret = virtio_pci_admin_list_use(pdev, buf, buf_size);
> +end:
> +	kfree(buf);
> +	return ret ? false : true;
> +}
> +
> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> +			      const struct pci_device_id *id)
> +{
> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> +	struct virtiovf_pci_core_device *virtvdev;
> +	int ret;
> +
> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)


All but the last test here are fairly evident requirements of the
driver.  Why do we require a device that supports MSI-X?

Thanks,
Alex


> +		ops = &virtiovf_acc_vfio_pci_tran_ops;
> +
> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> +				     &pdev->dev, ops);
> +	if (IS_ERR(virtvdev))
> +		return PTR_ERR(virtvdev);
> +
> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
> +	if (ret)
> +		goto out;
> +	return 0;
> +out:
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +	return ret;
> +}
> +
> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> +{
> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> +
> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> +	vfio_put_device(&virtvdev->core_device.vdev);
> +}
> +
> +static const struct pci_device_id virtiovf_pci_table[] = {
> +	/* Only virtio-net is supported/tested so far */
> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
> +	{}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> +
> +static struct pci_driver virtiovf_pci_driver = {
> +	.name = KBUILD_MODNAME,
> +	.id_table = virtiovf_pci_table,
> +	.probe = virtiovf_pci_probe,
> +	.remove = virtiovf_pci_remove,
> +	.err_handler = &vfio_pci_core_err_handlers,
> +	.driver_managed_dma = true,
> +};
> +
> +module_pci_driver(virtiovf_pci_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
> +MODULE_DESCRIPTION(
> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
  2023-10-17 13:42   ` Yishai Hadas via Virtualization
@ 2023-10-24 21:01     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael S. Tsirkin @ 2023-10-24 21:01 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Tue, Oct 17, 2023 at 04:42:14PM +0300, Yishai Hadas wrote:
> Introduce APIs to execute legacy IO admin commands.
> 
> It includes: list_query/use, io_legacy_read/write,
> io_legacy_notify_info.
> 
> Those APIs will be used by the next patches from this series.
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  drivers/virtio/virtio_pci_common.c |  11 ++
>  drivers/virtio/virtio_pci_common.h |   2 +
>  drivers/virtio/virtio_pci_modern.c | 206 +++++++++++++++++++++++++++++
>  include/linux/virtio_pci_admin.h   |  18 +++
>  4 files changed, 237 insertions(+)
>  create mode 100644 include/linux/virtio_pci_admin.h
> 
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index 6b4766d5abe6..212d68401d2c 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -645,6 +645,17 @@ static struct pci_driver virtio_pci_driver = {
>  	.sriov_configure = virtio_pci_sriov_configure,
>  };
>  
> +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
> +{
> +	struct virtio_pci_device *pf_vp_dev;
> +
> +	pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
> +	if (IS_ERR(pf_vp_dev))
> +		return NULL;
> +
> +	return &pf_vp_dev->vdev;
> +}
> +
>  module_pci_driver(virtio_pci_driver);
>  
>  MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
> diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> index a21b9ba01a60..2785e61ed668 100644
> --- a/drivers/virtio/virtio_pci_common.h
> +++ b/drivers/virtio/virtio_pci_common.h
> @@ -155,4 +155,6 @@ static inline void virtio_pci_legacy_remove(struct virtio_pci_device *vp_dev)
>  int virtio_pci_modern_probe(struct virtio_pci_device *);
>  void virtio_pci_modern_remove(struct virtio_pci_device *);
>  
> +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
> +
>  #endif
> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> index cc159a8e6c70..00b65e20b2f5 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -719,6 +719,212 @@ static void vp_modern_destroy_avq(struct virtio_device *vdev)
>  	vp_dev->del_vq(&vp_dev->admin_vq.info);
>  }
>  
> +/*
> + * virtio_pci_admin_list_query - Provides to driver list of commands
> + * supported for the PCI VF.
> + * @dev: VF pci_dev
> + * @buf: buffer to hold the returned list
> + * @buf_size: size of the given buffer
> + *
> + * Returns 0 on success, or negative on failure.
> + */
> +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct virtio_admin_cmd cmd = {};
> +	struct scatterlist result_sg;
> +
> +	if (!virtio_dev)
> +		return -ENODEV;
> +
> +	sg_init_one(&result_sg, buf, buf_size);
> +	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> +	cmd.result_sg = &result_sg;
> +
> +	return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> +}
> +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
> +
> +/*
> + * virtio_pci_admin_list_use - Provides to device list of commands
> + * used for the PCI VF.
> + * @dev: VF pci_dev
> + * @buf: buffer which holds the list
> + * @buf_size: size of the given buffer
> + *
> + * Returns 0 on success, or negative on failure.
> + */
> +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct virtio_admin_cmd cmd = {};
> +	struct scatterlist data_sg;
> +
> +	if (!virtio_dev)
> +		return -ENODEV;
> +
> +	sg_init_one(&data_sg, buf, buf_size);
> +	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> +	cmd.data_sg = &data_sg;
> +
> +	return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> +}
> +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_use);

list commands are actually for a group, not for the VF.
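i.e. the kernel-doc could be reworded along the lines of:

 * virtio_pci_admin_list_query - Provides to driver the list of admin
 * commands supported for the SR-IOV group (not for an individual VF).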

> +
> +/*
> + * virtio_pci_admin_legacy_io_write - Write legacy registers of a member device
> + * @dev: VF pci_dev
> + * @opcode: op code of the io write command

opcode is actually either VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE
or VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE correct?

So please just add 2 APIs for this so users don't need to care.
Could be wrappers around these two things.
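E.g. thin wrappers along these lines (function names here are just a
suggestion):

int virtio_pci_admin_legacy_common_io_write(struct pci_dev *pdev, u8 offset,
					    u8 size, u8 *buf)
{
	return virtio_pci_admin_legacy_io_write(pdev,
			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE,
			offset, size, buf);
}

int virtio_pci_admin_legacy_device_io_write(struct pci_dev *pdev, u8 offset,
					    u8 size, u8 *buf)
{
	return virtio_pci_admin_legacy_io_write(pdev,
			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE,
			offset, size, buf);
}

/* ...and equivalent wrappers for the read side. */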




> + * @offset: starting byte offset within the registers to write to
> + * @size: size of the data to write
> + * @buf: buffer which holds the data
> + *
> + * Returns 0 on success, or negative on failure.
> + */
> +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
> +				     u8 offset, u8 size, u8 *buf)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct virtio_admin_cmd_legacy_wr_data *data;
> +	struct virtio_admin_cmd cmd = {};
> +	struct scatterlist data_sg;
> +	int vf_id;
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENODEV;
> +
> +	vf_id = pci_iov_vf_id(pdev);
> +	if (vf_id < 0)
> +		return vf_id;
> +
> +	data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	data->offset = offset;
> +	memcpy(data->registers, buf, size);
> +	sg_init_one(&data_sg, data, sizeof(*data) + size);
> +	cmd.opcode = cpu_to_le16(opcode);
> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> +	cmd.group_member_id = cpu_to_le64(vf_id + 1);
> +	cmd.data_sg = &data_sg;
> +	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> +
> +	kfree(data);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_write);
> +
> +/*
> + * virtio_pci_admin_legacy_io_read - Read legacy registers of a member device
> + * @dev: VF pci_dev
> + * @opcode: op code of the io read command
> + * @offset: starting byte offset within the registers to read from
> + * @size: size of the data to be read
> + * @buf: buffer to hold the returned data
> + *
> + * Returns 0 on success, or negative on failure.
> + */
> +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
> +				    u8 offset, u8 size, u8 *buf)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct virtio_admin_cmd_legacy_rd_data *data;
> +	struct scatterlist data_sg, result_sg;
> +	struct virtio_admin_cmd cmd = {};
> +	int vf_id;
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENODEV;
> +
> +	vf_id = pci_iov_vf_id(pdev);
> +	if (vf_id < 0)
> +		return vf_id;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	data->offset = offset;
> +	sg_init_one(&data_sg, data, sizeof(*data));
> +	sg_init_one(&result_sg, buf, size);
> +	cmd.opcode = cpu_to_le16(opcode);
> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> +	cmd.group_member_id = cpu_to_le64(vf_id + 1);
> +	cmd.data_sg = &data_sg;
> +	cmd.result_sg = &result_sg;
> +	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> +
> +	kfree(data);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_read);
> +
> +/*
> + * virtio_pci_admin_legacy_io_notify_info - Read the queue notification
> + * information for legacy interface
> + * @dev: VF pci_dev
> + * @req_bar_flags: requested bar flags
> + * @bar: on output the BAR number of the member device
> + * @bar_offset: on output the offset within bar
> + *
> + * Returns 0 on success, or negative on failure.
> + */
> +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
> +					   u8 req_bar_flags, u8 *bar,
> +					   u64 *bar_offset)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct virtio_admin_cmd_notify_info_result *result;
> +	struct virtio_admin_cmd cmd = {};
> +	struct scatterlist result_sg;
> +	int vf_id;
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENODEV;
> +
> +	vf_id = pci_iov_vf_id(pdev);
> +	if (vf_id < 0)
> +		return vf_id;
> +
> +	result = kzalloc(sizeof(*result), GFP_KERNEL);
> +	if (!result)
> +		return -ENOMEM;
> +
> +	sg_init_one(&result_sg, result, sizeof(*result));
> +	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> +	cmd.group_member_id = cpu_to_le64(vf_id + 1);
> +	cmd.result_sg = &result_sg;
> +	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> +	if (!ret) {
> +		struct virtio_admin_cmd_notify_info_data *entry;
> +		int i;
> +
> +		ret = -ENOENT;
> +		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
> +			entry = &result->entries[i];
> +			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
> +				break;
> +			if (entry->flags != req_bar_flags)
> +				continue;
> +			*bar = entry->bar;
> +			*bar_offset = le64_to_cpu(entry->offset);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +
> +	kfree(result);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
> +
>  static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
>  	.get		= NULL,
>  	.set		= NULL,
> diff --git a/include/linux/virtio_pci_admin.h b/include/linux/virtio_pci_admin.h
> new file mode 100644
> index 000000000000..cb916a4bc1b1
> --- /dev/null
> +++ b/include/linux/virtio_pci_admin.h
> @@ -0,0 +1,18 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
> +#define _LINUX_VIRTIO_PCI_ADMIN_H
> +
> +#include <linux/types.h>
> +#include <linux/pci.h>
> +
> +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
> +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
> +				     u8 offset, u8 size, u8 *buf);
> +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
> +				    u8 offset, u8 size, u8 *buf);
> +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
> +					   u8 req_bar_flags, u8 *bar,
> +					   u64 *bar_offset);
> +
> +#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */
> -- 
> 2.27.0


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
@ 2023-10-24 21:01     ` Michael S. Tsirkin
  0 siblings, 0 replies; 100+ messages in thread
From: Michael S. Tsirkin @ 2023-10-24 21:01 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, si-wei.liu, leonro,
	maorg

On Tue, Oct 17, 2023 at 04:42:14PM +0300, Yishai Hadas wrote:
> Introduce APIs to execute legacy IO admin commands.
> 
> It includes: list_query/use, io_legacy_read/write,
> io_legacy_notify_info.
> 
> Those APIs will be used by the next patches from this series.
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  drivers/virtio/virtio_pci_common.c |  11 ++
>  drivers/virtio/virtio_pci_common.h |   2 +
>  drivers/virtio/virtio_pci_modern.c | 206 +++++++++++++++++++++++++++++
>  include/linux/virtio_pci_admin.h   |  18 +++
>  4 files changed, 237 insertions(+)
>  create mode 100644 include/linux/virtio_pci_admin.h
> 
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index 6b4766d5abe6..212d68401d2c 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -645,6 +645,17 @@ static struct pci_driver virtio_pci_driver = {
>  	.sriov_configure = virtio_pci_sriov_configure,
>  };
>  
> +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
> +{
> +	struct virtio_pci_device *pf_vp_dev;
> +
> +	pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
> +	if (IS_ERR(pf_vp_dev))
> +		return NULL;
> +
> +	return &pf_vp_dev->vdev;
> +}
> +
>  module_pci_driver(virtio_pci_driver);
>  
>  MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
> diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> index a21b9ba01a60..2785e61ed668 100644
> --- a/drivers/virtio/virtio_pci_common.h
> +++ b/drivers/virtio/virtio_pci_common.h
> @@ -155,4 +155,6 @@ static inline void virtio_pci_legacy_remove(struct virtio_pci_device *vp_dev)
>  int virtio_pci_modern_probe(struct virtio_pci_device *);
>  void virtio_pci_modern_remove(struct virtio_pci_device *);
>  
> +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
> +
>  #endif
> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> index cc159a8e6c70..00b65e20b2f5 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -719,6 +719,212 @@ static void vp_modern_destroy_avq(struct virtio_device *vdev)
>  	vp_dev->del_vq(&vp_dev->admin_vq.info);
>  }
>  
> +/*
> + * virtio_pci_admin_list_query - Provides to driver list of commands
> + * supported for the PCI VF.
> + * @dev: VF pci_dev
> + * @buf: buffer to hold the returned list
> + * @buf_size: size of the given buffer
> + *
> + * Returns 0 on success, or negative on failure.
> + */
> +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct virtio_admin_cmd cmd = {};
> +	struct scatterlist result_sg;
> +
> +	if (!virtio_dev)
> +		return -ENODEV;
> +
> +	sg_init_one(&result_sg, buf, buf_size);
> +	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> +	cmd.result_sg = &result_sg;
> +
> +	return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> +}
> +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
> +
> +/*
> + * virtio_pci_admin_list_use - Provides to device list of commands
> + * used for the PCI VF.
> + * @dev: VF pci_dev
> + * @buf: buffer which holds the list
> + * @buf_size: size of the given buffer
> + *
> + * Returns 0 on success, or negative on failure.
> + */
> +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct virtio_admin_cmd cmd = {};
> +	struct scatterlist data_sg;
> +
> +	if (!virtio_dev)
> +		return -ENODEV;
> +
> +	sg_init_one(&data_sg, buf, buf_size);
> +	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> +	cmd.data_sg = &data_sg;
> +
> +	return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> +}
> +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_use);

list commands are actually for a group, not for the VF.

> +
> +/*
> + * virtio_pci_admin_legacy_io_write - Write legacy registers of a member device
> + * @dev: VF pci_dev
> + * @opcode: op code of the io write command

opcode is actually either VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE
or VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE correct?

So please just add 2 APIs for this so users don't need to care.
Could be wrappers around these two things.




> + * @offset: starting byte offset within the registers to write to
> + * @size: size of the data to write
> + * @buf: buffer which holds the data
> + *
> + * Returns 0 on success, or negative on failure.
> + */
> +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
> +				     u8 offset, u8 size, u8 *buf)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct virtio_admin_cmd_legacy_wr_data *data;
> +	struct virtio_admin_cmd cmd = {};
> +	struct scatterlist data_sg;
> +	int vf_id;
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENODEV;
> +
> +	vf_id = pci_iov_vf_id(pdev);
> +	if (vf_id < 0)
> +		return vf_id;
> +
> +	data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	data->offset = offset;
> +	memcpy(data->registers, buf, size);
> +	sg_init_one(&data_sg, data, sizeof(*data) + size);
> +	cmd.opcode = cpu_to_le16(opcode);
> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> +	cmd.group_member_id = cpu_to_le64(vf_id + 1);
> +	cmd.data_sg = &data_sg;
> +	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> +
> +	kfree(data);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_write);
> +
> +/*
> + * virtio_pci_admin_legacy_io_read - Read legacy registers of a member device
> + * @dev: VF pci_dev
> + * @opcode: op code of the io read command
> + * @offset: starting byte offset within the registers to read from
> + * @size: size of the data to be read
> + * @buf: buffer to hold the returned data
> + *
> + * Returns 0 on success, or negative on failure.
> + */
> +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
> +				    u8 offset, u8 size, u8 *buf)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct virtio_admin_cmd_legacy_rd_data *data;
> +	struct scatterlist data_sg, result_sg;
> +	struct virtio_admin_cmd cmd = {};
> +	int vf_id;
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENODEV;
> +
> +	vf_id = pci_iov_vf_id(pdev);
> +	if (vf_id < 0)
> +		return vf_id;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	data->offset = offset;
> +	sg_init_one(&data_sg, data, sizeof(*data));
> +	sg_init_one(&result_sg, buf, size);
> +	cmd.opcode = cpu_to_le16(opcode);
> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> +	cmd.group_member_id = cpu_to_le64(vf_id + 1);
> +	cmd.data_sg = &data_sg;
> +	cmd.result_sg = &result_sg;
> +	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> +
> +	kfree(data);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_read);
> +
> +/*
> + * virtio_pci_admin_legacy_io_notify_info - Read the queue notification
> + * information for legacy interface
> + * @dev: VF pci_dev
> + * @req_bar_flags: requested bar flags
> + * @bar: on output the BAR number of the member device
> + * @bar_offset: on output the offset within bar
> + *
> + * Returns 0 on success, or negative on failure.
> + */
> +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
> +					   u8 req_bar_flags, u8 *bar,
> +					   u64 *bar_offset)
> +{
> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> +	struct virtio_admin_cmd_notify_info_result *result;
> +	struct virtio_admin_cmd cmd = {};
> +	struct scatterlist result_sg;
> +	int vf_id;
> +	int ret;
> +
> +	if (!virtio_dev)
> +		return -ENODEV;
> +
> +	vf_id = pci_iov_vf_id(pdev);
> +	if (vf_id < 0)
> +		return vf_id;
> +
> +	result = kzalloc(sizeof(*result), GFP_KERNEL);
> +	if (!result)
> +		return -ENOMEM;
> +
> +	sg_init_one(&result_sg, result, sizeof(*result));
> +	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> +	cmd.group_member_id = cpu_to_le64(vf_id + 1);
> +	cmd.result_sg = &result_sg;
> +	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> +	if (!ret) {
> +		struct virtio_admin_cmd_notify_info_data *entry;
> +		int i;
> +
> +		ret = -ENOENT;
> +		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
> +			entry = &result->entries[i];
> +			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
> +				break;
> +			if (entry->flags != req_bar_flags)
> +				continue;
> +			*bar = entry->bar;
> +			*bar_offset = le64_to_cpu(entry->offset);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +
> +	kfree(result);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
> +
>  static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
>  	.get		= NULL,
>  	.set		= NULL,
> diff --git a/include/linux/virtio_pci_admin.h b/include/linux/virtio_pci_admin.h
> new file mode 100644
> index 000000000000..cb916a4bc1b1
> --- /dev/null
> +++ b/include/linux/virtio_pci_admin.h
> @@ -0,0 +1,18 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
> +#define _LINUX_VIRTIO_PCI_ADMIN_H
> +
> +#include <linux/types.h>
> +#include <linux/pci.h>
> +
> +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
> +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
> +				     u8 offset, u8 size, u8 *buf);
> +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
> +				    u8 offset, u8 size, u8 *buf);
> +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
> +					   u8 req_bar_flags, u8 *bar,
> +					   u64 *bar_offset);
> +
> +#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */
> -- 
> 2.27.0


^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH V1 vfio 0/9] Introduce a vfio driver over virtio devices
  2023-10-23 15:42     ` Jason Gunthorpe
@ 2023-10-25  8:34         ` Tian, Kevin
  2023-10-25  8:34         ` Tian, Kevin
  1 sibling, 0 replies; 100+ messages in thread
From: Tian, Kevin @ 2023-10-25  8:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: Yishai Hadas, mst, kvm, virtualization, parav, feliu, jiri,
	Martins, Joao, si-wei.liu, leonro, maorg, jasowang

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, October 23, 2023 11:43 PM
> 
> On Mon, Oct 23, 2023 at 09:33:23AM -0600, Alex Williamson wrote:
> 
> > > Alex,
> > > Are you fine to leave the provisioning of the VF including the control
> > > of its transitional capability in the device hands as was suggested by
> > > Jason ?
> >
> > If this is the standard we're going to follow, ie. profiling of a
> > device is expected to occur prior to the probe of the vfio-pci variant
> > driver, then we should get the out-of-tree NVIDIA vGPU driver on board
> > with this too.
> 
> Those GPU drivers are using mdev not vfio-pci..
> 
> mdev doesn't have a way in its uapi to configure the mdev before it is
> created.
> 
> I'm hopeful that the SIOV work will develop something better because
> we clearly need it for the general use cases of SIOV beyond VFIO.
> 

The internal idxd driver version which I looked at last time leaves
provisioning to idxd's own config interface. Sure, let's brainstorm
what a general provisioning framework would look like (if one is
possible) after it's sent out for review.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
  2023-10-24 21:01     ` Michael S. Tsirkin
  (?)
@ 2023-10-25  9:18     ` Yishai Hadas via Virtualization
  2023-10-25 10:17         ` Michael S. Tsirkin
  -1 siblings, 1 reply; 100+ messages in thread
From: Yishai Hadas via Virtualization @ 2023-10-25  9:18 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro


On 25/10/2023 0:01, Michael S. Tsirkin wrote:
> On Tue, Oct 17, 2023 at 04:42:14PM +0300, Yishai Hadas wrote:
>> Introduce APIs to execute legacy IO admin commands.
>>
>> It includes: list_query/use, io_legacy_read/write,
>> io_legacy_notify_info.
>>
>> Those APIs will be used by the next patches from this series.
>>
>> Signed-off-by: Yishai Hadas<yishaih@nvidia.com>
>> ---
>>   drivers/virtio/virtio_pci_common.c |  11 ++
>>   drivers/virtio/virtio_pci_common.h |   2 +
>>   drivers/virtio/virtio_pci_modern.c | 206 +++++++++++++++++++++++++++++
>>   include/linux/virtio_pci_admin.h   |  18 +++
>>   4 files changed, 237 insertions(+)
>>   create mode 100644 include/linux/virtio_pci_admin.h
>>
>> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
>> index 6b4766d5abe6..212d68401d2c 100644
>> --- a/drivers/virtio/virtio_pci_common.c
>> +++ b/drivers/virtio/virtio_pci_common.c
>> @@ -645,6 +645,17 @@ static struct pci_driver virtio_pci_driver = {
>>   	.sriov_configure = virtio_pci_sriov_configure,
>>   };
>>   
>> +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
>> +{
>> +	struct virtio_pci_device *pf_vp_dev;
>> +
>> +	pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
>> +	if (IS_ERR(pf_vp_dev))
>> +		return NULL;
>> +
>> +	return &pf_vp_dev->vdev;
>> +}
>> +
>>   module_pci_driver(virtio_pci_driver);
>>   
>>   MODULE_AUTHOR("Anthony Liguori<aliguori@us.ibm.com>");
>> diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
>> index a21b9ba01a60..2785e61ed668 100644
>> --- a/drivers/virtio/virtio_pci_common.h
>> +++ b/drivers/virtio/virtio_pci_common.h
>> @@ -155,4 +155,6 @@ static inline void virtio_pci_legacy_remove(struct virtio_pci_device *vp_dev)
>>   int virtio_pci_modern_probe(struct virtio_pci_device *);
>>   void virtio_pci_modern_remove(struct virtio_pci_device *);
>>   
>> +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
>> +
>>   #endif
>> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
>> index cc159a8e6c70..00b65e20b2f5 100644
>> --- a/drivers/virtio/virtio_pci_modern.c
>> +++ b/drivers/virtio/virtio_pci_modern.c
>> @@ -719,6 +719,212 @@ static void vp_modern_destroy_avq(struct virtio_device *vdev)
>>   	vp_dev->del_vq(&vp_dev->admin_vq.info);
>>   }
>>   
>> +/*
>> + * virtio_pci_admin_list_query - Provides to driver list of commands
>> + * supported for the PCI VF.
>> + * @dev: VF pci_dev
>> + * @buf: buffer to hold the returned list
>> + * @buf_size: size of the given buffer
>> + *
>> + * Returns 0 on success, or negative on failure.
>> + */
>> +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
>> +{
>> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>> +	struct virtio_admin_cmd cmd = {};
>> +	struct scatterlist result_sg;
>> +
>> +	if (!virtio_dev)
>> +		return -ENODEV;
>> +
>> +	sg_init_one(&result_sg, buf, buf_size);
>> +	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
>> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>> +	cmd.result_sg = &result_sg;
>> +
>> +	return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
>> +
>> +/*
>> + * virtio_pci_admin_list_use - Provides to device list of commands
>> + * used for the PCI VF.
>> + * @dev: VF pci_dev
>> + * @buf: buffer which holds the list
>> + * @buf_size: size of the given buffer
>> + *
>> + * Returns 0 on success, or negative on failure.
>> + */
>> +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
>> +{
>> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>> +	struct virtio_admin_cmd cmd = {};
>> +	struct scatterlist data_sg;
>> +
>> +	if (!virtio_dev)
>> +		return -ENODEV;
>> +
>> +	sg_init_one(&data_sg, buf, buf_size);
>> +	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
>> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>> +	cmd.data_sg = &data_sg;
>> +
>> +	return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_use);
> list commands are actually for a group, not for the VF.

The VF was given so that the function can get the PF from it.

For now, the only existing 'group_type' in the spec is SRIOV, which is
why we hard-coded it internally to match the VF PCI.

Alternatively,
we can change the API to get the PF and the 'group_type' from the caller
to better match future usage (a rough sketch follows below).
However, this will require exporting the virtio_pci_vf_get_pf_dev() API
outside of virtio-pci.

Do you prefer to change to the latter option?
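
For illustration only, a rough sketch of that alternative (an assumption
on our side, not what this patch implements; only the signature changes,
the body mirrors the current code):

/* Hypothetical variant: the caller resolves the owner (PF) device and
 * supplies the group type explicitly.
 */
int virtio_pci_admin_list_query(struct virtio_device *virtio_dev,
				u16 group_type, u8 *buf, int buf_size)
{
	struct virtio_admin_cmd cmd = {};
	struct scatterlist result_sg;

	sg_init_one(&result_sg, buf, buf_size);
	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
	cmd.group_type = cpu_to_le16(group_type);
	cmd.result_sg = &result_sg;

	return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
}

A caller such as vfio would then resolve the PF virtio_device itself,
which is why virtio_pci_vf_get_pf_dev() would need to be exported in
this variant.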

>
>> +
>> +/*
>> + * virtio_pci_admin_legacy_io_write - Write legacy registers of a member device
>> + * @dev: VF pci_dev
>> + * @opcode: op code of the io write command
> opcode is actually either VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE
> or VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE correct?
>
> So please just add 2 APIs for this so users don't need to care.
> Could be wrappers around these two things.
>
OK.

We'll export the two APIs below [1], which will internally call
virtio_pci_admin_legacy_io_write() with the proper opcode hard-coded.

[1] virtio_pci_admin_legacy_device_io_write()
    virtio_pci_admin_legacy_common_io_write()

Yishai

>
>
>> + * @offset: starting byte offset within the registers to write to
>> + * @size: size of the data to write
>> + * @buf: buffer which holds the data
>> + *
>> + * Returns 0 on success, or negative on failure.
>> + */
>> +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
>> +				     u8 offset, u8 size, u8 *buf)
>> +{
>> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>> +	struct virtio_admin_cmd_legacy_wr_data *data;
>> +	struct virtio_admin_cmd cmd = {};
>> +	struct scatterlist data_sg;
>> +	int vf_id;
>> +	int ret;
>> +
>> +	if (!virtio_dev)
>> +		return -ENODEV;
>> +
>> +	vf_id = pci_iov_vf_id(pdev);
>> +	if (vf_id < 0)
>> +		return vf_id;
>> +
>> +	data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
>> +	if (!data)
>> +		return -ENOMEM;
>> +
>> +	data->offset = offset;
>> +	memcpy(data->registers, buf, size);
>> +	sg_init_one(&data_sg, data, sizeof(*data) + size);
>> +	cmd.opcode = cpu_to_le16(opcode);
>> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>> +	cmd.group_member_id = cpu_to_le64(vf_id + 1);
>> +	cmd.data_sg = &data_sg;
>> +	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>> +
>> +	kfree(data);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_write);
>> +
>> +/*
>> + * virtio_pci_admin_legacy_io_read - Read legacy registers of a member device
>> + * @dev: VF pci_dev
>> + * @opcode: op code of the io read command
>> + * @offset: starting byte offset within the registers to read from
>> + * @size: size of the data to be read
>> + * @buf: buffer to hold the returned data
>> + *
>> + * Returns 0 on success, or negative on failure.
>> + */
>> +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
>> +				    u8 offset, u8 size, u8 *buf)
>> +{
>> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>> +	struct virtio_admin_cmd_legacy_rd_data *data;
>> +	struct scatterlist data_sg, result_sg;
>> +	struct virtio_admin_cmd cmd = {};
>> +	int vf_id;
>> +	int ret;
>> +
>> +	if (!virtio_dev)
>> +		return -ENODEV;
>> +
>> +	vf_id = pci_iov_vf_id(pdev);
>> +	if (vf_id < 0)
>> +		return vf_id;
>> +
>> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
>> +	if (!data)
>> +		return -ENOMEM;
>> +
>> +	data->offset = offset;
>> +	sg_init_one(&data_sg, data, sizeof(*data));
>> +	sg_init_one(&result_sg, buf, size);
>> +	cmd.opcode = cpu_to_le16(opcode);
>> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>> +	cmd.group_member_id = cpu_to_le64(vf_id + 1);
>> +	cmd.data_sg = &data_sg;
>> +	cmd.result_sg = &result_sg;
>> +	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>> +
>> +	kfree(data);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_read);
>> +
>> +/*
>> + * virtio_pci_admin_legacy_io_notify_info - Read the queue notification
>> + * information for legacy interface
>> + * @dev: VF pci_dev
>> + * @req_bar_flags: requested bar flags
>> + * @bar: on output the BAR number of the member device
>> + * @bar_offset: on output the offset within bar
>> + *
>> + * Returns 0 on success, or negative on failure.
>> + */
>> +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
>> +					   u8 req_bar_flags, u8 *bar,
>> +					   u64 *bar_offset)
>> +{
>> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>> +	struct virtio_admin_cmd_notify_info_result *result;
>> +	struct virtio_admin_cmd cmd = {};
>> +	struct scatterlist result_sg;
>> +	int vf_id;
>> +	int ret;
>> +
>> +	if (!virtio_dev)
>> +		return -ENODEV;
>> +
>> +	vf_id = pci_iov_vf_id(pdev);
>> +	if (vf_id < 0)
>> +		return vf_id;
>> +
>> +	result = kzalloc(sizeof(*result), GFP_KERNEL);
>> +	if (!result)
>> +		return -ENOMEM;
>> +
>> +	sg_init_one(&result_sg, result, sizeof(*result));
>> +	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
>> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>> +	cmd.group_member_id = cpu_to_le64(vf_id + 1);
>> +	cmd.result_sg = &result_sg;
>> +	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>> +	if (!ret) {
>> +		struct virtio_admin_cmd_notify_info_data *entry;
>> +		int i;
>> +
>> +		ret = -ENOENT;
>> +		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
>> +			entry = &result->entries[i];
>> +			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
>> +				break;
>> +			if (entry->flags != req_bar_flags)
>> +				continue;
>> +			*bar = entry->bar;
>> +			*bar_offset = le64_to_cpu(entry->offset);
>> +			ret = 0;
>> +			break;
>> +		}
>> +	}
>> +
>> +	kfree(result);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
>> +
>>   static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
>>   	.get		= NULL,
>>   	.set		= NULL,
>> diff --git a/include/linux/virtio_pci_admin.h b/include/linux/virtio_pci_admin.h
>> new file mode 100644
>> index 000000000000..cb916a4bc1b1
>> --- /dev/null
>> +++ b/include/linux/virtio_pci_admin.h
>> @@ -0,0 +1,18 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
>> +#define _LINUX_VIRTIO_PCI_ADMIN_H
>> +
>> +#include <linux/types.h>
>> +#include <linux/pci.h>
>> +
>> +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
>> +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
>> +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
>> +				     u8 offset, u8 size, u8 *buf);
>> +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
>> +				    u8 offset, u8 size, u8 *buf);
>> +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
>> +					   u8 req_bar_flags, u8 *bar,
>> +					   u64 *bar_offset);
>> +
>> +#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */
>> -- 
>> 2.27.0


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
  2023-10-24 21:01     ` Michael S. Tsirkin
@ 2023-10-25  9:36       ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-25  9:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, si-wei.liu, leonro,
	maorg

Re-sending, as the previous reply was by mistake not in text format.

On 25/10/2023 0:01, Michael S. Tsirkin wrote:
> On Tue, Oct 17, 2023 at 04:42:14PM +0300, Yishai Hadas wrote:
>> Introduce APIs to execute legacy IO admin commands.
>>
>> It includes: list_query/use, io_legacy_read/write,
>> io_legacy_notify_info.
>>
>> Those APIs will be used by the next patches from this series.
>>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> ---
>>   drivers/virtio/virtio_pci_common.c |  11 ++
>>   drivers/virtio/virtio_pci_common.h |   2 +
>>   drivers/virtio/virtio_pci_modern.c | 206 +++++++++++++++++++++++++++++
>>   include/linux/virtio_pci_admin.h   |  18 +++
>>   4 files changed, 237 insertions(+)
>>   create mode 100644 include/linux/virtio_pci_admin.h
>>
>> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
>> index 6b4766d5abe6..212d68401d2c 100644
>> --- a/drivers/virtio/virtio_pci_common.c
>> +++ b/drivers/virtio/virtio_pci_common.c
>> @@ -645,6 +645,17 @@ static struct pci_driver virtio_pci_driver = {
>>   	.sriov_configure = virtio_pci_sriov_configure,
>>   };
>>   
>> +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
>> +{
>> +	struct virtio_pci_device *pf_vp_dev;
>> +
>> +	pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
>> +	if (IS_ERR(pf_vp_dev))
>> +		return NULL;
>> +
>> +	return &pf_vp_dev->vdev;
>> +}
>> +
>>   module_pci_driver(virtio_pci_driver);
>>   
>>   MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
>> diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
>> index a21b9ba01a60..2785e61ed668 100644
>> --- a/drivers/virtio/virtio_pci_common.h
>> +++ b/drivers/virtio/virtio_pci_common.h
>> @@ -155,4 +155,6 @@ static inline void virtio_pci_legacy_remove(struct virtio_pci_device *vp_dev)
>>   int virtio_pci_modern_probe(struct virtio_pci_device *);
>>   void virtio_pci_modern_remove(struct virtio_pci_device *);
>>   
>> +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
>> +
>>   #endif
>> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
>> index cc159a8e6c70..00b65e20b2f5 100644
>> --- a/drivers/virtio/virtio_pci_modern.c
>> +++ b/drivers/virtio/virtio_pci_modern.c
>> @@ -719,6 +719,212 @@ static void vp_modern_destroy_avq(struct virtio_device *vdev)
>>   	vp_dev->del_vq(&vp_dev->admin_vq.info);
>>   }
>>   
>> +/*
>> + * virtio_pci_admin_list_query - Provides to driver list of commands
>> + * supported for the PCI VF.
>> + * @dev: VF pci_dev
>> + * @buf: buffer to hold the returned list
>> + * @buf_size: size of the given buffer
>> + *
>> + * Returns 0 on success, or negative on failure.
>> + */
>> +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
>> +{
>> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>> +	struct virtio_admin_cmd cmd = {};
>> +	struct scatterlist result_sg;
>> +
>> +	if (!virtio_dev)
>> +		return -ENODEV;
>> +
>> +	sg_init_one(&result_sg, buf, buf_size);
>> +	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
>> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>> +	cmd.result_sg = &result_sg;
>> +
>> +	return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
>> +
>> +/*
>> + * virtio_pci_admin_list_use - Provides to device list of commands
>> + * used for the PCI VF.
>> + * @dev: VF pci_dev
>> + * @buf: buffer which holds the list
>> + * @buf_size: size of the given buffer
>> + *
>> + * Returns 0 on success, or negative on failure.
>> + */
>> +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
>> +{
>> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>> +	struct virtio_admin_cmd cmd = {};
>> +	struct scatterlist data_sg;
>> +
>> +	if (!virtio_dev)
>> +		return -ENODEV;
>> +
>> +	sg_init_one(&data_sg, buf, buf_size);
>> +	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
>> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>> +	cmd.data_sg = &data_sg;
>> +
>> +	return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_use);
> list commands are actually for a group, not for the VF.
The VF was given so that the function can get the PF from it.
For now, the only existing 'group_type' in the spec is SRIOV, which is
why we hard-coded it internally to match the VF PCI.

Alternatively,
we can change the API to get the PF and the 'group_type' from the caller
to better match future usage.
However, this will require exporting the virtio_pci_vf_get_pf_dev() API
outside of virtio-pci.

Do you prefer to change to the latter option?
>> +
>> +/*
>> + * virtio_pci_admin_legacy_io_write - Write legacy registers of a member device
>> + * @dev: VF pci_dev
>> + * @opcode: op code of the io write command
> opcode is actually either VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE
> or VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE correct?
>
> So please just add 2 APIs for this so users don't need to care.
> Could be wrappers around these two things.
>
>
OK.
We'll export the two APIs below [1], which will internally call
virtio_pci_admin_legacy_io_write() with the proper opcode hard-coded;
a rough sketch follows below.
[1] virtio_pci_admin_legacy_device_io_write()
    virtio_pci_admin_legacy_common_io_write()
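
To make it concrete, a minimal sketch of those wrappers (a sketch only,
assuming the virtio_pci_admin_legacy_io_write() helper from this patch
stays as-is; the final signatures may still change):

/* Wrapper hard-coding the legacy common-cfg write opcode */
int virtio_pci_admin_legacy_common_io_write(struct pci_dev *pdev, u8 offset,
					    u8 size, u8 *buf)
{
	return virtio_pci_admin_legacy_io_write(pdev,
				VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE,
				offset, size, buf);
}
EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_common_io_write);

/* Wrapper hard-coding the legacy device-cfg write opcode */
int virtio_pci_admin_legacy_device_io_write(struct pci_dev *pdev, u8 offset,
					    u8 size, u8 *buf)
{
	return virtio_pci_admin_legacy_io_write(pdev,
				VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE,
				offset, size, buf);
}
EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_device_io_write);

The read side would presumably get an equivalent
*_common_io_read()/*_device_io_read() pair.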

Yishai

>
>> + * @offset: starting byte offset within the registers to write to
>> + * @size: size of the data to write
>> + * @buf: buffer which holds the data
>> + *
>> + * Returns 0 on success, or negative on failure.
>> + */
>> +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
>> +				     u8 offset, u8 size, u8 *buf)
>> +{
>> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>> +	struct virtio_admin_cmd_legacy_wr_data *data;
>> +	struct virtio_admin_cmd cmd = {};
>> +	struct scatterlist data_sg;
>> +	int vf_id;
>> +	int ret;
>> +
>> +	if (!virtio_dev)
>> +		return -ENODEV;
>> +
>> +	vf_id = pci_iov_vf_id(pdev);
>> +	if (vf_id < 0)
>> +		return vf_id;
>> +
>> +	data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
>> +	if (!data)
>> +		return -ENOMEM;
>> +
>> +	data->offset = offset;
>> +	memcpy(data->registers, buf, size);
>> +	sg_init_one(&data_sg, data, sizeof(*data) + size);
>> +	cmd.opcode = cpu_to_le16(opcode);
>> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>> +	cmd.group_member_id = cpu_to_le64(vf_id + 1);
>> +	cmd.data_sg = &data_sg;
>> +	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>> +
>> +	kfree(data);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_write);
>> +
>> +/*
>> + * virtio_pci_admin_legacy_io_read - Read legacy registers of a member device
>> + * @dev: VF pci_dev
>> + * @opcode: op code of the io read command
>> + * @offset: starting byte offset within the registers to read from
>> + * @size: size of the data to be read
>> + * @buf: buffer to hold the returned data
>> + *
>> + * Returns 0 on success, or negative on failure.
>> + */
>> +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
>> +				    u8 offset, u8 size, u8 *buf)
>> +{
>> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>> +	struct virtio_admin_cmd_legacy_rd_data *data;
>> +	struct scatterlist data_sg, result_sg;
>> +	struct virtio_admin_cmd cmd = {};
>> +	int vf_id;
>> +	int ret;
>> +
>> +	if (!virtio_dev)
>> +		return -ENODEV;
>> +
>> +	vf_id = pci_iov_vf_id(pdev);
>> +	if (vf_id < 0)
>> +		return vf_id;
>> +
>> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
>> +	if (!data)
>> +		return -ENOMEM;
>> +
>> +	data->offset = offset;
>> +	sg_init_one(&data_sg, data, sizeof(*data));
>> +	sg_init_one(&result_sg, buf, size);
>> +	cmd.opcode = cpu_to_le16(opcode);
>> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>> +	cmd.group_member_id = cpu_to_le64(vf_id + 1);
>> +	cmd.data_sg = &data_sg;
>> +	cmd.result_sg = &result_sg;
>> +	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>> +
>> +	kfree(data);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_read);
>> +
>> +/*
>> + * virtio_pci_admin_legacy_io_notify_info - Read the queue notification
>> + * information for legacy interface
>> + * @dev: VF pci_dev
>> + * @req_bar_flags: requested bar flags
>> + * @bar: on output the BAR number of the member device
>> + * @bar_offset: on output the offset within bar
>> + *
>> + * Returns 0 on success, or negative on failure.
>> + */
>> +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
>> +					   u8 req_bar_flags, u8 *bar,
>> +					   u64 *bar_offset)
>> +{
>> +	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>> +	struct virtio_admin_cmd_notify_info_result *result;
>> +	struct virtio_admin_cmd cmd = {};
>> +	struct scatterlist result_sg;
>> +	int vf_id;
>> +	int ret;
>> +
>> +	if (!virtio_dev)
>> +		return -ENODEV;
>> +
>> +	vf_id = pci_iov_vf_id(pdev);
>> +	if (vf_id < 0)
>> +		return vf_id;
>> +
>> +	result = kzalloc(sizeof(*result), GFP_KERNEL);
>> +	if (!result)
>> +		return -ENOMEM;
>> +
>> +	sg_init_one(&result_sg, result, sizeof(*result));
>> +	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
>> +	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>> +	cmd.group_member_id = cpu_to_le64(vf_id + 1);
>> +	cmd.result_sg = &result_sg;
>> +	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>> +	if (!ret) {
>> +		struct virtio_admin_cmd_notify_info_data *entry;
>> +		int i;
>> +
>> +		ret = -ENOENT;
>> +		for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
>> +			entry = &result->entries[i];
>> +			if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
>> +				break;
>> +			if (entry->flags != req_bar_flags)
>> +				continue;
>> +			*bar = entry->bar;
>> +			*bar_offset = le64_to_cpu(entry->offset);
>> +			ret = 0;
>> +			break;
>> +		}
>> +	}
>> +
>> +	kfree(result);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
>> +
>>   static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
>>   	.get		= NULL,
>>   	.set		= NULL,
>> diff --git a/include/linux/virtio_pci_admin.h b/include/linux/virtio_pci_admin.h
>> new file mode 100644
>> index 000000000000..cb916a4bc1b1
>> --- /dev/null
>> +++ b/include/linux/virtio_pci_admin.h
>> @@ -0,0 +1,18 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
>> +#define _LINUX_VIRTIO_PCI_ADMIN_H
>> +
>> +#include <linux/types.h>
>> +#include <linux/pci.h>
>> +
>> +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
>> +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
>> +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
>> +				     u8 offset, u8 size, u8 *buf);
>> +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
>> +				    u8 offset, u8 size, u8 *buf);
>> +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
>> +					   u8 req_bar_flags, u8 *bar,
>> +					   u64 *bar_offset);
>> +
>> +#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */
>> -- 
>> 2.27.0



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
  2023-10-25  9:18     ` Yishai Hadas via Virtualization
@ 2023-10-25 10:17         ` Michael S. Tsirkin
  0 siblings, 0 replies; 100+ messages in thread
From: Michael S. Tsirkin @ 2023-10-25 10:17 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Wed, Oct 25, 2023 at 12:18:32PM +0300, Yishai Hadas wrote:
> On 25/10/2023 0:01, Michael S. Tsirkin wrote:
> 
>     On Tue, Oct 17, 2023 at 04:42:14PM +0300, Yishai Hadas wrote:
> 
>         Introduce APIs to execute legacy IO admin commands.
> 
>         It includes: list_query/use, io_legacy_read/write,
>         io_legacy_notify_info.
> 
>         Those APIs will be used by the next patches from this series.
> 
>         Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>         ---
>          drivers/virtio/virtio_pci_common.c |  11 ++
>          drivers/virtio/virtio_pci_common.h |   2 +
>          drivers/virtio/virtio_pci_modern.c | 206 +++++++++++++++++++++++++++++
>          include/linux/virtio_pci_admin.h   |  18 +++
>          4 files changed, 237 insertions(+)
>          create mode 100644 include/linux/virtio_pci_admin.h
> 
>         diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
>         index 6b4766d5abe6..212d68401d2c 100644
>         --- a/drivers/virtio/virtio_pci_common.c
>         +++ b/drivers/virtio/virtio_pci_common.c
>         @@ -645,6 +645,17 @@ static struct pci_driver virtio_pci_driver = {
>                 .sriov_configure = virtio_pci_sriov_configure,
>          };
> 
>         +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
>         +{
>         +       struct virtio_pci_device *pf_vp_dev;
>         +
>         +       pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
>         +       if (IS_ERR(pf_vp_dev))
>         +               return NULL;
>         +
>         +       return &pf_vp_dev->vdev;
>         +}
>         +
>          module_pci_driver(virtio_pci_driver);
> 
>          MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
>         diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
>         index a21b9ba01a60..2785e61ed668 100644
>         --- a/drivers/virtio/virtio_pci_common.h
>         +++ b/drivers/virtio/virtio_pci_common.h
>         @@ -155,4 +155,6 @@ static inline void virtio_pci_legacy_remove(struct virtio_pci_device *vp_dev)
>          int virtio_pci_modern_probe(struct virtio_pci_device *);
>          void virtio_pci_modern_remove(struct virtio_pci_device *);
> 
>         +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
>         +
>          #endif
>         diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
>         index cc159a8e6c70..00b65e20b2f5 100644
>         --- a/drivers/virtio/virtio_pci_modern.c
>         +++ b/drivers/virtio/virtio_pci_modern.c
>         @@ -719,6 +719,212 @@ static void vp_modern_destroy_avq(struct virtio_device *vdev)
>                 vp_dev->del_vq(&vp_dev->admin_vq.info);
>          }
> 
>         +/*
>         + * virtio_pci_admin_list_query - Provides to driver list of commands
>         + * supported for the PCI VF.
>         + * @dev: VF pci_dev
>         + * @buf: buffer to hold the returned list
>         + * @buf_size: size of the given buffer
>         + *
>         + * Returns 0 on success, or negative on failure.
>         + */
>         +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
>         +{
>         +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>         +       struct virtio_admin_cmd cmd = {};
>         +       struct scatterlist result_sg;
>         +
>         +       if (!virtio_dev)
>         +               return -ENODEV;
>         +
>         +       sg_init_one(&result_sg, buf, buf_size);
>         +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
>         +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>         +       cmd.result_sg = &result_sg;
>         +
>         +       return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>         +}
>         +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
>         +
>         +/*
>         + * virtio_pci_admin_list_use - Provides to device list of commands
>         + * used for the PCI VF.
>         + * @dev: VF pci_dev
>         + * @buf: buffer which holds the list
>         + * @buf_size: size of the given buffer
>         + *
>         + * Returns 0 on success, or negative on failure.
>         + */
>         +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
>         +{
>         +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>         +       struct virtio_admin_cmd cmd = {};
>         +       struct scatterlist data_sg;
>         +
>         +       if (!virtio_dev)
>         +               return -ENODEV;
>         +
>         +       sg_init_one(&data_sg, buf, buf_size);
>         +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
>         +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>         +       cmd.data_sg = &data_sg;
>         +
>         +       return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>         +}
>         +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_use);
> 
>     list commands are actually for a group, not for the VF.
> 
> The VF was given to let the function gets the PF from it.
> 
> For now, the only existing 'group_type' in the spec is SRIOV, this is why we
> hard-coded it internally to match the VF PCI.
> 
> Alternatively,
> We can change the API to get the PF and 'group_type' from the caller to better
> match future usage.
> However, this will require to export the virtio_pci_vf_get_pf_dev() API outside
> virtio-pci.
> 
> Do you prefer to change to the latter option ?

No, there are several points I wanted to make but this
was not one of them.

First, for query, I was trying to suggest changing the comment.
Something like:
         + * virtio_pci_admin_list_query - Provides to driver list of commands
         + * supported for the group including the given member device.
         + * @dev: member pci device.
	


Second, I don't think using buf/size like this is necessary.
For now we have a small number of commands, so just work with a u64.


Third, while list could be an OK API, the use API does not
really work. If you call use with one set of parameters for
one VF and another set for another VF, then they conflict, do they not?

So you need virtio core to do the list/use dance for you,
save the list of commands on the PF (which, again, is just a u64 for now),
and vfio or vdpa or whatnot will just query that.
I hope I'm being clear.
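
To spell out the direction, a rough sketch only (the supported_cmds
field, the helper name and the exact call site are assumptions, not
existing code): virtio-pci would do the LIST_QUERY/LIST_USE dance once
per PF when the admin queue is brought up, cache the result as a u64,
and vfio/vdpa would only read that cached value.

static void virtio_pci_admin_cmd_list_init(struct virtio_device *virtio_dev)
{
	struct virtio_pci_device *vp_dev = to_vp_device(virtio_dev);
	struct virtio_admin_cmd cmd = {};
	struct scatterlist result_sg, data_sg;
	__le64 *data;

	data = kzalloc(sizeof(*data), GFP_KERNEL);
	if (!data)
		return;

	/* query the commands the device supports for the SR-IOV group */
	sg_init_one(&result_sg, data, sizeof(*data));
	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
	cmd.result_sg = &result_sg;
	if (vp_modern_admin_cmd_exec(virtio_dev, &cmd))
		goto end;

	/* tell the device which commands the driver will use */
	sg_init_one(&data_sg, data, sizeof(*data));
	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
	cmd.data_sg = &data_sg;
	cmd.result_sg = NULL;
	if (vp_modern_admin_cmd_exec(virtio_dev, &cmd))
		goto end;

	/* hypothetical field caching the commands negotiated for this PF */
	vp_dev->admin_vq.supported_cmds = le64_to_cpu(*data);
end:
	kfree(data);
}

Callers like the legacy IO helpers would then just test the cached bit
for their opcode, and vfio never issues LIST_* itself.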



> 
> 
> 
>         +
>         +/*
>         + * virtio_pci_admin_legacy_io_write - Write legacy registers of a member device
>         + * @dev: VF pci_dev
>         + * @opcode: op code of the io write command
> 
>     opcode is actually either VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE
>     or VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE correct?
> 
>     So please just add 2 APIs for this so users don't need to care.
>     Could be wrappers around these two things.
> 
> 
> OK.
> 
> We'll export the below 2 APIs [1] which internally will call
> virtio_pci_admin_legacy_io_write() with the proper op code hard-coded.
> 
> [1]virtio_pci_admin_legacy_device_io_write()
>      virtio_pci_admin_legacy_common_io_write()
> 
> Yishai
>

Makes sense.
 
> 
> 
>         + * @offset: starting byte offset within the registers to write to
>         + * @size: size of the data to write
>         + * @buf: buffer which holds the data
>         + *
>         + * Returns 0 on success, or negative on failure.
>         + */
>         +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
>         +                                    u8 offset, u8 size, u8 *buf)
>         +{
>         +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>         +       struct virtio_admin_cmd_legacy_wr_data *data;
>         +       struct virtio_admin_cmd cmd = {};
>         +       struct scatterlist data_sg;
>         +       int vf_id;
>         +       int ret;
>         +
>         +       if (!virtio_dev)
>         +               return -ENODEV;
>         +
>         +       vf_id = pci_iov_vf_id(pdev);
>         +       if (vf_id < 0)
>         +               return vf_id;
>         +
>         +       data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
>         +       if (!data)
>         +               return -ENOMEM;
>         +
>         +       data->offset = offset;
>         +       memcpy(data->registers, buf, size);
>         +       sg_init_one(&data_sg, data, sizeof(*data) + size);
>         +       cmd.opcode = cpu_to_le16(opcode);
>         +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>         +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
>         +       cmd.data_sg = &data_sg;
>         +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>         +
>         +       kfree(data);
>         +       return ret;
>         +}
>         +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_write);
>         +
>         +/*
>         + * virtio_pci_admin_legacy_io_read - Read legacy registers of a member device
>         + * @dev: VF pci_dev
>         + * @opcode: op code of the io read command
>         + * @offset: starting byte offset within the registers to read from
>         + * @size: size of the data to be read
>         + * @buf: buffer to hold the returned data
>         + *
>         + * Returns 0 on success, or negative on failure.
>         + */
>         +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
>         +                                   u8 offset, u8 size, u8 *buf)
>         +{
>         +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>         +       struct virtio_admin_cmd_legacy_rd_data *data;
>         +       struct scatterlist data_sg, result_sg;
>         +       struct virtio_admin_cmd cmd = {};
>         +       int vf_id;
>         +       int ret;
>         +
>         +       if (!virtio_dev)
>         +               return -ENODEV;
>         +
>         +       vf_id = pci_iov_vf_id(pdev);
>         +       if (vf_id < 0)
>         +               return vf_id;
>         +
>         +       data = kzalloc(sizeof(*data), GFP_KERNEL);
>         +       if (!data)
>         +               return -ENOMEM;
>         +
>         +       data->offset = offset;
>         +       sg_init_one(&data_sg, data, sizeof(*data));
>         +       sg_init_one(&result_sg, buf, size);
>         +       cmd.opcode = cpu_to_le16(opcode);
>         +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>         +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
>         +       cmd.data_sg = &data_sg;
>         +       cmd.result_sg = &result_sg;
>         +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>         +
>         +       kfree(data);
>         +       return ret;
>         +}
>         +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_read);
>         +
>         +/*
>         + * virtio_pci_admin_legacy_io_notify_info - Read the queue notification
>         + * information for legacy interface
>         + * @dev: VF pci_dev
>         + * @req_bar_flags: requested bar flags
>         + * @bar: on output the BAR number of the member device
>         + * @bar_offset: on output the offset within bar
>         + *
>         + * Returns 0 on success, or negative on failure.
>         + */
>         +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
>         +                                          u8 req_bar_flags, u8 *bar,
>         +                                          u64 *bar_offset)
>         +{
>         +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>         +       struct virtio_admin_cmd_notify_info_result *result;
>         +       struct virtio_admin_cmd cmd = {};
>         +       struct scatterlist result_sg;
>         +       int vf_id;
>         +       int ret;
>         +
>         +       if (!virtio_dev)
>         +               return -ENODEV;
>         +
>         +       vf_id = pci_iov_vf_id(pdev);
>         +       if (vf_id < 0)
>         +               return vf_id;
>         +
>         +       result = kzalloc(sizeof(*result), GFP_KERNEL);
>         +       if (!result)
>         +               return -ENOMEM;
>         +
>         +       sg_init_one(&result_sg, result, sizeof(*result));
>         +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
>         +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>         +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
>         +       cmd.result_sg = &result_sg;
>         +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>         +       if (!ret) {
>         +               struct virtio_admin_cmd_notify_info_data *entry;
>         +               int i;
>         +
>         +               ret = -ENOENT;
>         +               for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
>         +                       entry = &result->entries[i];
>         +                       if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
>         +                               break;
>         +                       if (entry->flags != req_bar_flags)
>         +                               continue;
>         +                       *bar = entry->bar;
>         +                       *bar_offset = le64_to_cpu(entry->offset);
>         +                       ret = 0;
>         +                       break;
>         +               }
>         +       }
>         +
>         +       kfree(result);
>         +       return ret;
>         +}
>         +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
>         +
>          static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
>                 .get            = NULL,
>                 .set            = NULL,
>         diff --git a/include/linux/virtio_pci_admin.h b/include/linux/virtio_pci_admin.h
>         new file mode 100644
>         index 000000000000..cb916a4bc1b1
>         --- /dev/null
>         +++ b/include/linux/virtio_pci_admin.h
>         @@ -0,0 +1,18 @@
>         +/* SPDX-License-Identifier: GPL-2.0 */
>         +#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
>         +#define _LINUX_VIRTIO_PCI_ADMIN_H
>         +
>         +#include <linux/types.h>
>         +#include <linux/pci.h>
>         +
>         +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
>         +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
>         +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
>         +                                    u8 offset, u8 size, u8 *buf);
>         +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
>         +                                   u8 offset, u8 size, u8 *buf);
>         +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
>         +                                          u8 req_bar_flags, u8 *bar,
>         +                                          u64 *bar_offset);
>         +
>         +#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */
>         --
>         2.27.0
> 
> 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
@ 2023-10-25 10:17         ` Michael S. Tsirkin
  0 siblings, 0 replies; 100+ messages in thread
From: Michael S. Tsirkin @ 2023-10-25 10:17 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, si-wei.liu, leonro,
	maorg

On Wed, Oct 25, 2023 at 12:18:32PM +0300, Yishai Hadas wrote:
> On 25/10/2023 0:01, Michael S. Tsirkin wrote:
> 
>     On Tue, Oct 17, 2023 at 04:42:14PM +0300, Yishai Hadas wrote:
> 
>         Introduce APIs to execute legacy IO admin commands.
> 
>         It includes: list_query/use, io_legacy_read/write,
>         io_legacy_notify_info.
> 
>         Those APIs will be used by the next patches from this series.
> 
>         Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>         ---
>          drivers/virtio/virtio_pci_common.c |  11 ++
>          drivers/virtio/virtio_pci_common.h |   2 +
>          drivers/virtio/virtio_pci_modern.c | 206 +++++++++++++++++++++++++++++
>          include/linux/virtio_pci_admin.h   |  18 +++
>          4 files changed, 237 insertions(+)
>          create mode 100644 include/linux/virtio_pci_admin.h
> 
>         diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
>         index 6b4766d5abe6..212d68401d2c 100644
>         --- a/drivers/virtio/virtio_pci_common.c
>         +++ b/drivers/virtio/virtio_pci_common.c
>         @@ -645,6 +645,17 @@ static struct pci_driver virtio_pci_driver = {
>                 .sriov_configure = virtio_pci_sriov_configure,
>          };
> 
>         +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
>         +{
>         +       struct virtio_pci_device *pf_vp_dev;
>         +
>         +       pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
>         +       if (IS_ERR(pf_vp_dev))
>         +               return NULL;
>         +
>         +       return &pf_vp_dev->vdev;
>         +}
>         +
>          module_pci_driver(virtio_pci_driver);
> 
>          MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
>         diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
>         index a21b9ba01a60..2785e61ed668 100644
>         --- a/drivers/virtio/virtio_pci_common.h
>         +++ b/drivers/virtio/virtio_pci_common.h
>         @@ -155,4 +155,6 @@ static inline void virtio_pci_legacy_remove(struct virtio_pci_device *vp_dev)
>          int virtio_pci_modern_probe(struct virtio_pci_device *);
>          void virtio_pci_modern_remove(struct virtio_pci_device *);
> 
>         +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
>         +
>          #endif
>         diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
>         index cc159a8e6c70..00b65e20b2f5 100644
>         --- a/drivers/virtio/virtio_pci_modern.c
>         +++ b/drivers/virtio/virtio_pci_modern.c
>         @@ -719,6 +719,212 @@ static void vp_modern_destroy_avq(struct virtio_device *vdev)
>                 vp_dev->del_vq(&vp_dev->admin_vq.info);
>          }
> 
>         +/*
>         + * virtio_pci_admin_list_query - Provides to driver list of commands
>         + * supported for the PCI VF.
>         + * @dev: VF pci_dev
>         + * @buf: buffer to hold the returned list
>         + * @buf_size: size of the given buffer
>         + *
>         + * Returns 0 on success, or negative on failure.
>         + */
>         +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
>         +{
>         +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>         +       struct virtio_admin_cmd cmd = {};
>         +       struct scatterlist result_sg;
>         +
>         +       if (!virtio_dev)
>         +               return -ENODEV;
>         +
>         +       sg_init_one(&result_sg, buf, buf_size);
>         +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
>         +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>         +       cmd.result_sg = &result_sg;
>         +
>         +       return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>         +}
>         +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
>         +
>         +/*
>         + * virtio_pci_admin_list_use - Provides to device list of commands
>         + * used for the PCI VF.
>         + * @dev: VF pci_dev
>         + * @buf: buffer which holds the list
>         + * @buf_size: size of the given buffer
>         + *
>         + * Returns 0 on success, or negative on failure.
>         + */
>         +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
>         +{
>         +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>         +       struct virtio_admin_cmd cmd = {};
>         +       struct scatterlist data_sg;
>         +
>         +       if (!virtio_dev)
>         +               return -ENODEV;
>         +
>         +       sg_init_one(&data_sg, buf, buf_size);
>         +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
>         +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>         +       cmd.data_sg = &data_sg;
>         +
>         +       return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>         +}
>         +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_use);
> 
>     list commands are actually for a group, not for the VF.
> 
> The VF was given to let the function gets the PF from it.
> 
> For now, the only existing 'group_type' in the spec is SRIOV, this is why we
> hard-coded it internally to match the VF PCI.
> 
> Alternatively,
> We can change the API to get the PF and 'group_type' from the caller to better
> match future usage.
> However, this will require to export the virtio_pci_vf_get_pf_dev() API outside
> virtio-pci.
> 
> Do you prefer to change to the latter option ?

No, there are several points I wanted to make but this
was not one of them.

First, for query, I was trying to suggest changing the comment.
Something like:
         + * virtio_pci_admin_list_query - Provides to driver list of commands
         + * supported for the group including the given member device.
         + * @dev: member pci device.
	


Second, I don't think using buf/size like this is necessary.
For now we have a small number of commands, so just work with a u64.


Third, while list could be an OK API, the use API does not
really work. If you call use with one set of parameters for
one VF and another set for another VF, then they conflict, do they not?

So you need the virtio core to do the list/use dance for you,
save the list of commands on the PF (which, again, is just a u64 for now),
and then vfio or vdpa or whatnot will just query that.
I hope I'm being clear.
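
To make this concrete, here is a rough sketch of what I mean (the function
name, its placement and the 'supported_cmds' field on the admin vq context
are all illustrative, not a final implementation):

/* sketch: run once while activating the admin vq, never exported */
static void vp_modern_avq_cache_cmds(struct virtio_device *virtio_dev)
{
	struct virtio_pci_device *vp_dev = to_vp_device(virtio_dev);
	struct virtio_admin_cmd cmd = {};
	struct scatterlist result_sg;
	struct scatterlist data_sg;
	__le64 *data;

	data = kzalloc(sizeof(*data), GFP_KERNEL);
	if (!data)
		return;

	sg_init_one(&result_sg, data, sizeof(*data));
	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
	cmd.result_sg = &result_sg;
	if (vp_modern_admin_cmd_exec(virtio_dev, &cmd))
		goto out;

	/* a real implementation would mask this against what the driver supports */
	sg_init_one(&data_sg, data, sizeof(*data));
	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
	cmd.data_sg = &data_sg;
	cmd.result_sg = NULL;
	if (vp_modern_admin_cmd_exec(virtio_dev, &cmd))
		goto out;

	/* hypothetical u64 cache on the admin vq context */
	vp_dev->admin_vq.supported_cmds = le64_to_cpu(*data);
out:
	kfree(data);
}

The exported helper(s) for vfio/vdpa then just read that cached value and
never touch the admin queue on the query path.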



> 
> 
> 
>         +
>         +/*
>         + * virtio_pci_admin_legacy_io_write - Write legacy registers of a member device
>         + * @dev: VF pci_dev
>         + * @opcode: op code of the io write command
> 
>     opcode is actually either VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE
>     or VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE correct?
> 
>     So please just add 2 APIs for this so users don't need to care.
>     Could be wrappers around these two things.
> 
> 
> OK.
> 
> We'll export the below 2 APIs [1] which internally will call
> virtio_pci_admin_legacy_io_write() with the proper op code hard-coded.
> 
> [1]virtio_pci_admin_legacy_device_io_write()
>      virtio_pci_admin_legacy_common_io_write()
> 
> Yishai
>

Makes sense.
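
For reference, a minimal sketch of such wrappers (illustrative only; the
existing virtio_pci_admin_legacy_io_write() would then become an internal
helper rather than an export):

int virtio_pci_admin_legacy_common_io_write(struct pci_dev *pdev, u8 offset,
					    u8 size, u8 *buf)
{
	return virtio_pci_admin_legacy_io_write(pdev,
				VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE,
				offset, size, buf);
}
EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_common_io_write);

int virtio_pci_admin_legacy_device_io_write(struct pci_dev *pdev, u8 offset,
					    u8 size, u8 *buf)
{
	return virtio_pci_admin_legacy_io_write(pdev,
				VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE,
				offset, size, buf);
}
EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_device_io_write);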
 
> 
> 
>         + * @offset: starting byte offset within the registers to write to
>         + * @size: size of the data to write
>         + * @buf: buffer which holds the data
>         + *
>         + * Returns 0 on success, or negative on failure.
>         + */
>         +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
>         +                                    u8 offset, u8 size, u8 *buf)
>         +{
>         +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>         +       struct virtio_admin_cmd_legacy_wr_data *data;
>         +       struct virtio_admin_cmd cmd = {};
>         +       struct scatterlist data_sg;
>         +       int vf_id;
>         +       int ret;
>         +
>         +       if (!virtio_dev)
>         +               return -ENODEV;
>         +
>         +       vf_id = pci_iov_vf_id(pdev);
>         +       if (vf_id < 0)
>         +               return vf_id;
>         +
>         +       data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
>         +       if (!data)
>         +               return -ENOMEM;
>         +
>         +       data->offset = offset;
>         +       memcpy(data->registers, buf, size);
>         +       sg_init_one(&data_sg, data, sizeof(*data) + size);
>         +       cmd.opcode = cpu_to_le16(opcode);
>         +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>         +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
>         +       cmd.data_sg = &data_sg;
>         +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>         +
>         +       kfree(data);
>         +       return ret;
>         +}
>         +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_write);
>         +
>         +/*
>         + * virtio_pci_admin_legacy_io_read - Read legacy registers of a member device
>         + * @dev: VF pci_dev
>         + * @opcode: op code of the io read command
>         + * @offset: starting byte offset within the registers to read from
>         + * @size: size of the data to be read
>         + * @buf: buffer to hold the returned data
>         + *
>         + * Returns 0 on success, or negative on failure.
>         + */
>         +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
>         +                                   u8 offset, u8 size, u8 *buf)
>         +{
>         +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>         +       struct virtio_admin_cmd_legacy_rd_data *data;
>         +       struct scatterlist data_sg, result_sg;
>         +       struct virtio_admin_cmd cmd = {};
>         +       int vf_id;
>         +       int ret;
>         +
>         +       if (!virtio_dev)
>         +               return -ENODEV;
>         +
>         +       vf_id = pci_iov_vf_id(pdev);
>         +       if (vf_id < 0)
>         +               return vf_id;
>         +
>         +       data = kzalloc(sizeof(*data), GFP_KERNEL);
>         +       if (!data)
>         +               return -ENOMEM;
>         +
>         +       data->offset = offset;
>         +       sg_init_one(&data_sg, data, sizeof(*data));
>         +       sg_init_one(&result_sg, buf, size);
>         +       cmd.opcode = cpu_to_le16(opcode);
>         +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>         +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
>         +       cmd.data_sg = &data_sg;
>         +       cmd.result_sg = &result_sg;
>         +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>         +
>         +       kfree(data);
>         +       return ret;
>         +}
>         +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_read);
>         +
>         +/*
>         + * virtio_pci_admin_legacy_io_notify_info - Read the queue notification
>         + * information for legacy interface
>         + * @dev: VF pci_dev
>         + * @req_bar_flags: requested bar flags
>         + * @bar: on output the BAR number of the member device
>         + * @bar_offset: on output the offset within bar
>         + *
>         + * Returns 0 on success, or negative on failure.
>         + */
>         +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
>         +                                          u8 req_bar_flags, u8 *bar,
>         +                                          u64 *bar_offset)
>         +{
>         +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>         +       struct virtio_admin_cmd_notify_info_result *result;
>         +       struct virtio_admin_cmd cmd = {};
>         +       struct scatterlist result_sg;
>         +       int vf_id;
>         +       int ret;
>         +
>         +       if (!virtio_dev)
>         +               return -ENODEV;
>         +
>         +       vf_id = pci_iov_vf_id(pdev);
>         +       if (vf_id < 0)
>         +               return vf_id;
>         +
>         +       result = kzalloc(sizeof(*result), GFP_KERNEL);
>         +       if (!result)
>         +               return -ENOMEM;
>         +
>         +       sg_init_one(&result_sg, result, sizeof(*result));
>         +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
>         +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>         +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
>         +       cmd.result_sg = &result_sg;
>         +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>         +       if (!ret) {
>         +               struct virtio_admin_cmd_notify_info_data *entry;
>         +               int i;
>         +
>         +               ret = -ENOENT;
>         +               for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
>         +                       entry = &result->entries[i];
>         +                       if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
>         +                               break;
>         +                       if (entry->flags != req_bar_flags)
>         +                               continue;
>         +                       *bar = entry->bar;
>         +                       *bar_offset = le64_to_cpu(entry->offset);
>         +                       ret = 0;
>         +                       break;
>         +               }
>         +       }
>         +
>         +       kfree(result);
>         +       return ret;
>         +}
>         +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
>         +
>          static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
>                 .get            = NULL,
>                 .set            = NULL,
>         diff --git a/include/linux/virtio_pci_admin.h b/include/linux/virtio_pci_admin.h
>         new file mode 100644
>         index 000000000000..cb916a4bc1b1
>         --- /dev/null
>         +++ b/include/linux/virtio_pci_admin.h
>         @@ -0,0 +1,18 @@
>         +/* SPDX-License-Identifier: GPL-2.0 */
>         +#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
>         +#define _LINUX_VIRTIO_PCI_ADMIN_H
>         +
>         +#include <linux/types.h>
>         +#include <linux/pci.h>
>         +
>         +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
>         +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
>         +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
>         +                                    u8 offset, u8 size, u8 *buf);
>         +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
>         +                                   u8 offset, u8 size, u8 *buf);
>         +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
>         +                                          u8 req_bar_flags, u8 *bar,
>         +                                          u64 *bar_offset);
>         +
>         +#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */
>         --
>         2.27.0
> 
> 



* Re: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
  2023-10-25 10:17         ` Michael S. Tsirkin
@ 2023-10-25 13:00           ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-25 13:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, si-wei.liu, leonro,
	maorg

On 25/10/2023 13:17, Michael S. Tsirkin wrote:
> On Wed, Oct 25, 2023 at 12:18:32PM +0300, Yishai Hadas wrote:
>> On 25/10/2023 0:01, Michael S. Tsirkin wrote:
>>
>>      On Tue, Oct 17, 2023 at 04:42:14PM +0300, Yishai Hadas wrote:
>>
>>          Introduce APIs to execute legacy IO admin commands.
>>
>>          It includes: list_query/use, io_legacy_read/write,
>>          io_legacy_notify_info.
>>
>>          Those APIs will be used by the next patches from this series.
>>
>>          Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>>          ---
>>           drivers/virtio/virtio_pci_common.c |  11 ++
>>           drivers/virtio/virtio_pci_common.h |   2 +
>>           drivers/virtio/virtio_pci_modern.c | 206 +++++++++++++++++++++++++++++
>>           include/linux/virtio_pci_admin.h   |  18 +++
>>           4 files changed, 237 insertions(+)
>>           create mode 100644 include/linux/virtio_pci_admin.h
>>
>>          diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
>>          index 6b4766d5abe6..212d68401d2c 100644
>>          --- a/drivers/virtio/virtio_pci_common.c
>>          +++ b/drivers/virtio/virtio_pci_common.c
>>          @@ -645,6 +645,17 @@ static struct pci_driver virtio_pci_driver = {
>>                  .sriov_configure = virtio_pci_sriov_configure,
>>           };
>>
>>          +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
>>          +{
>>          +       struct virtio_pci_device *pf_vp_dev;
>>          +
>>          +       pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
>>          +       if (IS_ERR(pf_vp_dev))
>>          +               return NULL;
>>          +
>>          +       return &pf_vp_dev->vdev;
>>          +}
>>          +
>>           module_pci_driver(virtio_pci_driver);
>>
>>           MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
>>          diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
>>          index a21b9ba01a60..2785e61ed668 100644
>>          --- a/drivers/virtio/virtio_pci_common.h
>>          +++ b/drivers/virtio/virtio_pci_common.h
>>          @@ -155,4 +155,6 @@ static inline void virtio_pci_legacy_remove(struct virtio_pci_device *vp_dev)
>>           int virtio_pci_modern_probe(struct virtio_pci_device *);
>>           void virtio_pci_modern_remove(struct virtio_pci_device *);
>>
>>          +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
>>          +
>>           #endif
>>          diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
>>          index cc159a8e6c70..00b65e20b2f5 100644
>>          --- a/drivers/virtio/virtio_pci_modern.c
>>          +++ b/drivers/virtio/virtio_pci_modern.c
>>          @@ -719,6 +719,212 @@ static void vp_modern_destroy_avq(struct virtio_device *vdev)
>>                  vp_dev->del_vq(&vp_dev->admin_vq.info);
>>           }
>>
>>          +/*
>>          + * virtio_pci_admin_list_query - Provides to driver list of commands
>>          + * supported for the PCI VF.
>>          + * @dev: VF pci_dev
>>          + * @buf: buffer to hold the returned list
>>          + * @buf_size: size of the given buffer
>>          + *
>>          + * Returns 0 on success, or negative on failure.
>>          + */
>>          +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
>>          +{
>>          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>>          +       struct virtio_admin_cmd cmd = {};
>>          +       struct scatterlist result_sg;
>>          +
>>          +       if (!virtio_dev)
>>          +               return -ENODEV;
>>          +
>>          +       sg_init_one(&result_sg, buf, buf_size);
>>          +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
>>          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>>          +       cmd.result_sg = &result_sg;
>>          +
>>          +       return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>>          +}
>>          +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
>>          +
>>          +/*
>>          + * virtio_pci_admin_list_use - Provides to device list of commands
>>          + * used for the PCI VF.
>>          + * @dev: VF pci_dev
>>          + * @buf: buffer which holds the list
>>          + * @buf_size: size of the given buffer
>>          + *
>>          + * Returns 0 on success, or negative on failure.
>>          + */
>>          +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
>>          +{
>>          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>>          +       struct virtio_admin_cmd cmd = {};
>>          +       struct scatterlist data_sg;
>>          +
>>          +       if (!virtio_dev)
>>          +               return -ENODEV;
>>          +
>>          +       sg_init_one(&data_sg, buf, buf_size);
>>          +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
>>          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>>          +       cmd.data_sg = &data_sg;
>>          +
>>          +       return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>>          +}
>>          +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_use);
>>
>>      list commands are actually for a group, not for the VF.
>>
>> The VF was given to let the function gets the PF from it.
>>
>> For now, the only existing 'group_type' in the spec is SRIOV, this is why we
>> hard-coded it internally to match the VF PCI.
>>
>> Alternatively,
>> We can change the API to get the PF and 'group_type' from the caller to better
>> match future usage.
>> However, this will require to export the virtio_pci_vf_get_pf_dev() API outside
>> virtio-pci.
>>
>> Do you prefer to change to the latter option ?
> No, there are several points I wanted to make but this
> was not one of them.
>
> First, for query, I was trying to suggest changing the comment.
> Something like:
>           + * virtio_pci_admin_list_query - Provides to driver list of commands
>           + * supported for the group including the given member device.
>           + * @dev: member pci device.

Following your suggestion below, virtio will issue the query/use 
internally and keep the resulting data on the 'admin_queue' context.

In that case, we suggest exporting the below API for the upper layers 
(e.g. vfio).

bool virtio_pci_admin_supported_cmds(struct pci_dev *pdev, u64 cmds)

It will find the PF from the VF and will internally check, against the 
'admin_queue' context, whether the given 'cmds' input is supported.

Its output will be true/false.

Makes sense ?
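
A sketch of how that could look (assuming the supported commands were cached
as a u64 on the admin queue context when it was activated; that field is
illustrative):

bool virtio_pci_admin_supported_cmds(struct pci_dev *pdev, u64 cmds)
{
	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
	struct virtio_pci_device *vp_dev;

	if (!virtio_dev)
		return false;

	vp_dev = to_vp_device(virtio_dev);
	/* 'supported_cmds' is the hypothetical value cached at admin vq activation */
	return (vp_dev->admin_vq.supported_cmds & cmds) == cmds;
}
EXPORT_SYMBOL_GPL(virtio_pci_admin_supported_cmds);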

> 	
>
>
> Second, I don't think using buf/size  like this is necessary.
> For now we have a small number of commands just work with u64.
OK, just keep in mind that upon issuing the command towards the 
controller, this u64 still needs to be allocated on the heap to 
work properly.
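
I.e., roughly (a fragment only; error handling and the surrounding command
setup are omitted, and 'supported_cmds'/'data_sg' come from that context):

	__le64 *cmds = kzalloc(sizeof(*cmds), GFP_KERNEL); /* must not be on the stack */

	*cmds = cpu_to_le64(supported_cmds);
	sg_init_one(&data_sg, cmds, sizeof(*cmds));
	/* ... issue VIRTIO_ADMIN_CMD_LIST_USE, then kfree(cmds) ... */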
>
>
> Third, while list could be an OK API, the use API does not
> really work. If you call use with one set of parameters for
> one VF and another for another then they conflict do they not?
>
> So you need virtio core to do the list/use dance for you,
> save the list of commands on the PF (which again is just u64 for now)
> and vfio or vdpa or whatnot will just query that.
> I hope I'm being clear.

In that case, virtio_pci_admin_list_query() and 
virtio_pci_admin_list_use() won't be exported anymore and will become 
static in virtio-pci.

They will be called internally as part of activating the admin_queue and 
will simply take a struct virtio_device * (the PF) instead of a struct 
pci_dev *pdev.
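
I.e., the prototypes would become something along the lines of (a sketch,
following your point above about working with a plain u64):

static int virtio_pci_admin_list_query(struct virtio_device *virtio_dev, u64 *data);
static int virtio_pci_admin_list_use(struct virtio_device *virtio_dev, u64 data);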

>
>
>>
>>
>>          +
>>          +/*
>>          + * virtio_pci_admin_legacy_io_write - Write legacy registers of a member device
>>          + * @dev: VF pci_dev
>>          + * @opcode: op code of the io write command
>>
>>      opcode is actually either VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE
>>      or VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE correct?
>>
>>      So please just add 2 APIs for this so users don't need to care.
>>      Could be wrappers around these two things.
>>
>>
>> OK.
>>
>> We'll export the below 2 APIs [1] which internally will call
>> virtio_pci_admin_legacy_io_write() with the proper op code hard-coded.
>>
>> [1]virtio_pci_admin_legacy_device_io_write()
>>       virtio_pci_admin_legacy_common_io_write()
>>
>> Yishai
>>
> Makes sense.
>   

OK, we may do the same split for the 'legacy_io_read' commands, to be 
symmetric with 'legacy_io_write', right?
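
(i.e., ending up with four exported legacy I/O wrappers; the read ones would
hard-code the matching ..._CFG_READ opcodes from the series. Prototypes as a
sketch:)

int virtio_pci_admin_legacy_common_io_read(struct pci_dev *pdev, u8 offset,
					   u8 size, u8 *buf);
int virtio_pci_admin_legacy_device_io_read(struct pci_dev *pdev, u8 offset,
					   u8 size, u8 *buf);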

Yishai

>>
>>          + * @offset: starting byte offset within the registers to write to
>>          + * @size: size of the data to write
>>          + * @buf: buffer which holds the data
>>          + *
>>          + * Returns 0 on success, or negative on failure.
>>          + */
>>          +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
>>          +                                    u8 offset, u8 size, u8 *buf)
>>          +{
>>          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>>          +       struct virtio_admin_cmd_legacy_wr_data *data;
>>          +       struct virtio_admin_cmd cmd = {};
>>          +       struct scatterlist data_sg;
>>          +       int vf_id;
>>          +       int ret;
>>          +
>>          +       if (!virtio_dev)
>>          +               return -ENODEV;
>>          +
>>          +       vf_id = pci_iov_vf_id(pdev);
>>          +       if (vf_id < 0)
>>          +               return vf_id;
>>          +
>>          +       data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
>>          +       if (!data)
>>          +               return -ENOMEM;
>>          +
>>          +       data->offset = offset;
>>          +       memcpy(data->registers, buf, size);
>>          +       sg_init_one(&data_sg, data, sizeof(*data) + size);
>>          +       cmd.opcode = cpu_to_le16(opcode);
>>          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>>          +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
>>          +       cmd.data_sg = &data_sg;
>>          +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>>          +
>>          +       kfree(data);
>>          +       return ret;
>>          +}
>>          +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_write);
>>          +
>>          +/*
>>          + * virtio_pci_admin_legacy_io_read - Read legacy registers of a member device
>>          + * @dev: VF pci_dev
>>          + * @opcode: op code of the io read command
>>          + * @offset: starting byte offset within the registers to read from
>>          + * @size: size of the data to be read
>>          + * @buf: buffer to hold the returned data
>>          + *
>>          + * Returns 0 on success, or negative on failure.
>>          + */
>>          +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
>>          +                                   u8 offset, u8 size, u8 *buf)
>>          +{
>>          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>>          +       struct virtio_admin_cmd_legacy_rd_data *data;
>>          +       struct scatterlist data_sg, result_sg;
>>          +       struct virtio_admin_cmd cmd = {};
>>          +       int vf_id;
>>          +       int ret;
>>          +
>>          +       if (!virtio_dev)
>>          +               return -ENODEV;
>>          +
>>          +       vf_id = pci_iov_vf_id(pdev);
>>          +       if (vf_id < 0)
>>          +               return vf_id;
>>          +
>>          +       data = kzalloc(sizeof(*data), GFP_KERNEL);
>>          +       if (!data)
>>          +               return -ENOMEM;
>>          +
>>          +       data->offset = offset;
>>          +       sg_init_one(&data_sg, data, sizeof(*data));
>>          +       sg_init_one(&result_sg, buf, size);
>>          +       cmd.opcode = cpu_to_le16(opcode);
>>          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>>          +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
>>          +       cmd.data_sg = &data_sg;
>>          +       cmd.result_sg = &result_sg;
>>          +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>>          +
>>          +       kfree(data);
>>          +       return ret;
>>          +}
>>          +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_read);
>>          +
>>          +/*
>>          + * virtio_pci_admin_legacy_io_notify_info - Read the queue notification
>>          + * information for legacy interface
>>          + * @dev: VF pci_dev
>>          + * @req_bar_flags: requested bar flags
>>          + * @bar: on output the BAR number of the member device
>>          + * @bar_offset: on output the offset within bar
>>          + *
>>          + * Returns 0 on success, or negative on failure.
>>          + */
>>          +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
>>          +                                          u8 req_bar_flags, u8 *bar,
>>          +                                          u64 *bar_offset)
>>          +{
>>          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>>          +       struct virtio_admin_cmd_notify_info_result *result;
>>          +       struct virtio_admin_cmd cmd = {};
>>          +       struct scatterlist result_sg;
>>          +       int vf_id;
>>          +       int ret;
>>          +
>>          +       if (!virtio_dev)
>>          +               return -ENODEV;
>>          +
>>          +       vf_id = pci_iov_vf_id(pdev);
>>          +       if (vf_id < 0)
>>          +               return vf_id;
>>          +
>>          +       result = kzalloc(sizeof(*result), GFP_KERNEL);
>>          +       if (!result)
>>          +               return -ENOMEM;
>>          +
>>          +       sg_init_one(&result_sg, result, sizeof(*result));
>>          +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
>>          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>>          +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
>>          +       cmd.result_sg = &result_sg;
>>          +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>>          +       if (!ret) {
>>          +               struct virtio_admin_cmd_notify_info_data *entry;
>>          +               int i;
>>          +
>>          +               ret = -ENOENT;
>>          +               for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
>>          +                       entry = &result->entries[i];
>>          +                       if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
>>          +                               break;
>>          +                       if (entry->flags != req_bar_flags)
>>          +                               continue;
>>          +                       *bar = entry->bar;
>>          +                       *bar_offset = le64_to_cpu(entry->offset);
>>          +                       ret = 0;
>>          +                       break;
>>          +               }
>>          +       }
>>          +
>>          +       kfree(result);
>>          +       return ret;
>>          +}
>>          +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
>>          +
>>           static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
>>                  .get            = NULL,
>>                  .set            = NULL,
>>          diff --git a/include/linux/virtio_pci_admin.h b/include/linux/virtio_pci_admin.h
>>          new file mode 100644
>>          index 000000000000..cb916a4bc1b1
>>          --- /dev/null
>>          +++ b/include/linux/virtio_pci_admin.h
>>          @@ -0,0 +1,18 @@
>>          +/* SPDX-License-Identifier: GPL-2.0 */
>>          +#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
>>          +#define _LINUX_VIRTIO_PCI_ADMIN_H
>>          +
>>          +#include <linux/types.h>
>>          +#include <linux/pci.h>
>>          +
>>          +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
>>          +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
>>          +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
>>          +                                    u8 offset, u8 size, u8 *buf);
>>          +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
>>          +                                   u8 offset, u8 size, u8 *buf);
>>          +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
>>          +                                          u8 req_bar_flags, u8 *bar,
>>          +                                          u64 *bar_offset);
>>          +
>>          +#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */
>>          --
>>          2.27.0
>>
>>




* Re: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
  2023-10-25 13:00           ` Yishai Hadas via Virtualization
@ 2023-10-25 13:04             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael S. Tsirkin @ 2023-10-25 13:04 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Wed, Oct 25, 2023 at 04:00:43PM +0300, Yishai Hadas wrote:
> On 25/10/2023 13:17, Michael S. Tsirkin wrote:
> > On Wed, Oct 25, 2023 at 12:18:32PM +0300, Yishai Hadas wrote:
> > > On 25/10/2023 0:01, Michael S. Tsirkin wrote:
> > > 
> > >      On Tue, Oct 17, 2023 at 04:42:14PM +0300, Yishai Hadas wrote:
> > > 
> > >          Introduce APIs to execute legacy IO admin commands.
> > > 
> > >          It includes: list_query/use, io_legacy_read/write,
> > >          io_legacy_notify_info.
> > > 
> > >          Those APIs will be used by the next patches from this series.
> > > 
> > >          Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > >          ---
> > >           drivers/virtio/virtio_pci_common.c |  11 ++
> > >           drivers/virtio/virtio_pci_common.h |   2 +
> > >           drivers/virtio/virtio_pci_modern.c | 206 +++++++++++++++++++++++++++++
> > >           include/linux/virtio_pci_admin.h   |  18 +++
> > >           4 files changed, 237 insertions(+)
> > >           create mode 100644 include/linux/virtio_pci_admin.h
> > > 
> > >          diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > >          index 6b4766d5abe6..212d68401d2c 100644
> > >          --- a/drivers/virtio/virtio_pci_common.c
> > >          +++ b/drivers/virtio/virtio_pci_common.c
> > >          @@ -645,6 +645,17 @@ static struct pci_driver virtio_pci_driver = {
> > >                  .sriov_configure = virtio_pci_sriov_configure,
> > >           };
> > > 
> > >          +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
> > >          +{
> > >          +       struct virtio_pci_device *pf_vp_dev;
> > >          +
> > >          +       pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
> > >          +       if (IS_ERR(pf_vp_dev))
> > >          +               return NULL;
> > >          +
> > >          +       return &pf_vp_dev->vdev;
> > >          +}
> > >          +
> > >           module_pci_driver(virtio_pci_driver);
> > > 
> > >           MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
> > >          diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > >          index a21b9ba01a60..2785e61ed668 100644
> > >          --- a/drivers/virtio/virtio_pci_common.h
> > >          +++ b/drivers/virtio/virtio_pci_common.h
> > >          @@ -155,4 +155,6 @@ static inline void virtio_pci_legacy_remove(struct virtio_pci_device *vp_dev)
> > >           int virtio_pci_modern_probe(struct virtio_pci_device *);
> > >           void virtio_pci_modern_remove(struct virtio_pci_device *);
> > > 
> > >          +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
> > >          +
> > >           #endif
> > >          diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > >          index cc159a8e6c70..00b65e20b2f5 100644
> > >          --- a/drivers/virtio/virtio_pci_modern.c
> > >          +++ b/drivers/virtio/virtio_pci_modern.c
> > >          @@ -719,6 +719,212 @@ static void vp_modern_destroy_avq(struct virtio_device *vdev)
> > >                  vp_dev->del_vq(&vp_dev->admin_vq.info);
> > >           }
> > > 
> > >          +/*
> > >          + * virtio_pci_admin_list_query - Provides to driver list of commands
> > >          + * supported for the PCI VF.
> > >          + * @dev: VF pci_dev
> > >          + * @buf: buffer to hold the returned list
> > >          + * @buf_size: size of the given buffer
> > >          + *
> > >          + * Returns 0 on success, or negative on failure.
> > >          + */
> > >          +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
> > >          +{
> > >          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > >          +       struct virtio_admin_cmd cmd = {};
> > >          +       struct scatterlist result_sg;
> > >          +
> > >          +       if (!virtio_dev)
> > >          +               return -ENODEV;
> > >          +
> > >          +       sg_init_one(&result_sg, buf, buf_size);
> > >          +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
> > >          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> > >          +       cmd.result_sg = &result_sg;
> > >          +
> > >          +       return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> > >          +}
> > >          +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
> > >          +
> > >          +/*
> > >          + * virtio_pci_admin_list_use - Provides to device list of commands
> > >          + * used for the PCI VF.
> > >          + * @dev: VF pci_dev
> > >          + * @buf: buffer which holds the list
> > >          + * @buf_size: size of the given buffer
> > >          + *
> > >          + * Returns 0 on success, or negative on failure.
> > >          + */
> > >          +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
> > >          +{
> > >          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > >          +       struct virtio_admin_cmd cmd = {};
> > >          +       struct scatterlist data_sg;
> > >          +
> > >          +       if (!virtio_dev)
> > >          +               return -ENODEV;
> > >          +
> > >          +       sg_init_one(&data_sg, buf, buf_size);
> > >          +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
> > >          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> > >          +       cmd.data_sg = &data_sg;
> > >          +
> > >          +       return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> > >          +}
> > >          +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_use);
> > > 
> > >      list commands are actually for a group, not for the VF.
> > > 
> > > The VF was given to let the function gets the PF from it.
> > > 
> > > For now, the only existing 'group_type' in the spec is SRIOV, this is why we
> > > hard-coded it internally to match the VF PCI.
> > > 
> > > Alternatively,
> > > We can change the API to get the PF and 'group_type' from the caller to better
> > > match future usage.
> > > However, this will require to export the virtio_pci_vf_get_pf_dev() API outside
> > > virtio-pci.
> > > 
> > > Do you prefer to change to the latter option ?
> > No, there are several points I wanted to make but this
> > was not one of them.
> > 
> > First, for query, I was trying to suggest changing the comment.
> > Something like:
> >           + * virtio_pci_admin_list_query - Provides to driver list of commands
> >           + * supported for the group including the given member device.
> >           + * @dev: member pci device.
> 
> Following your suggestion below, to issue inside virtio the query/use and
> keep its data internally (i.e. on the 'admin_queue' context).
> 
> We may suggest the below API for the upper-layers (e.g. vfio) to be
> exported.
> 
> bool virtio_pci_admin_supported_cmds(struct pci_dev *pdev, u64 cmds)
> 
> It will find the PF from the VF and internally will check on the
> 'admin_queue' context whether the given 'cmds' input is supported.
> 
> Its output will be true/false.
> 
> Makes sense ?

I think I'd just return the commands. But not a big deal.
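
(E.g., simply returning the cached bitmap; a sketch, with the cached field
being illustrative:)

u64 virtio_pci_admin_supported_cmds(struct pci_dev *pdev)
{
	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);

	if (!virtio_dev)
		return 0;

	return to_vp_device(virtio_dev)->admin_vq.supported_cmds;
}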


> > 	
> > 
> > 
> > Second, I don't think using buf/size  like this is necessary.
> > For now we have a small number of commands just work with u64.
> OK, just keep in mind that upon issuing the command towards the controller
> this still needs to be an allocated u64 data on the heap to work properly.
> > 
> > 
> > Third, while list could be an OK API, the use API does not
> > really work. If you call use with one set of parameters for
> > one VF and another for another then they conflict do they not?
> > 
> > So you need virtio core to do the list/use dance for you,
> > save the list of commands on the PF (which again is just u64 for now)
> > and vfio or vdpa or whatnot will just query that.
> > I hope I'm being clear.
> 
> In that case the virtio_pci_admin_list_query() and
> virtio_pci_admin_list_use() won't be exported any more and will be static in
> virtio-pci.
> 
> They will be called internally as part of activating the admin_queue and
> will simply get struct virtio_device* (the PF) instead of struct pci_dev
> *pdev.
> 
> > 
> > 
> > > 
> > > 
> > >          +
> > >          +/*
> > >          + * virtio_pci_admin_legacy_io_write - Write legacy registers of a member device
> > >          + * @dev: VF pci_dev
> > >          + * @opcode: op code of the io write command
> > > 
> > >      opcode is actually either VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE
> > >      or VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE correct?
> > > 
> > >      So please just add 2 APIs for this so users don't need to care.
> > >      Could be wrappers around these two things.
> > > 
> > > 
> > > OK.
> > > 
> > > We'll export the below 2 APIs [1] which internally will call
> > > virtio_pci_admin_legacy_io_write() with the proper op code hard-coded.
> > > 
> > > [1]virtio_pci_admin_legacy_device_io_write()
> > >       virtio_pci_admin_legacy_common_io_write()
> > > 
> > > Yishai
> > > 
> > Makes sense.
> 
> OK, we may do the same split for the 'legacy_io_read' commands to be
> symmetric with the 'legacy_io_write', right?
> 
> Yishai
> 
> > > 
> > >          + * @offset: starting byte offset within the registers to write to
> > >          + * @size: size of the data to write
> > >          + * @buf: buffer which holds the data
> > >          + *
> > >          + * Returns 0 on success, or negative on failure.
> > >          + */
> > >          +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
> > >          +                                    u8 offset, u8 size, u8 *buf)
> > >          +{
> > >          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > >          +       struct virtio_admin_cmd_legacy_wr_data *data;
> > >          +       struct virtio_admin_cmd cmd = {};
> > >          +       struct scatterlist data_sg;
> > >          +       int vf_id;
> > >          +       int ret;
> > >          +
> > >          +       if (!virtio_dev)
> > >          +               return -ENODEV;
> > >          +
> > >          +       vf_id = pci_iov_vf_id(pdev);
> > >          +       if (vf_id < 0)
> > >          +               return vf_id;
> > >          +
> > >          +       data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
> > >          +       if (!data)
> > >          +               return -ENOMEM;
> > >          +
> > >          +       data->offset = offset;
> > >          +       memcpy(data->registers, buf, size);
> > >          +       sg_init_one(&data_sg, data, sizeof(*data) + size);
> > >          +       cmd.opcode = cpu_to_le16(opcode);
> > >          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> > >          +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
> > >          +       cmd.data_sg = &data_sg;
> > >          +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> > >          +
> > >          +       kfree(data);
> > >          +       return ret;
> > >          +}
> > >          +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_write);
> > >          +
> > >          +/*
> > >          + * virtio_pci_admin_legacy_io_read - Read legacy registers of a member device
> > >          + * @dev: VF pci_dev
> > >          + * @opcode: op code of the io read command
> > >          + * @offset: starting byte offset within the registers to read from
> > >          + * @size: size of the data to be read
> > >          + * @buf: buffer to hold the returned data
> > >          + *
> > >          + * Returns 0 on success, or negative on failure.
> > >          + */
> > >          +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
> > >          +                                   u8 offset, u8 size, u8 *buf)
> > >          +{
> > >          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > >          +       struct virtio_admin_cmd_legacy_rd_data *data;
> > >          +       struct scatterlist data_sg, result_sg;
> > >          +       struct virtio_admin_cmd cmd = {};
> > >          +       int vf_id;
> > >          +       int ret;
> > >          +
> > >          +       if (!virtio_dev)
> > >          +               return -ENODEV;
> > >          +
> > >          +       vf_id = pci_iov_vf_id(pdev);
> > >          +       if (vf_id < 0)
> > >          +               return vf_id;
> > >          +
> > >          +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> > >          +       if (!data)
> > >          +               return -ENOMEM;
> > >          +
> > >          +       data->offset = offset;
> > >          +       sg_init_one(&data_sg, data, sizeof(*data));
> > >          +       sg_init_one(&result_sg, buf, size);
> > >          +       cmd.opcode = cpu_to_le16(opcode);
> > >          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> > >          +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
> > >          +       cmd.data_sg = &data_sg;
> > >          +       cmd.result_sg = &result_sg;
> > >          +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> > >          +
> > >          +       kfree(data);
> > >          +       return ret;
> > >          +}
> > >          +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_read);
> > >          +
> > >          +/*
> > >          + * virtio_pci_admin_legacy_io_notify_info - Read the queue notification
> > >          + * information for legacy interface
> > >          + * @dev: VF pci_dev
> > >          + * @req_bar_flags: requested bar flags
> > >          + * @bar: on output the BAR number of the member device
> > >          + * @bar_offset: on output the offset within bar
> > >          + *
> > >          + * Returns 0 on success, or negative on failure.
> > >          + */
> > >          +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
> > >          +                                          u8 req_bar_flags, u8 *bar,
> > >          +                                          u64 *bar_offset)
> > >          +{
> > >          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > >          +       struct virtio_admin_cmd_notify_info_result *result;
> > >          +       struct virtio_admin_cmd cmd = {};
> > >          +       struct scatterlist result_sg;
> > >          +       int vf_id;
> > >          +       int ret;
> > >          +
> > >          +       if (!virtio_dev)
> > >          +               return -ENODEV;
> > >          +
> > >          +       vf_id = pci_iov_vf_id(pdev);
> > >          +       if (vf_id < 0)
> > >          +               return vf_id;
> > >          +
> > >          +       result = kzalloc(sizeof(*result), GFP_KERNEL);
> > >          +       if (!result)
> > >          +               return -ENOMEM;
> > >          +
> > >          +       sg_init_one(&result_sg, result, sizeof(*result));
> > >          +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
> > >          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> > >          +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
> > >          +       cmd.result_sg = &result_sg;
> > >          +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> > >          +       if (!ret) {
> > >          +               struct virtio_admin_cmd_notify_info_data *entry;
> > >          +               int i;
> > >          +
> > >          +               ret = -ENOENT;
> > >          +               for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
> > >          +                       entry = &result->entries[i];
> > >          +                       if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
> > >          +                               break;
> > >          +                       if (entry->flags != req_bar_flags)
> > >          +                               continue;
> > >          +                       *bar = entry->bar;
> > >          +                       *bar_offset = le64_to_cpu(entry->offset);
> > >          +                       ret = 0;
> > >          +                       break;
> > >          +               }
> > >          +       }
> > >          +
> > >          +       kfree(result);
> > >          +       return ret;
> > >          +}
> > >          +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
> > >          +
> > >           static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > >                  .get            = NULL,
> > >                  .set            = NULL,
> > >          diff --git a/include/linux/virtio_pci_admin.h b/include/linux/virtio_pci_admin.h
> > >          new file mode 100644
> > >          index 000000000000..cb916a4bc1b1
> > >          --- /dev/null
> > >          +++ b/include/linux/virtio_pci_admin.h
> > >          @@ -0,0 +1,18 @@
> > >          +/* SPDX-License-Identifier: GPL-2.0 */
> > >          +#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
> > >          +#define _LINUX_VIRTIO_PCI_ADMIN_H
> > >          +
> > >          +#include <linux/types.h>
> > >          +#include <linux/pci.h>
> > >          +
> > >          +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
> > >          +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> > >          +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
> > >          +                                    u8 offset, u8 size, u8 *buf);
> > >          +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
> > >          +                                   u8 offset, u8 size, u8 *buf);
> > >          +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
> > >          +                                          u8 req_bar_flags, u8 *bar,
> > >          +                                          u64 *bar_offset);
> > >          +
> > >          +#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */
> > >          --
> > >          2.27.0
> > > 
> > > 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
  2023-10-25 13:00           ` Yishai Hadas via Virtualization
@ 2023-10-25 13:44             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael S. Tsirkin @ 2023-10-25 13:44 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Wed, Oct 25, 2023 at 04:00:43PM +0300, Yishai Hadas wrote:
> On 25/10/2023 13:17, Michael S. Tsirkin wrote:
> > On Wed, Oct 25, 2023 at 12:18:32PM +0300, Yishai Hadas wrote:
> > > On 25/10/2023 0:01, Michael S. Tsirkin wrote:
> > > 
> > >      On Tue, Oct 17, 2023 at 04:42:14PM +0300, Yishai Hadas wrote:
> > > 
> > >          Introduce APIs to execute legacy IO admin commands.
> > > 
> > >          It includes: list_query/use, io_legacy_read/write,
> > >          io_legacy_notify_info.
> > > 
> > >          Those APIs will be used by the next patches from this series.
> > > 
> > >          Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > >          ---
> > >           drivers/virtio/virtio_pci_common.c |  11 ++
> > >           drivers/virtio/virtio_pci_common.h |   2 +
> > >           drivers/virtio/virtio_pci_modern.c | 206 +++++++++++++++++++++++++++++
> > >           include/linux/virtio_pci_admin.h   |  18 +++
> > >           4 files changed, 237 insertions(+)
> > >           create mode 100644 include/linux/virtio_pci_admin.h
> > > 
> > >          diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > >          index 6b4766d5abe6..212d68401d2c 100644
> > >          --- a/drivers/virtio/virtio_pci_common.c
> > >          +++ b/drivers/virtio/virtio_pci_common.c
> > >          @@ -645,6 +645,17 @@ static struct pci_driver virtio_pci_driver = {
> > >                  .sriov_configure = virtio_pci_sriov_configure,
> > >           };
> > > 
> > >          +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
> > >          +{
> > >          +       struct virtio_pci_device *pf_vp_dev;
> > >          +
> > >          +       pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
> > >          +       if (IS_ERR(pf_vp_dev))
> > >          +               return NULL;
> > >          +
> > >          +       return &pf_vp_dev->vdev;
> > >          +}
> > >          +
> > >           module_pci_driver(virtio_pci_driver);
> > > 
> > >           MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
> > >          diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > >          index a21b9ba01a60..2785e61ed668 100644
> > >          --- a/drivers/virtio/virtio_pci_common.h
> > >          +++ b/drivers/virtio/virtio_pci_common.h
> > >          @@ -155,4 +155,6 @@ static inline void virtio_pci_legacy_remove(struct virtio_pci_device *vp_dev)
> > >           int virtio_pci_modern_probe(struct virtio_pci_device *);
> > >           void virtio_pci_modern_remove(struct virtio_pci_device *);
> > > 
> > >          +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
> > >          +
> > >           #endif
> > >          diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > >          index cc159a8e6c70..00b65e20b2f5 100644
> > >          --- a/drivers/virtio/virtio_pci_modern.c
> > >          +++ b/drivers/virtio/virtio_pci_modern.c
> > >          @@ -719,6 +719,212 @@ static void vp_modern_destroy_avq(struct virtio_device *vdev)
> > >                  vp_dev->del_vq(&vp_dev->admin_vq.info);
> > >           }
> > > 
> > >          +/*
> > >          + * virtio_pci_admin_list_query - Provides to driver list of commands
> > >          + * supported for the PCI VF.
> > >          + * @dev: VF pci_dev
> > >          + * @buf: buffer to hold the returned list
> > >          + * @buf_size: size of the given buffer
> > >          + *
> > >          + * Returns 0 on success, or negative on failure.
> > >          + */
> > >          +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
> > >          +{
> > >          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > >          +       struct virtio_admin_cmd cmd = {};
> > >          +       struct scatterlist result_sg;
> > >          +
> > >          +       if (!virtio_dev)
> > >          +               return -ENODEV;
> > >          +
> > >          +       sg_init_one(&result_sg, buf, buf_size);
> > >          +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
> > >          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> > >          +       cmd.result_sg = &result_sg;
> > >          +
> > >          +       return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> > >          +}
> > >          +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
> > >          +
> > >          +/*
> > >          + * virtio_pci_admin_list_use - Provides to device list of commands
> > >          + * used for the PCI VF.
> > >          + * @dev: VF pci_dev
> > >          + * @buf: buffer which holds the list
> > >          + * @buf_size: size of the given buffer
> > >          + *
> > >          + * Returns 0 on success, or negative on failure.
> > >          + */
> > >          +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
> > >          +{
> > >          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > >          +       struct virtio_admin_cmd cmd = {};
> > >          +       struct scatterlist data_sg;
> > >          +
> > >          +       if (!virtio_dev)
> > >          +               return -ENODEV;
> > >          +
> > >          +       sg_init_one(&data_sg, buf, buf_size);
> > >          +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
> > >          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> > >          +       cmd.data_sg = &data_sg;
> > >          +
> > >          +       return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> > >          +}
> > >          +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_use);
> > > 
> > >      list commands are actually for a group, not for the VF.
> > > 
> > > The VF was given so that the function can get the PF from it.
> > > 
> > > For now, the only existing 'group_type' in the spec is SRIOV, which is why
> > > we hard-coded it internally to match the PCI VF.
> > > 
> > > Alternatively,
> > > we can change the API to get the PF and 'group_type' from the caller to
> > > better match future usage.
> > > However, this will require exporting the virtio_pci_vf_get_pf_dev() API
> > > outside virtio-pci.
> > > 
> > > Do you prefer to change to the latter option?
> > No, there are several points I wanted to make but this
> > was not one of them.
> > 
> > First, for query, I was trying to suggest changing the comment.
> > Something like:
> >           + * virtio_pci_admin_list_query - Provides to driver list of commands
> >           + * supported for the group including the given member device.
> >           + * @dev: member pci device.
> 
> Following your suggestion below, we would issue the query/use inside virtio
> and keep its data internally (i.e. in the 'admin_queue' context).
> 
> We may then suggest exporting the below API for the upper layers (e.g. vfio):
> 
> bool virtio_pci_admin_supported_cmds(struct pci_dev *pdev, u64 cmds)
> 
> It will find the PF from the VF and internally check, against the
> 'admin_queue' context, whether the given 'cmds' input is supported.
> 
> Its output will be true/false.
> 
> Makes sense?
> 
> > 	
> > 
> > 
> > Second, I don't think using buf/size like this is necessary.
> > For now we have a small number of commands; just work with a u64.
> OK, just keep in mind that when issuing the command towards the controller,
> the u64 still needs to be allocated on the heap to work properly.
> > 
> > 
> > Third, while list could be an OK API, the use API does not
> > really work. If you call use with one set of parameters for
> > one VF and another set for another VF, then they conflict, do they not?
> > 
> > So you need virtio core to do the list/use dance for you,
> > save the list of commands on the PF (which again is just u64 for now)
> > and vfio or vdpa or whatnot will just query that.
> > I hope I'm being clear.
> 
> In that case the virtio_pci_admin_list_query() and
> virtio_pci_admin_list_use() won't be exported any more and will be static in
> virtio-pci.
> 
> They will be called internally as part of activating the admin_queue and
> will simply get struct virtio_device* (the PF) instead of struct pci_dev
> *pdev.


Yes - I think some kind of API will be needed to setup/cleanup.
Then 1st call to setup would do the list/use dance? some ref counting?

And maybe the API should just be
bool virtio_pci_admin_has_legacy_io()
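
For the setup side, the list/use dance could then look roughly like the
below (a sketch only: the function name is made up, error handling is
trimmed, and caching the result in a new 'supported_cmds' field on the
admin vq context is my assumption):

static void virtio_pci_admin_cmd_list_init(struct virtio_device *virtio_dev)
{
	struct virtio_pci_device *vp_dev = to_vp_device(virtio_dev);
	struct virtio_admin_cmd cmd = {};
	struct scatterlist result_sg;
	struct scatterlist data_sg;
	__le64 *data;

	/* Goes through a scatterlist, so it must not live on the stack */
	data = kzalloc(sizeof(*data), GFP_KERNEL);
	if (!data)
		return;

	sg_init_one(&result_sg, data, sizeof(*data));
	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
	cmd.result_sg = &result_sg;
	if (vp_modern_admin_cmd_exec(virtio_dev, &cmd))
		goto end;

	/* Offer back everything the device reported */
	sg_init_one(&data_sg, data, sizeof(*data));
	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
	cmd.result_sg = NULL;
	cmd.data_sg = &data_sg;
	if (vp_modern_admin_cmd_exec(virtio_dev, &cmd))
		goto end;

	/* Cache what was accepted; the exported helpers only test bits here */
	vp_dev->admin_vq.supported_cmds = le64_to_cpu(*data);
end:
	kfree(data);
}

virtio_pci_admin_has_legacy_io() (or a helper returning the raw u64) would
then just test bits in the cached value without issuing any admin command.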



> > 
> > 
> > > 
> > > 
> > >          +
> > >          +/*
> > >          + * virtio_pci_admin_legacy_io_write - Write legacy registers of a member device
> > >          + * @dev: VF pci_dev
> > >          + * @opcode: op code of the io write command
> > > 
> > >      opcode is actually either VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE
> > >      or VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE correct?
> > > 
> > >      So please just add 2 APIs for this so users don't need to care.
> > >      Could be wrappers around these two things.
> > > 
> > > 
> > > OK.
> > > 
> > > We'll export the below 2 APIs [1] which internally will call
> > > virtio_pci_admin_legacy_io_write() with the proper op code hard-coded.
> > > 
> > > [1]virtio_pci_admin_legacy_device_io_write()
> > >       virtio_pci_admin_legacy_common_io_write()
> > > 
> > > Yishai
> > > 
> > Makes sense.
> 
> OK, we may do the same split for the 'legacy_io_read' commands to be
> symmetric with the 'legacy_io_write', right?
> 
> Yishai

makes sense.
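
I.e. something like the below pair for the write side (a sketch; the read
side would get the same two wrappers, and virtio_pci_admin_legacy_io_write()
itself can then become static):

int virtio_pci_admin_legacy_common_io_write(struct pci_dev *pdev, u8 offset,
					    u8 size, u8 *buf)
{
	return virtio_pci_admin_legacy_io_write(pdev,
					VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE,
					offset, size, buf);
}

int virtio_pci_admin_legacy_device_io_write(struct pci_dev *pdev, u8 offset,
					    u8 size, u8 *buf)
{
	return virtio_pci_admin_legacy_io_write(pdev,
					VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE,
					offset, size, buf);
}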

> > > 
> > >          + * @offset: starting byte offset within the registers to write to
> > >          + * @size: size of the data to write
> > >          + * @buf: buffer which holds the data
> > >          + *
> > >          + * Returns 0 on success, or negative on failure.
> > >          + */
> > >          +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
> > >          +                                    u8 offset, u8 size, u8 *buf)
> > >          +{
> > >          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > >          +       struct virtio_admin_cmd_legacy_wr_data *data;
> > >          +       struct virtio_admin_cmd cmd = {};
> > >          +       struct scatterlist data_sg;
> > >          +       int vf_id;
> > >          +       int ret;
> > >          +
> > >          +       if (!virtio_dev)
> > >          +               return -ENODEV;
> > >          +
> > >          +       vf_id = pci_iov_vf_id(pdev);
> > >          +       if (vf_id < 0)
> > >          +               return vf_id;
> > >          +
> > >          +       data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
> > >          +       if (!data)
> > >          +               return -ENOMEM;
> > >          +
> > >          +       data->offset = offset;
> > >          +       memcpy(data->registers, buf, size);
> > >          +       sg_init_one(&data_sg, data, sizeof(*data) + size);
> > >          +       cmd.opcode = cpu_to_le16(opcode);
> > >          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> > >          +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
> > >          +       cmd.data_sg = &data_sg;
> > >          +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> > >          +
> > >          +       kfree(data);
> > >          +       return ret;
> > >          +}
> > >          +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_write);
> > >          +
> > >          +/*
> > >          + * virtio_pci_admin_legacy_io_read - Read legacy registers of a member device
> > >          + * @dev: VF pci_dev
> > >          + * @opcode: op code of the io read command
> > >          + * @offset: starting byte offset within the registers to read from
> > >          + * @size: size of the data to be read
> > >          + * @buf: buffer to hold the returned data
> > >          + *
> > >          + * Returns 0 on success, or negative on failure.
> > >          + */
> > >          +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
> > >          +                                   u8 offset, u8 size, u8 *buf)
> > >          +{
> > >          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > >          +       struct virtio_admin_cmd_legacy_rd_data *data;
> > >          +       struct scatterlist data_sg, result_sg;
> > >          +       struct virtio_admin_cmd cmd = {};
> > >          +       int vf_id;
> > >          +       int ret;
> > >          +
> > >          +       if (!virtio_dev)
> > >          +               return -ENODEV;
> > >          +
> > >          +       vf_id = pci_iov_vf_id(pdev);
> > >          +       if (vf_id < 0)
> > >          +               return vf_id;
> > >          +
> > >          +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> > >          +       if (!data)
> > >          +               return -ENOMEM;
> > >          +
> > >          +       data->offset = offset;
> > >          +       sg_init_one(&data_sg, data, sizeof(*data));
> > >          +       sg_init_one(&result_sg, buf, size);
> > >          +       cmd.opcode = cpu_to_le16(opcode);
> > >          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> > >          +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
> > >          +       cmd.data_sg = &data_sg;
> > >          +       cmd.result_sg = &result_sg;
> > >          +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> > >          +
> > >          +       kfree(data);
> > >          +       return ret;
> > >          +}
> > >          +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_read);
> > >          +
> > >          +/*
> > >          + * virtio_pci_admin_legacy_io_notify_info - Read the queue notification
> > >          + * information for legacy interface
> > >          + * @dev: VF pci_dev
> > >          + * @req_bar_flags: requested bar flags
> > >          + * @bar: on output the BAR number of the member device
> > >          + * @bar_offset: on output the offset within bar
> > >          + *
> > >          + * Returns 0 on success, or negative on failure.
> > >          + */
> > >          +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
> > >          +                                          u8 req_bar_flags, u8 *bar,
> > >          +                                          u64 *bar_offset)
> > >          +{
> > >          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > >          +       struct virtio_admin_cmd_notify_info_result *result;
> > >          +       struct virtio_admin_cmd cmd = {};
> > >          +       struct scatterlist result_sg;
> > >          +       int vf_id;
> > >          +       int ret;
> > >          +
> > >          +       if (!virtio_dev)
> > >          +               return -ENODEV;
> > >          +
> > >          +       vf_id = pci_iov_vf_id(pdev);
> > >          +       if (vf_id < 0)
> > >          +               return vf_id;
> > >          +
> > >          +       result = kzalloc(sizeof(*result), GFP_KERNEL);
> > >          +       if (!result)
> > >          +               return -ENOMEM;
> > >          +
> > >          +       sg_init_one(&result_sg, result, sizeof(*result));
> > >          +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
> > >          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> > >          +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
> > >          +       cmd.result_sg = &result_sg;
> > >          +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> > >          +       if (!ret) {
> > >          +               struct virtio_admin_cmd_notify_info_data *entry;
> > >          +               int i;
> > >          +
> > >          +               ret = -ENOENT;
> > >          +               for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
> > >          +                       entry = &result->entries[i];
> > >          +                       if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
> > >          +                               break;
> > >          +                       if (entry->flags != req_bar_flags)
> > >          +                               continue;
> > >          +                       *bar = entry->bar;
> > >          +                       *bar_offset = le64_to_cpu(entry->offset);
> > >          +                       ret = 0;
> > >          +                       break;
> > >          +               }
> > >          +       }
> > >          +
> > >          +       kfree(result);
> > >          +       return ret;
> > >          +}
> > >          +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
> > >          +
> > >           static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > >                  .get            = NULL,
> > >                  .set            = NULL,
> > >          diff --git a/include/linux/virtio_pci_admin.h b/include/linux/virtio_pci_admin.h
> > >          new file mode 100644
> > >          index 000000000000..cb916a4bc1b1
> > >          --- /dev/null
> > >          +++ b/include/linux/virtio_pci_admin.h
> > >          @@ -0,0 +1,18 @@
> > >          +/* SPDX-License-Identifier: GPL-2.0 */
> > >          +#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
> > >          +#define _LINUX_VIRTIO_PCI_ADMIN_H
> > >          +
> > >          +#include <linux/types.h>
> > >          +#include <linux/pci.h>
> > >          +
> > >          +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
> > >          +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> > >          +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
> > >          +                                    u8 offset, u8 size, u8 *buf);
> > >          +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
> > >          +                                   u8 offset, u8 size, u8 *buf);
> > >          +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
> > >          +                                          u8 req_bar_flags, u8 *bar,
> > >          +                                          u64 *bar_offset);
> > >          +
> > >          +#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */
> > >          --
> > >          2.27.0
> > > 
> > > 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
@ 2023-10-25 13:44             ` Michael S. Tsirkin
  0 siblings, 0 replies; 100+ messages in thread
From: Michael S. Tsirkin @ 2023-10-25 13:44 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, si-wei.liu, leonro,
	maorg

On Wed, Oct 25, 2023 at 04:00:43PM +0300, Yishai Hadas wrote:
> On 25/10/2023 13:17, Michael S. Tsirkin wrote:
> > On Wed, Oct 25, 2023 at 12:18:32PM +0300, Yishai Hadas wrote:
> > > On 25/10/2023 0:01, Michael S. Tsirkin wrote:
> > > 
> > >      On Tue, Oct 17, 2023 at 04:42:14PM +0300, Yishai Hadas wrote:
> > > 
> > >          Introduce APIs to execute legacy IO admin commands.
> > > 
> > >          It includes: list_query/use, io_legacy_read/write,
> > >          io_legacy_notify_info.
> > > 
> > >          Those APIs will be used by the next patches from this series.
> > > 
> > >          Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > >          ---
> > >           drivers/virtio/virtio_pci_common.c |  11 ++
> > >           drivers/virtio/virtio_pci_common.h |   2 +
> > >           drivers/virtio/virtio_pci_modern.c | 206 +++++++++++++++++++++++++++++
> > >           include/linux/virtio_pci_admin.h   |  18 +++
> > >           4 files changed, 237 insertions(+)
> > >           create mode 100644 include/linux/virtio_pci_admin.h
> > > 
> > >          diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > >          index 6b4766d5abe6..212d68401d2c 100644
> > >          --- a/drivers/virtio/virtio_pci_common.c
> > >          +++ b/drivers/virtio/virtio_pci_common.c
> > >          @@ -645,6 +645,17 @@ static struct pci_driver virtio_pci_driver = {
> > >                  .sriov_configure = virtio_pci_sriov_configure,
> > >           };
> > > 
> > >          +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
> > >          +{
> > >          +       struct virtio_pci_device *pf_vp_dev;
> > >          +
> > >          +       pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
> > >          +       if (IS_ERR(pf_vp_dev))
> > >          +               return NULL;
> > >          +
> > >          +       return &pf_vp_dev->vdev;
> > >          +}
> > >          +
> > >           module_pci_driver(virtio_pci_driver);
> > > 
> > >           MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
> > >          diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
> > >          index a21b9ba01a60..2785e61ed668 100644
> > >          --- a/drivers/virtio/virtio_pci_common.h
> > >          +++ b/drivers/virtio/virtio_pci_common.h
> > >          @@ -155,4 +155,6 @@ static inline void virtio_pci_legacy_remove(struct virtio_pci_device *vp_dev)
> > >           int virtio_pci_modern_probe(struct virtio_pci_device *);
> > >           void virtio_pci_modern_remove(struct virtio_pci_device *);
> > > 
> > >          +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
> > >          +
> > >           #endif
> > >          diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > >          index cc159a8e6c70..00b65e20b2f5 100644
> > >          --- a/drivers/virtio/virtio_pci_modern.c
> > >          +++ b/drivers/virtio/virtio_pci_modern.c
> > >          @@ -719,6 +719,212 @@ static void vp_modern_destroy_avq(struct virtio_device *vdev)
> > >                  vp_dev->del_vq(&vp_dev->admin_vq.info);
> > >           }
> > > 
> > >          +/*
> > >          + * virtio_pci_admin_list_query - Provides to driver list of commands
> > >          + * supported for the PCI VF.
> > >          + * @dev: VF pci_dev
> > >          + * @buf: buffer to hold the returned list
> > >          + * @buf_size: size of the given buffer
> > >          + *
> > >          + * Returns 0 on success, or negative on failure.
> > >          + */
> > >          +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
> > >          +{
> > >          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > >          +       struct virtio_admin_cmd cmd = {};
> > >          +       struct scatterlist result_sg;
> > >          +
> > >          +       if (!virtio_dev)
> > >          +               return -ENODEV;
> > >          +
> > >          +       sg_init_one(&result_sg, buf, buf_size);
> > >          +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
> > >          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> > >          +       cmd.result_sg = &result_sg;
> > >          +
> > >          +       return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> > >          +}
> > >          +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
> > >          +
> > >          +/*
> > >          + * virtio_pci_admin_list_use - Provides to device list of commands
> > >          + * used for the PCI VF.
> > >          + * @dev: VF pci_dev
> > >          + * @buf: buffer which holds the list
> > >          + * @buf_size: size of the given buffer
> > >          + *
> > >          + * Returns 0 on success, or negative on failure.
> > >          + */
> > >          +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
> > >          +{
> > >          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > >          +       struct virtio_admin_cmd cmd = {};
> > >          +       struct scatterlist data_sg;
> > >          +
> > >          +       if (!virtio_dev)
> > >          +               return -ENODEV;
> > >          +
> > >          +       sg_init_one(&data_sg, buf, buf_size);
> > >          +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
> > >          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> > >          +       cmd.data_sg = &data_sg;
> > >          +
> > >          +       return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> > >          +}
> > >          +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_use);
> > > 
> > >      list commands are actually for a group, not for the VF.
> > > 
> > > The VF was given so that the function can get the PF from it.
> > > 
> > > For now, the only existing 'group_type' in the spec is SRIOV, which is why
> > > we hard-coded it internally to match the PCI VF.
> > > 
> > > Alternatively,
> > > we can change the API to get the PF and 'group_type' from the caller to
> > > better match future usage.
> > > However, this will require exporting the virtio_pci_vf_get_pf_dev() API
> > > outside virtio-pci.
> > > 
> > > Do you prefer to change to the latter option?
> > No, there are several points I wanted to make but this
> > was not one of them.
> > 
> > First, for query, I was trying to suggest changing the comment.
> > Something like:
> >           + * virtio_pci_admin_list_query - Provides to driver list of commands
> >           + * supported for the group including the given member device.
> >           + * @dev: member pci device.
> 
> Following your suggestion below, we would issue the query/use inside virtio
> and keep its data internally (i.e. in the 'admin_queue' context).
> 
> We may then suggest exporting the below API for the upper layers (e.g. vfio):
> 
> bool virtio_pci_admin_supported_cmds(struct pci_dev *pdev, u64 cmds)
> 
> It will find the PF from the VF and internally check, against the
> 'admin_queue' context, whether the given 'cmds' input is supported.
> 
> Its output will be true/false.
> 
> Makes sense?
> 
> > 	
> > 
> > 
> > Second, I don't think using buf/size like this is necessary.
> > For now we have a small number of commands; just work with a u64.
> OK, just keep in mind that when issuing the command towards the controller,
> the u64 still needs to be allocated on the heap to work properly.
> > 
> > 
> > Third, while list could be an OK API, the use API does not
> > really work. If you call use with one set of parameters for
> > one VF and another set for another VF, then they conflict, do they not?
> > 
> > So you need virtio core to do the list/use dance for you,
> > save the list of commands on the PF (which again is just u64 for now)
> > and vfio or vdpa or whatnot will just query that.
> > I hope I'm being clear.
> 
> In that case the virtio_pci_admin_list_query() and
> virtio_pci_admin_list_use() won't be exported any more and will be static in
> virtio-pci.
> 
> They will be called internally as part of activating the admin_queue and
> will simply get struct virtio_device* (the PF) instead of struct pci_dev
> *pdev.


Yes - I think some kind of API will be needed to setup/cleanup.
Then 1st call to setup would do the list/use dance? some ref counting?

And maybe the API should just be
bool virtio_pci_admin_has_legacy_io()
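
Which could be as simple as testing the cached bitmap, e.g. (a sketch; the
'supported_cmds' field is my assumption, the list is kept as the spec's
opcode bitmap, and the READ opcodes would be checked here as well):

bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
{
	struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
	const u64 needed = BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) |
			   BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) |
			   BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
	u64 cmds;

	if (!virtio_dev)
		return false;

	cmds = to_vp_device(virtio_dev)->admin_vq.supported_cmds;
	return (cmds & needed) == needed;
}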



> > 
> > 
> > > 
> > > 
> > >          +
> > >          +/*
> > >          + * virtio_pci_admin_legacy_io_write - Write legacy registers of a member device
> > >          + * @dev: VF pci_dev
> > >          + * @opcode: op code of the io write command
> > > 
> > >      opcode is actually either VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE
> > >      or VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE correct?
> > > 
> > >      So please just add 2 APIs for this so users don't need to care.
> > >      Could be wrappers around these two things.
> > > 
> > > 
> > > OK.
> > > 
> > > We'll export the below 2 APIs [1] which internally will call
> > > virtio_pci_admin_legacy_io_write() with the proper op code hard-coded.
> > > 
> > > [1]virtio_pci_admin_legacy_device_io_write()
> > >       virtio_pci_admin_legacy_common_io_write()
> > > 
> > > Yishai
> > > 
> > Makes sense.
> 
> OK, we may do the same split for the 'legacy_io_read' commands to be
> symmetric with the 'legacy_io_write', right ?
> 
> Yishai

makes sense.

> > > 
> > >          + * @offset: starting byte offset within the registers to write to
> > >          + * @size: size of the data to write
> > >          + * @buf: buffer which holds the data
> > >          + *
> > >          + * Returns 0 on success, or negative on failure.
> > >          + */
> > >          +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
> > >          +                                    u8 offset, u8 size, u8 *buf)
> > >          +{
> > >          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > >          +       struct virtio_admin_cmd_legacy_wr_data *data;
> > >          +       struct virtio_admin_cmd cmd = {};
> > >          +       struct scatterlist data_sg;
> > >          +       int vf_id;
> > >          +       int ret;
> > >          +
> > >          +       if (!virtio_dev)
> > >          +               return -ENODEV;
> > >          +
> > >          +       vf_id = pci_iov_vf_id(pdev);
> > >          +       if (vf_id < 0)
> > >          +               return vf_id;
> > >          +
> > >          +       data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
> > >          +       if (!data)
> > >          +               return -ENOMEM;
> > >          +
> > >          +       data->offset = offset;
> > >          +       memcpy(data->registers, buf, size);
> > >          +       sg_init_one(&data_sg, data, sizeof(*data) + size);
> > >          +       cmd.opcode = cpu_to_le16(opcode);
> > >          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> > >          +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
> > >          +       cmd.data_sg = &data_sg;
> > >          +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> > >          +
> > >          +       kfree(data);
> > >          +       return ret;
> > >          +}
> > >          +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_write);
> > >          +
> > >          +/*
> > >          + * virtio_pci_admin_legacy_io_read - Read legacy registers of a member device
> > >          + * @dev: VF pci_dev
> > >          + * @opcode: op code of the io read command
> > >          + * @offset: starting byte offset within the registers to read from
> > >          + * @size: size of the data to be read
> > >          + * @buf: buffer to hold the returned data
> > >          + *
> > >          + * Returns 0 on success, or negative on failure.
> > >          + */
> > >          +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
> > >          +                                   u8 offset, u8 size, u8 *buf)
> > >          +{
> > >          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > >          +       struct virtio_admin_cmd_legacy_rd_data *data;
> > >          +       struct scatterlist data_sg, result_sg;
> > >          +       struct virtio_admin_cmd cmd = {};
> > >          +       int vf_id;
> > >          +       int ret;
> > >          +
> > >          +       if (!virtio_dev)
> > >          +               return -ENODEV;
> > >          +
> > >          +       vf_id = pci_iov_vf_id(pdev);
> > >          +       if (vf_id < 0)
> > >          +               return vf_id;
> > >          +
> > >          +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> > >          +       if (!data)
> > >          +               return -ENOMEM;
> > >          +
> > >          +       data->offset = offset;
> > >          +       sg_init_one(&data_sg, data, sizeof(*data));
> > >          +       sg_init_one(&result_sg, buf, size);
> > >          +       cmd.opcode = cpu_to_le16(opcode);
> > >          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> > >          +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
> > >          +       cmd.data_sg = &data_sg;
> > >          +       cmd.result_sg = &result_sg;
> > >          +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> > >          +
> > >          +       kfree(data);
> > >          +       return ret;
> > >          +}
> > >          +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_read);
> > >          +
> > >          +/*
> > >          + * virtio_pci_admin_legacy_io_notify_info - Read the queue notification
> > >          + * information for legacy interface
> > >          + * @dev: VF pci_dev
> > >          + * @req_bar_flags: requested bar flags
> > >          + * @bar: on output the BAR number of the member device
> > >          + * @bar_offset: on output the offset within bar
> > >          + *
> > >          + * Returns 0 on success, or negative on failure.
> > >          + */
> > >          +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
> > >          +                                          u8 req_bar_flags, u8 *bar,
> > >          +                                          u64 *bar_offset)
> > >          +{
> > >          +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > >          +       struct virtio_admin_cmd_notify_info_result *result;
> > >          +       struct virtio_admin_cmd cmd = {};
> > >          +       struct scatterlist result_sg;
> > >          +       int vf_id;
> > >          +       int ret;
> > >          +
> > >          +       if (!virtio_dev)
> > >          +               return -ENODEV;
> > >          +
> > >          +       vf_id = pci_iov_vf_id(pdev);
> > >          +       if (vf_id < 0)
> > >          +               return vf_id;
> > >          +
> > >          +       result = kzalloc(sizeof(*result), GFP_KERNEL);
> > >          +       if (!result)
> > >          +               return -ENOMEM;
> > >          +
> > >          +       sg_init_one(&result_sg, result, sizeof(*result));
> > >          +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
> > >          +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
> > >          +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
> > >          +       cmd.result_sg = &result_sg;
> > >          +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
> > >          +       if (!ret) {
> > >          +               struct virtio_admin_cmd_notify_info_data *entry;
> > >          +               int i;
> > >          +
> > >          +               ret = -ENOENT;
> > >          +               for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
> > >          +                       entry = &result->entries[i];
> > >          +                       if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
> > >          +                               break;
> > >          +                       if (entry->flags != req_bar_flags)
> > >          +                               continue;
> > >          +                       *bar = entry->bar;
> > >          +                       *bar_offset = le64_to_cpu(entry->offset);
> > >          +                       ret = 0;
> > >          +                       break;
> > >          +               }
> > >          +       }
> > >          +
> > >          +       kfree(result);
> > >          +       return ret;
> > >          +}
> > >          +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
> > >          +
> > >           static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
> > >                  .get            = NULL,
> > >                  .set            = NULL,
> > >          diff --git a/include/linux/virtio_pci_admin.h b/include/linux/virtio_pci_admin.h
> > >          new file mode 100644
> > >          index 000000000000..cb916a4bc1b1
> > >          --- /dev/null
> > >          +++ b/include/linux/virtio_pci_admin.h
> > >          @@ -0,0 +1,18 @@
> > >          +/* SPDX-License-Identifier: GPL-2.0 */
> > >          +#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
> > >          +#define _LINUX_VIRTIO_PCI_ADMIN_H
> > >          +
> > >          +#include <linux/types.h>
> > >          +#include <linux/pci.h>
> > >          +
> > >          +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
> > >          +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
> > >          +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
> > >          +                                    u8 offset, u8 size, u8 *buf);
> > >          +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
> > >          +                                   u8 offset, u8 size, u8 *buf);
> > >          +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
> > >          +                                          u8 req_bar_flags, u8 *bar,
> > >          +                                          u64 *bar_offset);
> > >          +
> > >          +#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */
> > >          --
> > >          2.27.0
> > > 
> > > 


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
  2023-10-25 13:44             ` Michael S. Tsirkin
@ 2023-10-25 14:03               ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-25 14:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, si-wei.liu, leonro,
	maorg

On 25/10/2023 16:44, Michael S. Tsirkin wrote:
> On Wed, Oct 25, 2023 at 04:00:43PM +0300, Yishai Hadas wrote:
>> On 25/10/2023 13:17, Michael S. Tsirkin wrote:
>>> On Wed, Oct 25, 2023 at 12:18:32PM +0300, Yishai Hadas wrote:
>>>> On 25/10/2023 0:01, Michael S. Tsirkin wrote:
>>>>
>>>>       On Tue, Oct 17, 2023 at 04:42:14PM +0300, Yishai Hadas wrote:
>>>>
>>>>           Introduce APIs to execute legacy IO admin commands.
>>>>
>>>>           It includes: list_query/use, io_legacy_read/write,
>>>>           io_legacy_notify_info.
>>>>
>>>>           Those APIs will be used by the next patches from this series.
>>>>
>>>>           Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>>>>           ---
>>>>            drivers/virtio/virtio_pci_common.c |  11 ++
>>>>            drivers/virtio/virtio_pci_common.h |   2 +
>>>>            drivers/virtio/virtio_pci_modern.c | 206 +++++++++++++++++++++++++++++
>>>>            include/linux/virtio_pci_admin.h   |  18 +++
>>>>            4 files changed, 237 insertions(+)
>>>>            create mode 100644 include/linux/virtio_pci_admin.h
>>>>
>>>>           diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
>>>>           index 6b4766d5abe6..212d68401d2c 100644
>>>>           --- a/drivers/virtio/virtio_pci_common.c
>>>>           +++ b/drivers/virtio/virtio_pci_common.c
>>>>           @@ -645,6 +645,17 @@ static struct pci_driver virtio_pci_driver = {
>>>>                   .sriov_configure = virtio_pci_sriov_configure,
>>>>            };
>>>>
>>>>           +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
>>>>           +{
>>>>           +       struct virtio_pci_device *pf_vp_dev;
>>>>           +
>>>>           +       pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
>>>>           +       if (IS_ERR(pf_vp_dev))
>>>>           +               return NULL;
>>>>           +
>>>>           +       return &pf_vp_dev->vdev;
>>>>           +}
>>>>           +
>>>>            module_pci_driver(virtio_pci_driver);
>>>>
>>>>            MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
>>>>           diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
>>>>           index a21b9ba01a60..2785e61ed668 100644
>>>>           --- a/drivers/virtio/virtio_pci_common.h
>>>>           +++ b/drivers/virtio/virtio_pci_common.h
>>>>           @@ -155,4 +155,6 @@ static inline void virtio_pci_legacy_remove(struct virtio_pci_device *vp_dev)
>>>>            int virtio_pci_modern_probe(struct virtio_pci_device *);
>>>>            void virtio_pci_modern_remove(struct virtio_pci_device *);
>>>>
>>>>           +struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
>>>>           +
>>>>            #endif
>>>>           diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
>>>>           index cc159a8e6c70..00b65e20b2f5 100644
>>>>           --- a/drivers/virtio/virtio_pci_modern.c
>>>>           +++ b/drivers/virtio/virtio_pci_modern.c
>>>>           @@ -719,6 +719,212 @@ static void vp_modern_destroy_avq(struct virtio_device *vdev)
>>>>                   vp_dev->del_vq(&vp_dev->admin_vq.info);
>>>>            }
>>>>
>>>>           +/*
>>>>           + * virtio_pci_admin_list_query - Provides to driver list of commands
>>>>           + * supported for the PCI VF.
>>>>           + * @dev: VF pci_dev
>>>>           + * @buf: buffer to hold the returned list
>>>>           + * @buf_size: size of the given buffer
>>>>           + *
>>>>           + * Returns 0 on success, or negative on failure.
>>>>           + */
>>>>           +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
>>>>           +{
>>>>           +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>>>>           +       struct virtio_admin_cmd cmd = {};
>>>>           +       struct scatterlist result_sg;
>>>>           +
>>>>           +       if (!virtio_dev)
>>>>           +               return -ENODEV;
>>>>           +
>>>>           +       sg_init_one(&result_sg, buf, buf_size);
>>>>           +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
>>>>           +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>>>>           +       cmd.result_sg = &result_sg;
>>>>           +
>>>>           +       return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>>>>           +}
>>>>           +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
>>>>           +
>>>>           +/*
>>>>           + * virtio_pci_admin_list_use - Provides to device list of commands
>>>>           + * used for the PCI VF.
>>>>           + * @dev: VF pci_dev
>>>>           + * @buf: buffer which holds the list
>>>>           + * @buf_size: size of the given buffer
>>>>           + *
>>>>           + * Returns 0 on success, or negative on failure.
>>>>           + */
>>>>           +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
>>>>           +{
>>>>           +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>>>>           +       struct virtio_admin_cmd cmd = {};
>>>>           +       struct scatterlist data_sg;
>>>>           +
>>>>           +       if (!virtio_dev)
>>>>           +               return -ENODEV;
>>>>           +
>>>>           +       sg_init_one(&data_sg, buf, buf_size);
>>>>           +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
>>>>           +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>>>>           +       cmd.data_sg = &data_sg;
>>>>           +
>>>>           +       return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>>>>           +}
>>>>           +EXPORT_SYMBOL_GPL(virtio_pci_admin_list_use);
>>>>
>>>>       list commands are actually for a group, not for the VF.
>>>>
>>>> The VF was given to let the function gets the PF from it.
>>>>
>>>> For now, the only existing 'group_type' in the spec is SRIOV, this is why we
>>>> hard-coded it internally to match the VF PCI.
>>>>
>>>> Alternatively,
>>>> We can change the API to get the PF and 'group_type' from the caller to better
>>>> match future usage.
>>>> However, this will require to export the virtio_pci_vf_get_pf_dev() API outside
>>>> virtio-pci.
>>>>
>>>> Do you prefer to change to the latter option ?
>>> No, there are several points I wanted to make but this
>>> was not one of them.
>>>
>>> First, for query, I was trying to suggest changing the comment.
>>> Something like:
>>>            + * virtio_pci_admin_list_query - Provides to driver list of commands
>>>            + * supported for the group including the given member device.
>>>            + * @dev: member pci device.
>> Following your suggestion below, to issue inside virtio the query/use and
>> keep its data internally (i.e. on the 'admin_queue' context).
>>
>> We may suggest the below API for the upper-layers (e.g. vfio) to be
>> exported.
>>
>> bool virtio_pci_admin_supported_cmds(struct pci_dev *pdev, u64 cmds)
>>
>> It will find the PF from the VF and internally will check on the
>> 'admin_queue' context whether the given 'cmds' input is supported.
>>
>> Its output will be true/false.
>>
>> Makes sense ?
>>
>>> 	
>>>
>>>
>>> Second, I don't think using buf/size  like this is necessary.
>>> For now we have a small number of commands just work with u64.
>> OK, just keep in mind that upon issuing the command towards the controller
>> this still needs to be an allocated u64 data on the heap to work properly.
>>>
>>> Third, while list could be an OK API, the use API does not
>>> really work. If you call use with one set of parameters for
>>> one VF and another for another then they conflict do they not?
>>>
>>> So you need virtio core to do the list/use dance for you,
>>> save the list of commands on the PF (which again is just u64 for now)
>>> and vfio or vdpa or whatnot will just query that.
>>> I hope I'm being clear.
>> In that case the virtio_pci_admin_list_query() and
>> virtio_pci_admin_list_use() won't be exported any more and will be static in
>> virtio-pci.
>>
>> They will be called internally as part of activating the admin_queue and
>> will simply get struct virtio_device* (the PF) instead of struct pci_dev
>> *pdev.
>
> Yes - I think some kind of API will be needed to setup/cleanup.
> Then 1st call to setup would do the list/use dance? some ref counting?

OK, we can come back in V2 with that option in place.

Please note that the initialization 'list/use' commands would be issued as 
part of the admin queue activation, and we can't enable the admin queue 
for the upper layers before that is done.

So, we may consider skipping the ref count set/get as part of the 
initialization flow, with some flag/detection for the list/use commands, 
since setting the ref count is what enables the admin queue for the 
upper layers, which we want to prevent at that point.
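
For illustration, the internal flow during the admin queue activation could 
look roughly like the below. The 'supported_cmds' field on the admin vq is a 
hypothetical placeholder here for wherever the cached list ends up:

static void virtio_pci_admin_cmd_list_init(struct virtio_device *virtio_dev)
{
        struct virtio_pci_device *vp_dev = to_vp_device(virtio_dev);
        struct virtio_admin_cmd cmd = {};
        struct scatterlist result_sg;
        struct scatterlist data_sg;
        __le64 *data;
        int ret;

        data = kzalloc(sizeof(*data), GFP_KERNEL);
        if (!data)
                return;

        /* LIST_QUERY: the device reports which commands it supports */
        sg_init_one(&result_sg, data, sizeof(*data));
        cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
        cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
        cmd.result_sg = &result_sg;
        ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
        if (ret)
                goto end;

        /* LIST_USE: the driver commits to the set it is going to use */
        sg_init_one(&data_sg, data, sizeof(*data));
        cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
        cmd.data_sg = &data_sg;
        cmd.result_sg = NULL;
        ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
        if (ret)
                goto end;

        /* hypothetical cache of the committed command list (u64 for now) */
        vp_dev->admin_vq.supported_cmds = le64_to_cpu(*data);
end:
        kfree(data);
}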

>
> And maybe the API should just be
> bool virtio_pci_admin_has_legacy_io()

This can work as well.

In that case, the API will take the VF PCI device, derive the PF and its 
'admin_queue' context from it, and check internally that all of the 
current 5 legacy commands are supported.
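
As a sketch, and again assuming the hypothetical 'supported_cmds' cache on the 
admin vq filled during the internal list/use flow, it could be something like:

#define VIRTIO_LEGACY_ADMIN_CMD_BITMAP \
        (BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
         BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
         BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
         BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
         BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))

bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev)
{
        struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
        struct virtio_pci_device *vp_dev;

        if (!virtio_dev)
                return false;

        vp_dev = to_vp_device(virtio_dev);
        /* true only if all 5 legacy commands were reported and committed */
        return (vp_dev->admin_vq.supported_cmds &
                VIRTIO_LEGACY_ADMIN_CMD_BITMAP) ==
               VIRTIO_LEGACY_ADMIN_CMD_BITMAP;
}
EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io);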

Yishai

>
>
>
>>>
>>>>
>>>>           +
>>>>           +/*
>>>>           + * virtio_pci_admin_legacy_io_write - Write legacy registers of a member device
>>>>           + * @dev: VF pci_dev
>>>>           + * @opcode: op code of the io write command
>>>>
>>>>       opcode is actually either VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE
>>>>       or VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE correct?
>>>>
>>>>       So please just add 2 APIs for this so users don't need to care.
>>>>       Could be wrappers around these two things.
>>>>
>>>>
>>>> OK.
>>>>
>>>> We'll export the below 2 APIs [1] which internally will call
>>>> virtio_pci_admin_legacy_io_write() with the proper op code hard-coded.
>>>>
>>>> [1]virtio_pci_admin_legacy_device_io_write()
>>>>        virtio_pci_admin_legacy_common_io_write()
>>>>
>>>> Yishai
>>>>
>>> Makes sense.
>> OK, we may do the same split for the 'legacy_io_read' commands to be
>> symmetric with the 'legacy_io_write', right ?
>>
>> Yishai
> makes sense.
>
>>>>           + * @offset: starting byte offset within the registers to write to
>>>>           + * @size: size of the data to write
>>>>           + * @buf: buffer which holds the data
>>>>           + *
>>>>           + * Returns 0 on success, or negative on failure.
>>>>           + */
>>>>           +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
>>>>           +                                    u8 offset, u8 size, u8 *buf)
>>>>           +{
>>>>           +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>>>>           +       struct virtio_admin_cmd_legacy_wr_data *data;
>>>>           +       struct virtio_admin_cmd cmd = {};
>>>>           +       struct scatterlist data_sg;
>>>>           +       int vf_id;
>>>>           +       int ret;
>>>>           +
>>>>           +       if (!virtio_dev)
>>>>           +               return -ENODEV;
>>>>           +
>>>>           +       vf_id = pci_iov_vf_id(pdev);
>>>>           +       if (vf_id < 0)
>>>>           +               return vf_id;
>>>>           +
>>>>           +       data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
>>>>           +       if (!data)
>>>>           +               return -ENOMEM;
>>>>           +
>>>>           +       data->offset = offset;
>>>>           +       memcpy(data->registers, buf, size);
>>>>           +       sg_init_one(&data_sg, data, sizeof(*data) + size);
>>>>           +       cmd.opcode = cpu_to_le16(opcode);
>>>>           +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>>>>           +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
>>>>           +       cmd.data_sg = &data_sg;
>>>>           +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>>>>           +
>>>>           +       kfree(data);
>>>>           +       return ret;
>>>>           +}
>>>>           +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_write);
>>>>           +
>>>>           +/*
>>>>           + * virtio_pci_admin_legacy_io_read - Read legacy registers of a member device
>>>>           + * @dev: VF pci_dev
>>>>           + * @opcode: op code of the io read command
>>>>           + * @offset: starting byte offset within the registers to read from
>>>>           + * @size: size of the data to be read
>>>>           + * @buf: buffer to hold the returned data
>>>>           + *
>>>>           + * Returns 0 on success, or negative on failure.
>>>>           + */
>>>>           +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
>>>>           +                                   u8 offset, u8 size, u8 *buf)
>>>>           +{
>>>>           +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>>>>           +       struct virtio_admin_cmd_legacy_rd_data *data;
>>>>           +       struct scatterlist data_sg, result_sg;
>>>>           +       struct virtio_admin_cmd cmd = {};
>>>>           +       int vf_id;
>>>>           +       int ret;
>>>>           +
>>>>           +       if (!virtio_dev)
>>>>           +               return -ENODEV;
>>>>           +
>>>>           +       vf_id = pci_iov_vf_id(pdev);
>>>>           +       if (vf_id < 0)
>>>>           +               return vf_id;
>>>>           +
>>>>           +       data = kzalloc(sizeof(*data), GFP_KERNEL);
>>>>           +       if (!data)
>>>>           +               return -ENOMEM;
>>>>           +
>>>>           +       data->offset = offset;
>>>>           +       sg_init_one(&data_sg, data, sizeof(*data));
>>>>           +       sg_init_one(&result_sg, buf, size);
>>>>           +       cmd.opcode = cpu_to_le16(opcode);
>>>>           +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>>>>           +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
>>>>           +       cmd.data_sg = &data_sg;
>>>>           +       cmd.result_sg = &result_sg;
>>>>           +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>>>>           +
>>>>           +       kfree(data);
>>>>           +       return ret;
>>>>           +}
>>>>           +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_read);
>>>>           +
>>>>           +/*
>>>>           + * virtio_pci_admin_legacy_io_notify_info - Read the queue notification
>>>>           + * information for legacy interface
>>>>           + * @dev: VF pci_dev
>>>>           + * @req_bar_flags: requested bar flags
>>>>           + * @bar: on output the BAR number of the member device
>>>>           + * @bar_offset: on output the offset within bar
>>>>           + *
>>>>           + * Returns 0 on success, or negative on failure.
>>>>           + */
>>>>           +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
>>>>           +                                          u8 req_bar_flags, u8 *bar,
>>>>           +                                          u64 *bar_offset)
>>>>           +{
>>>>           +       struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
>>>>           +       struct virtio_admin_cmd_notify_info_result *result;
>>>>           +       struct virtio_admin_cmd cmd = {};
>>>>           +       struct scatterlist result_sg;
>>>>           +       int vf_id;
>>>>           +       int ret;
>>>>           +
>>>>           +       if (!virtio_dev)
>>>>           +               return -ENODEV;
>>>>           +
>>>>           +       vf_id = pci_iov_vf_id(pdev);
>>>>           +       if (vf_id < 0)
>>>>           +               return vf_id;
>>>>           +
>>>>           +       result = kzalloc(sizeof(*result), GFP_KERNEL);
>>>>           +       if (!result)
>>>>           +               return -ENOMEM;
>>>>           +
>>>>           +       sg_init_one(&result_sg, result, sizeof(*result));
>>>>           +       cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO);
>>>>           +       cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
>>>>           +       cmd.group_member_id = cpu_to_le64(vf_id + 1);
>>>>           +       cmd.result_sg = &result_sg;
>>>>           +       ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
>>>>           +       if (!ret) {
>>>>           +               struct virtio_admin_cmd_notify_info_data *entry;
>>>>           +               int i;
>>>>           +
>>>>           +               ret = -ENOENT;
>>>>           +               for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
>>>>           +                       entry = &result->entries[i];
>>>>           +                       if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
>>>>           +                               break;
>>>>           +                       if (entry->flags != req_bar_flags)
>>>>           +                               continue;
>>>>           +                       *bar = entry->bar;
>>>>           +                       *bar_offset = le64_to_cpu(entry->offset);
>>>>           +                       ret = 0;
>>>>           +                       break;
>>>>           +               }
>>>>           +       }
>>>>           +
>>>>           +       kfree(result);
>>>>           +       return ret;
>>>>           +}
>>>>           +EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
>>>>           +
>>>>            static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
>>>>                   .get            = NULL,
>>>>                   .set            = NULL,
>>>>           diff --git a/include/linux/virtio_pci_admin.h b/include/linux/virtio_pci_admin.h
>>>>           new file mode 100644
>>>>           index 000000000000..cb916a4bc1b1
>>>>           --- /dev/null
>>>>           +++ b/include/linux/virtio_pci_admin.h
>>>>           @@ -0,0 +1,18 @@
>>>>           +/* SPDX-License-Identifier: GPL-2.0 */
>>>>           +#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
>>>>           +#define _LINUX_VIRTIO_PCI_ADMIN_H
>>>>           +
>>>>           +#include <linux/types.h>
>>>>           +#include <linux/pci.h>
>>>>           +
>>>>           +int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);
>>>>           +int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
>>>>           +int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
>>>>           +                                    u8 offset, u8 size, u8 *buf);
>>>>           +int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
>>>>           +                                   u8 offset, u8 size, u8 *buf);
>>>>           +int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
>>>>           +                                          u8 req_bar_flags, u8 *bar,
>>>>           +                                          u64 *bar_offset);
>>>>           +
>>>>           +#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */
>>>>           --
>>>>           2.27.0
>>>>
>>>>


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-24 19:57     ` Alex Williamson
@ 2023-10-25 14:35       ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-25 14:35 UTC (permalink / raw)
  To: Alex Williamson
  Cc: mst, jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg

On 24/10/2023 22:57, Alex Williamson wrote:
> On Tue, 17 Oct 2023 16:42:17 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
>
>> Introduce a vfio driver over virtio devices to support the legacy
>> interface functionality for VFs.
>>
>> Background, from the virtio spec [1].
>> --------------------------------------------------------------------
>> In some systems, there is a need to support a virtio legacy driver with
>> a device that does not directly support the legacy interface. In such
>> scenarios, a group owner device can provide the legacy interface
>> functionality for the group member devices. The driver of the owner
>> device can then access the legacy interface of a member device on behalf
>> of the legacy member device driver.
>>
>> For example, with the SR-IOV group type, group members (VFs) can not
>> present the legacy interface in an I/O BAR in BAR0 as expected by the
>> legacy pci driver. If the legacy driver is running inside a virtual
>> machine, the hypervisor executing the virtual machine can present a
>> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
>> legacy driver accesses to this I/O BAR and forwards them to the group
>> owner device (PF) using group administration commands.
>> --------------------------------------------------------------------
>>
>> Specifically, this driver adds support for a virtio-net VF to be exposed
>> as a transitional device to a guest driver and allows the legacy IO BAR
>> functionality on top.
>>
>> This allows a VM which uses a legacy virtio-net driver in the guest to
>> work transparently over a VF which its driver in the host is that new
>> driver.
>>
>> The driver can be extended easily to support some other types of virtio
>> devices (e.g virtio-blk), by adding in a few places the specific type
>> properties as was done for virtio-net.
>>
>> For now, only the virtio-net use case was tested and as such we introduce
>> the support only for such a device.
>>
>> Practically,
>> Upon probing a VF for a virtio-net device, in case its PF supports
>> legacy access over the virtio admin commands and the VF doesn't have BAR
>> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
>> transitional device with I/O BAR in BAR 0.
>>
>> The existence of the simulated I/O bar is reported later on by
>> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
>> exposes itself as a transitional device by overwriting some properties
>> upon reading its config space.
>>
>> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
>> guest may use it via read/write calls according to the virtio
>> specification.
>>
>> Any read/write towards the control parts of the BAR will be captured by
>> the new driver and will be translated into admin commands towards the
>> device.
>>
>> Any data path read/write access (i.e. virtio driver notifications) will
>> be forwarded to the physical BAR which its properties were supplied by
>> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
>> probing/init flow.
>>
>> With that code in place a legacy driver in the guest has the look and
>> feel as if having a transitional device with legacy support for both its
>> control and data path flows.
>>
>> [1]
>> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
>>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> ---
>>   MAINTAINERS                      |   7 +
>>   drivers/vfio/pci/Kconfig         |   2 +
>>   drivers/vfio/pci/Makefile        |   2 +
>>   drivers/vfio/pci/virtio/Kconfig  |  15 +
>>   drivers/vfio/pci/virtio/Makefile |   4 +
>>   drivers/vfio/pci/virtio/main.c   | 577 +++++++++++++++++++++++++++++++
>>   6 files changed, 607 insertions(+)
>>   create mode 100644 drivers/vfio/pci/virtio/Kconfig
>>   create mode 100644 drivers/vfio/pci/virtio/Makefile
>>   create mode 100644 drivers/vfio/pci/virtio/main.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 7a7bd8bd80e9..680a70063775 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -22620,6 +22620,13 @@ L:	kvm@vger.kernel.org
>>   S:	Maintained
>>   F:	drivers/vfio/pci/mlx5/
>>   
>> +VFIO VIRTIO PCI DRIVER
>> +M:	Yishai Hadas <yishaih@nvidia.com>
>> +L:	kvm@vger.kernel.org
>> +L:	virtualization@lists.linux-foundation.org
>> +S:	Maintained
>> +F:	drivers/vfio/pci/virtio
>> +
>>   VFIO PCI DEVICE SPECIFIC DRIVERS
>>   R:	Jason Gunthorpe <jgg@nvidia.com>
>>   R:	Yishai Hadas <yishaih@nvidia.com>
>> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
>> index 8125e5f37832..18c397df566d 100644
>> --- a/drivers/vfio/pci/Kconfig
>> +++ b/drivers/vfio/pci/Kconfig
>> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>>   
>>   source "drivers/vfio/pci/pds/Kconfig"
>>   
>> +source "drivers/vfio/pci/virtio/Kconfig"
>> +
>>   endmenu
>> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
>> index 45167be462d8..046139a4eca5 100644
>> --- a/drivers/vfio/pci/Makefile
>> +++ b/drivers/vfio/pci/Makefile
>> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>>   obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>>   
>>   obj-$(CONFIG_PDS_VFIO_PCI) += pds/
>> +
>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
>> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
>> new file mode 100644
>> index 000000000000..89eddce8b1bd
>> --- /dev/null
>> +++ b/drivers/vfio/pci/virtio/Kconfig
>> @@ -0,0 +1,15 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +config VIRTIO_VFIO_PCI
>> +        tristate "VFIO support for VIRTIO PCI devices"
>> +        depends on VIRTIO_PCI
>> +        select VFIO_PCI_CORE
>> +        help
>> +          This provides support for exposing VIRTIO VF devices using the VFIO
>> +          framework that can work with a legacy virtio driver in the guest.
>> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
>> +          not indicate I/O Space.
>> +          As of that this driver emulated I/O BAR in software to let a VF be
>> +          seen as a transitional device in the guest and let it work with
>> +          a legacy driver.
> This description is a little bit subtle to the hard requirements on the
> device.  Reading this, one might think that this should work for any
> SR-IOV VF virtio device, when in reality it only support virtio-net
> currently and places a number of additional requirements on the device
> (ex. legacy access and MSI-X support).

Sure, will change it to refer only to virtio-net devices which are capable 
of 'legacy access'.

No need to refer to MSI-X, please see below.

>
>> +
>> +          If you don't know what to do here, say N.
>> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
>> new file mode 100644
>> index 000000000000..2039b39fb723
>> --- /dev/null
>> +++ b/drivers/vfio/pci/virtio/Makefile
>> @@ -0,0 +1,4 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
>> +virtio-vfio-pci-y := main.o
>> +
>> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
>> new file mode 100644
>> index 000000000000..3fef4b21f7e6
>> --- /dev/null
>> +++ b/drivers/vfio/pci/virtio/main.c
>> @@ -0,0 +1,577 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
>> + */
>> +
>> +#include <linux/device.h>
>> +#include <linux/module.h>
>> +#include <linux/mutex.h>
>> +#include <linux/pci.h>
>> +#include <linux/pm_runtime.h>
>> +#include <linux/types.h>
>> +#include <linux/uaccess.h>
>> +#include <linux/vfio.h>
>> +#include <linux/vfio_pci_core.h>
>> +#include <linux/virtio_pci.h>
>> +#include <linux/virtio_net.h>
>> +#include <linux/virtio_pci_admin.h>
>> +
>> +struct virtiovf_pci_core_device {
>> +	struct vfio_pci_core_device core_device;
>> +	u8 bar0_virtual_buf_size;
>> +	u8 *bar0_virtual_buf;
>> +	/* synchronize access to the virtual buf */
>> +	struct mutex bar_mutex;
>> +	void __iomem *notify_addr;
>> +	u32 notify_offset;
>> +	u8 notify_bar;
> Push the above u8 to the end of the structure for better packing.
OK
>> +	u16 pci_cmd;
>> +	u16 msix_ctrl;
>> +};
>> +
>> +static int
>> +virtiovf_issue_legacy_rw_cmd(struct virtiovf_pci_core_device *virtvdev,
>> +			     loff_t pos, char __user *buf,
>> +			     size_t count, bool read)
>> +{
>> +	bool msix_enabled = virtvdev->msix_ctrl & PCI_MSIX_FLAGS_ENABLE;
>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
>> +	u16 opcode;
>> +	int ret;
>> +
>> +	mutex_lock(&virtvdev->bar_mutex);
>> +	if (read) {
>> +		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
>> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
>> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
>> +		ret = virtio_pci_admin_legacy_io_read(pdev, opcode, pos, count,
>> +						      bar0_buf + pos);
>> +		if (ret)
>> +			goto out;
>> +		if (copy_to_user(buf, bar0_buf + pos, count))
>> +			ret = -EFAULT;
>> +		goto out;
>> +	}
> TBH, I think the symmetry of read vs write would be more apparent if
> this were an else branch.
OK, will do.
>> +
>> +	if (copy_from_user(bar0_buf + pos, buf, count)) {
>> +		ret = -EFAULT;
>> +		goto out;
>> +	}
>> +
>> +	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
>> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
>> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
>> +	ret = virtio_pci_admin_legacy_io_write(pdev, opcode, pos, count,
>> +					       bar0_buf + pos);
>> +out:
>> +	mutex_unlock(&virtvdev->bar_mutex);
>> +	return ret;
>> +}
>> +
>> +static int
>> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
>> +			    loff_t pos, char __user *buf,
>> +			    size_t count, bool read)
>> +{
>> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
>> +	u16 queue_notify;
>> +	int ret;
>> +
>> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
>> +		return -EINVAL;
>> +
>> +	switch (pos) {
>> +	case VIRTIO_PCI_QUEUE_NOTIFY:
>> +		if (count != sizeof(queue_notify))
>> +			return -EINVAL;
>> +		if (read) {
>> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
>> +						virtvdev->notify_addr);
>> +			if (ret)
>> +				return ret;
>> +			if (copy_to_user(buf, &queue_notify,
>> +					 sizeof(queue_notify)))
>> +				return -EFAULT;
>> +			break;
>> +		}
> Same.
OK
>> +
>> +		if (copy_from_user(&queue_notify, buf, count))
>> +			return -EFAULT;
>> +
>> +		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
>> +					 virtvdev->notify_addr);
>> +		break;
>> +	default:
>> +		ret = virtiovf_issue_legacy_rw_cmd(virtvdev, pos, buf, count,
>> +						   read);
>> +	}
>> +
>> +	return ret ? ret : count;
>> +}
>> +
>> +static bool range_intersect_range(loff_t range1_start, size_t count1,
>> +				  loff_t range2_start, size_t count2,
>> +				  loff_t *start_offset,
>> +				  size_t *intersect_count,
>> +				  size_t *register_offset)
>> +{
>> +	if (range1_start <= range2_start &&
>> +	    range1_start + count1 > range2_start) {
>> +		*start_offset = range2_start - range1_start;
>> +		*intersect_count = min_t(size_t, count2,
>> +					 range1_start + count1 - range2_start);
>> +		if (register_offset)
>> +			*register_offset = 0;
>> +		return true;
>> +	}
>> +
>> +	if (range1_start > range2_start &&
>> +	    range1_start < range2_start + count2) {
>> +		*start_offset = range1_start;
>> +		*intersect_count = min_t(size_t, count1,
>> +					 range2_start + count2 - range1_start);
>> +		if (register_offset)
>> +			*register_offset = range1_start - range2_start;
>> +		return true;
>> +	}
> Seems like we're missing a case, and some documentation.
>
> The first test requires range1 to fully enclose range2 and provides the
> offset of range2 within range1 and the length of the intersection.
>
> The second test requires range1 to start from a non-zero offset within
> range2 and returns the absolute offset of range1 and the length of the
> intersection.
>
> The register offset is then non-zero offset of range1 into range2.  So
> does the caller use the zero value in the previous test to know range2
> exists within range1?
>
> We miss the cases where range1_start is <= range2_start and range1
> terminates within range2.

The first test should cover this case as well as the case of fully
enclosing.

It checks whether range1_start + count1 > range2_start, so range1 can
also terminate within range2.

Doesn't it?

I may add some documentation for that function as part of V2, as you asked;
roughly along the lines of the sketch below.
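
For illustration only, given how the callers use copy_offset (as an offset
into the user buffer), the semantics could be expressed symmetrically like
this (a rough sketch for the discussion, not the V2 code):

	static bool range_intersect_range(loff_t range1_start, size_t count1,
					  loff_t range2_start, size_t count2,
					  loff_t *start_offset,
					  size_t *intersect_count,
					  size_t *register_offset)
	{
		/* Overlap of [range1_start, range1_start + count1) with
		 * [range2_start, range2_start + count2).
		 */
		loff_t start = max_t(loff_t, range1_start, range2_start);
		loff_t end = min_t(loff_t, range1_start + count1,
				   range2_start + count2);

		if (start >= end)
			return false;

		/* Offset of the overlap within range1, i.e. the user buffer */
		*start_offset = start - range1_start;
		*intersect_count = end - start;
		/* Offset of the overlap within range2, i.e. the register */
		if (register_offset)
			*register_offset = start - range2_start;
		return true;
	}

This handles range1 fully enclosing range2, range1 starting inside range2,
and range1 starting at or before range2 and terminating inside it, with the
same three outputs in every case.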

>   I suppose we'll see below how this is used,
> but it seems asymmetric and incomplete.
>
>> +
>> +	return false;
>> +}
>> +
>> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
>> +					char __user *buf, size_t count,
>> +					loff_t *ppos)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>> +	size_t register_offset;
>> +	loff_t copy_offset;
>> +	size_t copy_count;
>> +	__le32 val32;
>> +	__le16 val16;
>> +	u8 val8;
>> +	int ret;
>> +
>> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
>> +				  &copy_offset, &copy_count, NULL)) {
> If a user does 'setpci -s x:00.0 2.b' (range1 <= range2, but terminates
> within range2) they'll not enter this branch and see 41 rather than 00.
>
> If a user does 'setpci -s x:00.0 3.b' (range1 > range2, range 1
> contained within range 2), the above function returns a copy_offset of
> range1_start (ie. 3).  But that offset is applied to the buffer, which
> is out of bounds.  The function needs to have returned an offset of 1
> and it should have applied to the val16 address.
>
> I don't think this works like it's intended.

Is that because of the missing case?
Please see my note above.

>
>
>> +		val16 = cpu_to_le16(0x1000);
> Please #define this somewhere rather than hiding a magic value here.
Sure, will just replace it with VIRTIO_TRANS_ID_NET.
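That would presumably be a define along these lines in the virtio PCI uapi
header (shown only to spell out the value being replaced; the exact location
is an assumption on my side):

	#define VIRTIO_TRANS_ID_NET	0x1000 /* transitional virtio net */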
>> +		if (copy_to_user(buf + copy_offset, &val16, copy_count))
>> +			return -EFAULT;
>> +	}
>> +
>> +	if ((virtvdev->pci_cmd & PCI_COMMAND_IO) &&
>> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
>> +				  &copy_offset, &copy_count, &register_offset)) {
>> +		if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
>> +				   copy_count))
>> +			return -EFAULT;
>> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
>> +				 copy_count))
>> +			return -EFAULT;
>> +	}
>> +
>> +	if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8),
>> +				  &copy_offset, &copy_count, NULL)) {
>> +		/* Transitional needs to have revision 0 */
>> +		val8 = 0;
>> +		if (copy_to_user(buf + copy_offset, &val8, copy_count))
>> +			return -EFAULT;
>> +	}
>> +
>> +	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
>> +				  &copy_offset, &copy_count, NULL)) {
>> +		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);
> I'd still like to see the remainder of the BAR follow the semantics
> vfio-pci does.  I think this requires a __le32 bar0 field on the
> virtvdev struct to store writes and the read here would mask the lower
> bits up to the BAR size and OR in the IO indicator bit.

OK, will do.
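
Roughly, the idea would be to keep a __le32 shadow of the emulated BAR 0 in
struct virtiovf_pci_core_device, store config-space writes into it, and mask
it on read; a sketch only (pci_base_addr_0 and the power-of-two rounding are
my placeholders, not the V2 patch):

	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
				  &copy_offset, &copy_count, &register_offset)) {
		u32 bar_mask = ~(roundup_pow_of_two(virtvdev->bar0_virtual_buf_size) - 1);
		u32 pci_base_addr_0 = le32_to_cpu(virtvdev->pci_base_addr_0);

		/*
		 * Reflect the address bits the user last wrote, sized to the
		 * virtual BAR, and always report the I/O space indicator.
		 */
		val32 = cpu_to_le32((pci_base_addr_0 & bar_mask) |
				    PCI_BASE_ADDRESS_SPACE_IO);
		if (copy_to_user(buf + copy_offset,
				 (void *)&val32 + register_offset, copy_count))
			return -EFAULT;
	}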

>
>
>> +		if (copy_to_user(buf + copy_offset, &val32, copy_count))
>> +			return -EFAULT;
>> +	}
>> +
>> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
>> +				  &copy_offset, &copy_count, NULL)) {
>> +		/*
>> +		 * Transitional devices use the PCI subsystem device id as
>> +		 * virtio device id, same as legacy driver always did.
> Where did we require the subsystem vendor ID to be 0x1af4?  This
> subsystem device ID really only makes since given that subsystem
> vendor ID, right?  Otherwise I don't see that non-transitional devices,
> such as the VF, have a hard requirement per the spec for the subsystem
> vendor ID.
>
> Do we want to make this only probe the correct subsystem vendor ID or do
> we want to emulate the subsystem vendor ID as well?  I don't see this is
> correct without one of those options.

Looking at the 1.x spec, we can see the following:

Legacy Interfaces: A Note on PCI Device Discovery

"Transitional devices MUST have the PCI Subsystem
Device ID matching the Virtio Device ID, as indicated in section 5 ...
This is to match legacy drivers."

However, there is no need to enforce Subsystem Vendor ID.

This is what we followed here.

Makes sense?
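
Concretely, with only the Subsystem Device ID emulated, the VF's config
space as seen by the guest would read roughly as:

	Vendor ID:            0x1af4  (PCI_VENDOR_ID_REDHAT_QUMRANET, from the device)
	Device ID:            0x1000  (transitional virtio-net, emulated)
	Revision ID:          0x00    (emulated)
	Subsystem Device ID:  0x0001  (VIRTIO_ID_NET, emulated)
	Subsystem Vendor ID:  whatever the device reports (not emulated)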

>> +		 */
>> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
>> +		if (copy_to_user(buf + copy_offset, &val16, copy_count))
>> +			return -EFAULT;
>> +	}
>> +
>> +	return count;
>> +}
>> +
>> +static ssize_t
>> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
>> +		       size_t count, loff_t *ppos)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>> +	int ret;
>> +
>> +	if (!count)
>> +		return 0;
>> +
>> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
>> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
>> +
>> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
>> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
>> +
>> +	ret = pm_runtime_resume_and_get(&pdev->dev);
>> +	if (ret) {
>> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
>> +				     ret);
>> +		return -EIO;
>> +	}
>> +
>> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
>> +	pm_runtime_put(&pdev->dev);
>> +	return ret;
>> +}
>> +
>> +static ssize_t
>> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
>> +			size_t count, loff_t *ppos)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>> +	int ret;
>> +
>> +	if (!count)
>> +		return 0;
>> +
>> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
>> +		size_t register_offset;
>> +		loff_t copy_offset;
>> +		size_t copy_count;
>> +
>> +		if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd),
>> +					  &copy_offset, &copy_count,
>> +					  &register_offset)) {
>> +			if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
>> +					   buf + copy_offset,
>> +					   copy_count))
>> +				return -EFAULT;
>> +		}
>> +
>> +		if (range_intersect_range(pos, count, pdev->msix_cap + PCI_MSIX_FLAGS,
>> +					  sizeof(virtvdev->msix_ctrl),
>> +					  &copy_offset, &copy_count,
>> +					  &register_offset)) {
>> +			if (copy_from_user((void *)&virtvdev->msix_ctrl + register_offset,
>> +					   buf + copy_offset,
>> +					   copy_count))
>> +				return -EFAULT;
>> +		}
> MSI-X is setup via ioctl, so you're relying on a userspace that writes
> through the control register bit even though it doesn't do anything.
> Why not use vfio_pci_core_device.irq_type to track if MSI-X mode is
> enabled?
OK, I may switch to your suggestion after testing it.
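For reference, the runtime check could then look something like this (a
sketch only, not tested yet):

	/* Derive the MSI-X state from the core device instead of shadowing
	 * PCI_MSIX_FLAGS writes:
	 */
	bool msix_enabled =
		virtvdev->core_device.irq_type == VFIO_PCI_MSIX_IRQ_INDEX;

	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
		VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
		VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;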
>
>> +	}
>> +
>> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
>> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
>> +
>> +	ret = pm_runtime_resume_and_get(&pdev->dev);
>> +	if (ret) {
>> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
>> +		return -EIO;
>> +	}
>> +
>> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
>> +	pm_runtime_put(&pdev->dev);
>> +	return ret;
>> +}
>> +
>> +static int
>> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
>> +				   unsigned int cmd, unsigned long arg)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
>> +	void __user *uarg = (void __user *)arg;
>> +	struct vfio_region_info info = {};
>> +
>> +	if (copy_from_user(&info, uarg, minsz))
>> +		return -EFAULT;
>> +
>> +	if (info.argsz < minsz)
>> +		return -EINVAL;
>> +
>> +	switch (info.index) {
>> +	case VFIO_PCI_BAR0_REGION_INDEX:
>> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
>> +		info.size = virtvdev->bar0_virtual_buf_size;
>> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
>> +			     VFIO_REGION_INFO_FLAG_WRITE;
>> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
>> +	default:
>> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
>> +	}
>> +}
>> +
>> +static long
>> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
>> +			     unsigned long arg)
>> +{
>> +	switch (cmd) {
>> +	case VFIO_DEVICE_GET_REGION_INFO:
>> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
>> +	default:
>> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
>> +	}
>> +}
>> +
>> +static int
>> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
>> +{
>> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
>> +	int ret;
>> +
>> +	/*
>> +	 * Setup the BAR where the 'notify' exists to be used by vfio as well
>> +	 * This will let us mmap it only once and use it when needed.
>> +	 */
>> +	ret = vfio_pci_core_setup_barmap(core_device,
>> +					 virtvdev->notify_bar);
>> +	if (ret)
>> +		return ret;
>> +
>> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
>> +			virtvdev->notify_offset;
>> +	return 0;
>> +}
>> +
>> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
>> +	int ret;
>> +
>> +	ret = vfio_pci_core_enable(vdev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (virtvdev->bar0_virtual_buf) {
>> +		/*
>> +		 * Upon close_device() the vfio_pci_core_disable() is called
>> +		 * and will close all the previous mmaps, so it seems that the
>> +		 * valid life cycle for the 'notify' addr is per open/close.
>> +		 */
>> +		ret = virtiovf_set_notify_addr(virtvdev);
>> +		if (ret) {
>> +			vfio_pci_core_disable(vdev);
>> +			return ret;
>> +		}
>> +	}
>> +
>> +	vfio_pci_core_finish_enable(vdev);
>> +	return 0;
>> +}
>> +
>> +static int virtiovf_get_device_config_size(unsigned short device)
>> +{
>> +	/* Network card */
>> +	return offsetofend(struct virtio_net_config, status);
>> +}
>> +
>> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
>> +{
>> +	u64 offset;
>> +	int ret;
>> +	u8 bar;
>> +
>> +	ret = virtio_pci_admin_legacy_io_notify_info(virtvdev->core_device.pdev,
>> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
>> +				&bar, &offset);
>> +	if (ret)
>> +		return ret;
>> +
>> +	virtvdev->notify_bar = bar;
>> +	virtvdev->notify_offset = offset;
>> +	return 0;
>> +}
>> +
>> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +	struct pci_dev *pdev;
>> +	int ret;
>> +
>> +	ret = vfio_pci_core_init_dev(core_vdev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	pdev = virtvdev->core_device.pdev;
>> +	ret = virtiovf_read_notify_info(virtvdev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	/* Being ready with a buffer that supports MSIX */
>> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
>> +				virtiovf_get_device_config_size(pdev->device);
>> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
>> +					     GFP_KERNEL);
>> +	if (!virtvdev->bar0_virtual_buf)
>> +		return -ENOMEM;
>> +	mutex_init(&virtvdev->bar_mutex);
>> +	return 0;
>> +}
>> +
>> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>> +
>> +	kfree(virtvdev->bar0_virtual_buf);
>> +	vfio_pci_core_release_dev(core_vdev);
>> +}
>> +
>> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
>> +	.name = "virtio-transitional-vfio-pci",
>> +	.init = virtiovf_pci_init_device,
>> +	.release = virtiovf_pci_core_release_dev,
>> +	.open_device = virtiovf_pci_open_device,
>> +	.close_device = vfio_pci_core_close_device,
>> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
>> +	.read = virtiovf_pci_core_read,
>> +	.write = virtiovf_pci_core_write,
>> +	.mmap = vfio_pci_core_mmap,
>> +	.request = vfio_pci_core_request,
>> +	.match = vfio_pci_core_match,
>> +	.bind_iommufd = vfio_iommufd_physical_bind,
>> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
>> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
>> +};
>> +
>> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
>> +	.name = "virtio-acc-vfio-pci",
>> +	.init = vfio_pci_core_init_dev,
>> +	.release = vfio_pci_core_release_dev,
>> +	.open_device = virtiovf_pci_open_device,
>> +	.close_device = vfio_pci_core_close_device,
>> +	.ioctl = vfio_pci_core_ioctl,
>> +	.device_feature = vfio_pci_core_ioctl_feature,
>> +	.read = vfio_pci_core_read,
>> +	.write = vfio_pci_core_write,
>> +	.mmap = vfio_pci_core_mmap,
>> +	.request = vfio_pci_core_request,
>> +	.match = vfio_pci_core_match,
>> +	.bind_iommufd = vfio_iommufd_physical_bind,
>> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
>> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
>> +};
>> +
>> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
>> +{
>> +	struct resource *res = pdev->resource;
>> +
>> +	return res->flags ? true : false;
>> +}
>> +
>> +#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
>> +	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
>> +
>> +static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
>> +{
>> +	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
>> +	u8 *buf;
>> +	int ret;
>> +
>> +	buf = kzalloc(buf_size, GFP_KERNEL);
>> +	if (!buf)
>> +		return false;
>> +
>> +	ret = virtio_pci_admin_list_query(pdev, buf, buf_size);
>> +	if (ret)
>> +		goto end;
>> +
>> +	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
>> +		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
>> +		ret = -EOPNOTSUPP;
>> +		goto end;
>> +	}
>> +
>> +	/* Confirm the used commands */
>> +	memset(buf, 0, buf_size);
>> +	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
>> +	ret = virtio_pci_admin_list_use(pdev, buf, buf_size);
>> +end:
>> +	kfree(buf);
>> +	return ret ? false : true;
>> +}
>> +
>> +static int virtiovf_pci_probe(struct pci_dev *pdev,
>> +			      const struct pci_device_id *id)
>> +{
>> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
>> +	struct virtiovf_pci_core_device *virtvdev;
>> +	int ret;
>> +
>> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
>> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
>
> All but the last test here are fairly evident requirements of the
> driver.  Why do we require a device that supports MSI-X?

Since we now check at run time whether MSI-X is enabled or disabled to
pick the correct opcode, there is no need for that any more.

Will drop this MSI-X check in V2.
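
The probe condition would then reduce to something like (sketch):

	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
	    !virtiovf_bar0_exists(pdev))
		ops = &virtiovf_acc_vfio_pci_tran_ops;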

Thanks,
Yishai

>
> Thanks,
> Alex
>
>
>> +		ops = &virtiovf_acc_vfio_pci_tran_ops;
>> +
>> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
>> +				     &pdev->dev, ops);
>> +	if (IS_ERR(virtvdev))
>> +		return PTR_ERR(virtvdev);
>> +
>> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
>> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
>> +	if (ret)
>> +		goto out;
>> +	return 0;
>> +out:
>> +	vfio_put_device(&virtvdev->core_device.vdev);
>> +	return ret;
>> +}
>> +
>> +static void virtiovf_pci_remove(struct pci_dev *pdev)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
>> +
>> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
>> +	vfio_put_device(&virtvdev->core_device.vdev);
>> +}
>> +
>> +static const struct pci_device_id virtiovf_pci_table[] = {
>> +	/* Only virtio-net is supported/tested so far */
>> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
>> +	{}
>> +};
>> +
>> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
>> +
>> +static struct pci_driver virtiovf_pci_driver = {
>> +	.name = KBUILD_MODNAME,
>> +	.id_table = virtiovf_pci_table,
>> +	.probe = virtiovf_pci_probe,
>> +	.remove = virtiovf_pci_remove,
>> +	.err_handler = &vfio_pci_core_err_handlers,
>> +	.driver_managed_dma = true,
>> +};
>> +
>> +module_pci_driver(virtiovf_pci_driver);
>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
>> +MODULE_DESCRIPTION(
>> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-25 14:35       ` Yishai Hadas via Virtualization
@ 2023-10-25 16:28         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael S. Tsirkin @ 2023-10-25 16:28 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: Alex Williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, si-wei.liu, leonro,
	maorg

On Wed, Oct 25, 2023 at 05:35:51PM +0300, Yishai Hadas wrote:
> > Do we want to make this only probe the correct subsystem vendor ID or do
> > we want to emulate the subsystem vendor ID as well?  I don't see this is
> > correct without one of those options.
> 
> Looking at the 1.x spec, we can see the following:
> 
> Legacy Interfaces: A Note on PCI Device Discovery
> 
> "Transitional devices MUST have the PCI Subsystem
> Device ID matching the Virtio Device ID, as indicated in section 5 ...
> This is to match legacy drivers."
> 
> However, there is no need to enforce Subsystem Vendor ID.
> 
> This is what we followed here.
> 
> Makes sense?

Won't work for legacy Windows drivers.

-- 
MST


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands
  2023-10-25 14:03               ` Yishai Hadas via Virtualization
@ 2023-10-25 16:31                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael S. Tsirkin @ 2023-10-25 16:31 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: alex.williamson, jasowang, jgg, kvm, virtualization, parav,
	feliu, jiri, kevin.tian, joao.m.martins, si-wei.liu, leonro,
	maorg

On Wed, Oct 25, 2023 at 05:03:55PM +0300, Yishai Hadas wrote:
> > Yes - I think some kind of API will be needed to setup/cleanup.
> > Then 1st call to setup would do the list/use dance? some ref counting?
> 
> OK, we may come back in V2 with that option in place.
> 
> Please note that the initialization 'list/use' commands would be done as
> part of the admin queue activation, but we can't enable the admin queue for
> the upper layers before that is done.

I don't know what this means.

> So, we may consider skipping the ref count set/get as part of the
> initialization flow, with some flag/detection of the list/use commands, as
> setting the ref count enables the admin queue for the upper layers, which we
> would like to prevent at that time.

You can init on 1st use but you can't destroy after last use.
For symmetry it's better to just have explicit constructor/destructor.


> > 
> > And maybe the API should just be
> > bool virtio_pci_admin_has_legacy_io()
> 
> This can work as well.
> 
> In that case, the API will just take the VF's PCI device, derive from it the
> PF + 'admin_queue' context, and check internally that all 5 current legacy
> commands are supported.
> 
> Yishai

Yes, makes sense.
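
To make the shape concrete, the VF driver's probe would then reduce to
something like (a sketch; virtio_pci_admin_has_legacy_io() is the name
proposed above, the rest mirrors the existing probe code):

	if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
	    !virtiovf_bar0_exists(pdev))
		ops = &virtiovf_acc_vfio_pci_tran_ops;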

-- 
MST


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-25 14:35       ` Yishai Hadas via Virtualization
@ 2023-10-25 19:13         ` Alex Williamson
  -1 siblings, 0 replies; 100+ messages in thread
From: Alex Williamson @ 2023-10-25 19:13 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: mst, jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg

On Wed, 25 Oct 2023 17:35:51 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> On 24/10/2023 22:57, Alex Williamson wrote:
> > On Tue, 17 Oct 2023 16:42:17 +0300
> > Yishai Hadas <yishaih@nvidia.com> wrote:
> >  
> >> Introduce a vfio driver over virtio devices to support the legacy
> >> interface functionality for VFs.
> >>
> >> Background, from the virtio spec [1].
> >> --------------------------------------------------------------------
> >> In some systems, there is a need to support a virtio legacy driver with
> >> a device that does not directly support the legacy interface. In such
> >> scenarios, a group owner device can provide the legacy interface
> >> functionality for the group member devices. The driver of the owner
> >> device can then access the legacy interface of a member device on behalf
> >> of the legacy member device driver.
> >>
> >> For example, with the SR-IOV group type, group members (VFs) can not
> >> present the legacy interface in an I/O BAR in BAR0 as expected by the
> >> legacy pci driver. If the legacy driver is running inside a virtual
> >> machine, the hypervisor executing the virtual machine can present a
> >> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> >> legacy driver accesses to this I/O BAR and forwards them to the group
> >> owner device (PF) using group administration commands.
> >> --------------------------------------------------------------------
> >>
> >> Specifically, this driver adds support for a virtio-net VF to be exposed
> >> as a transitional device to a guest driver and allows the legacy IO BAR
> >> functionality on top.
> >>
> >> This allows a VM which uses a legacy virtio-net driver in the guest to
> >> work transparently over a VF whose host-side driver is this new
> >> driver.
> >>
> >> The driver can be extended easily to support some other types of virtio
> >> devices (e.g virtio-blk), by adding in a few places the specific type
> >> properties as was done for virtio-net.
> >>
> >> For now, only the virtio-net use case was tested and as such we introduce
> >> the support only for such a device.
> >>
> >> Practically,
> >> Upon probing a VF for a virtio-net device, in case its PF supports
> >> legacy access over the virtio admin commands and the VF doesn't have BAR
> >> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> >> transitional device with I/O BAR in BAR 0.
> >>
> >> The existence of the simulated I/O bar is reported later on by
> >> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> >> exposes itself as a transitional device by overwriting some properties
> >> upon reading its config space.
> >>
> >> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> >> guest may use it via read/write calls according to the virtio
> >> specification.
> >>
> >> Any read/write towards the control parts of the BAR will be captured by
> >> the new driver and will be translated into admin commands towards the
> >> device.
> >>
> >> Any data path read/write access (i.e. virtio driver notifications) will
> >> be forwarded to the physical BAR whose properties were supplied by
> >> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
> >> probing/init flow.
> >>
> >> With that code in place a legacy driver in the guest has the look and
> >> feel as if having a transitional device with legacy support for both its
> >> control and data path flows.
> >>
> >> [1]
> >> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> >>
> >> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> >> ---
> >>   MAINTAINERS                      |   7 +
> >>   drivers/vfio/pci/Kconfig         |   2 +
> >>   drivers/vfio/pci/Makefile        |   2 +
> >>   drivers/vfio/pci/virtio/Kconfig  |  15 +
> >>   drivers/vfio/pci/virtio/Makefile |   4 +
> >>   drivers/vfio/pci/virtio/main.c   | 577 +++++++++++++++++++++++++++++++
> >>   6 files changed, 607 insertions(+)
> >>   create mode 100644 drivers/vfio/pci/virtio/Kconfig
> >>   create mode 100644 drivers/vfio/pci/virtio/Makefile
> >>   create mode 100644 drivers/vfio/pci/virtio/main.c
> >>
> >> diff --git a/MAINTAINERS b/MAINTAINERS
> >> index 7a7bd8bd80e9..680a70063775 100644
> >> --- a/MAINTAINERS
> >> +++ b/MAINTAINERS
> >> @@ -22620,6 +22620,13 @@ L:	kvm@vger.kernel.org
> >>   S:	Maintained
> >>   F:	drivers/vfio/pci/mlx5/
> >>   
> >> +VFIO VIRTIO PCI DRIVER
> >> +M:	Yishai Hadas <yishaih@nvidia.com>
> >> +L:	kvm@vger.kernel.org
> >> +L:	virtualization@lists.linux-foundation.org
> >> +S:	Maintained
> >> +F:	drivers/vfio/pci/virtio
> >> +
> >>   VFIO PCI DEVICE SPECIFIC DRIVERS
> >>   R:	Jason Gunthorpe <jgg@nvidia.com>
> >>   R:	Yishai Hadas <yishaih@nvidia.com>
> >> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> >> index 8125e5f37832..18c397df566d 100644
> >> --- a/drivers/vfio/pci/Kconfig
> >> +++ b/drivers/vfio/pci/Kconfig
> >> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
> >>   
> >>   source "drivers/vfio/pci/pds/Kconfig"
> >>   
> >> +source "drivers/vfio/pci/virtio/Kconfig"
> >> +
> >>   endmenu
> >> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> >> index 45167be462d8..046139a4eca5 100644
> >> --- a/drivers/vfio/pci/Makefile
> >> +++ b/drivers/vfio/pci/Makefile
> >> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
> >>   obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
> >>   
> >>   obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> >> +
> >> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> >> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> >> new file mode 100644
> >> index 000000000000..89eddce8b1bd
> >> --- /dev/null
> >> +++ b/drivers/vfio/pci/virtio/Kconfig
> >> @@ -0,0 +1,15 @@
> >> +# SPDX-License-Identifier: GPL-2.0-only
> >> +config VIRTIO_VFIO_PCI
> >> +        tristate "VFIO support for VIRTIO PCI devices"
> >> +        depends on VIRTIO_PCI
> >> +        select VFIO_PCI_CORE
> >> +        help
> >> +          This provides support for exposing VIRTIO VF devices using the VFIO
> >> +          framework that can work with a legacy virtio driver in the guest.
> >> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> >> +          not indicate I/O Space.
> >> +          As of that this driver emulated I/O BAR in software to let a VF be
> >> +          seen as a transitional device in the guest and let it work with
> >> +          a legacy driver.  
> > This description is a little bit subtle to the hard requirements on the
> > device.  Reading this, one might think that this should work for any
> > SR-IOV VF virtio device, when in reality it only support virtio-net
> > currently and places a number of additional requirements on the device
> > (ex. legacy access and MSI-X support).  
> 
> Sure, will change to refer only to virtio-net devices which are capable 
> of 'legacy access'.
> 
> No need to refer to MSI-X, please see below.
> 
> >  
> >> +
> >> +          If you don't know what to do here, say N.
> >> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
> >> new file mode 100644
> >> index 000000000000..2039b39fb723
> >> --- /dev/null
> >> +++ b/drivers/vfio/pci/virtio/Makefile
> >> @@ -0,0 +1,4 @@
> >> +# SPDX-License-Identifier: GPL-2.0-only
> >> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
> >> +virtio-vfio-pci-y := main.o
> >> +
> >> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
> >> new file mode 100644
> >> index 000000000000..3fef4b21f7e6
> >> --- /dev/null
> >> +++ b/drivers/vfio/pci/virtio/main.c
> >> @@ -0,0 +1,577 @@
> >> +// SPDX-License-Identifier: GPL-2.0-only
> >> +/*
> >> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> >> + */
> >> +
> >> +#include <linux/device.h>
> >> +#include <linux/module.h>
> >> +#include <linux/mutex.h>
> >> +#include <linux/pci.h>
> >> +#include <linux/pm_runtime.h>
> >> +#include <linux/types.h>
> >> +#include <linux/uaccess.h>
> >> +#include <linux/vfio.h>
> >> +#include <linux/vfio_pci_core.h>
> >> +#include <linux/virtio_pci.h>
> >> +#include <linux/virtio_net.h>
> >> +#include <linux/virtio_pci_admin.h>
> >> +
> >> +struct virtiovf_pci_core_device {
> >> +	struct vfio_pci_core_device core_device;
> >> +	u8 bar0_virtual_buf_size;
> >> +	u8 *bar0_virtual_buf;
> >> +	/* synchronize access to the virtual buf */
> >> +	struct mutex bar_mutex;
> >> +	void __iomem *notify_addr;
> >> +	u32 notify_offset;
> >> +	u8 notify_bar;  
> > Push the above u8 to the end of the structure for better packing.  
> OK
> >> +	u16 pci_cmd;
> >> +	u16 msix_ctrl;
> >> +};
> >> +
> >> +static int
> >> +virtiovf_issue_legacy_rw_cmd(struct virtiovf_pci_core_device *virtvdev,
> >> +			     loff_t pos, char __user *buf,
> >> +			     size_t count, bool read)
> >> +{
> >> +	bool msix_enabled = virtvdev->msix_ctrl & PCI_MSIX_FLAGS_ENABLE;
> >> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> >> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
> >> +	u16 opcode;
> >> +	int ret;
> >> +
> >> +	mutex_lock(&virtvdev->bar_mutex);
> >> +	if (read) {
> >> +		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
> >> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
> >> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
> >> +		ret = virtio_pci_admin_legacy_io_read(pdev, opcode, pos, count,
> >> +						      bar0_buf + pos);
> >> +		if (ret)
> >> +			goto out;
> >> +		if (copy_to_user(buf, bar0_buf + pos, count))
> >> +			ret = -EFAULT;
> >> +		goto out;
> >> +	}  
> > TBH, I think the symmetry of read vs write would be more apparent if
> > this were an else branch.  
> OK, will do.
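
For reference, a sketch of the suggested if/else shape (same calls as in the
patch, just restructured):

	mutex_lock(&virtvdev->bar_mutex);
	if (read) {
		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
		ret = virtio_pci_admin_legacy_io_read(pdev, opcode, pos, count,
						      bar0_buf + pos);
		if (!ret && copy_to_user(buf, bar0_buf + pos, count))
			ret = -EFAULT;
	} else {
		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
		if (copy_from_user(bar0_buf + pos, buf, count))
			ret = -EFAULT;
		else
			ret = virtio_pci_admin_legacy_io_write(pdev, opcode, pos,
							       count,
							       bar0_buf + pos);
	}
	mutex_unlock(&virtvdev->bar_mutex);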
> >> +
> >> +	if (copy_from_user(bar0_buf + pos, buf, count)) {
> >> +		ret = -EFAULT;
> >> +		goto out;
> >> +	}
> >> +
> >> +	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
> >> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
> >> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
> >> +	ret = virtio_pci_admin_legacy_io_write(pdev, opcode, pos, count,
> >> +					       bar0_buf + pos);
> >> +out:
> >> +	mutex_unlock(&virtvdev->bar_mutex);
> >> +	return ret;
> >> +}
> >> +
> >> +static int
> >> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
> >> +			    loff_t pos, char __user *buf,
> >> +			    size_t count, bool read)
> >> +{
> >> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> >> +	u16 queue_notify;
> >> +	int ret;
> >> +
> >> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
> >> +		return -EINVAL;
> >> +
> >> +	switch (pos) {
> >> +	case VIRTIO_PCI_QUEUE_NOTIFY:
> >> +		if (count != sizeof(queue_notify))
> >> +			return -EINVAL;
> >> +		if (read) {
> >> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
> >> +						virtvdev->notify_addr);
> >> +			if (ret)
> >> +				return ret;
> >> +			if (copy_to_user(buf, &queue_notify,
> >> +					 sizeof(queue_notify)))
> >> +				return -EFAULT;
> >> +			break;
> >> +		}  
> > Same.  
> OK
> >> +
> >> +		if (copy_from_user(&queue_notify, buf, count))
> >> +			return -EFAULT;
> >> +
> >> +		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
> >> +					 virtvdev->notify_addr);
> >> +		break;
> >> +	default:
> >> +		ret = virtiovf_issue_legacy_rw_cmd(virtvdev, pos, buf, count,
> >> +						   read);
> >> +	}
> >> +
> >> +	return ret ? ret : count;
> >> +}
> >> +
> >> +static bool range_intersect_range(loff_t range1_start, size_t count1,
> >> +				  loff_t range2_start, size_t count2,
> >> +				  loff_t *start_offset,
> >> +				  size_t *intersect_count,
> >> +				  size_t *register_offset)
> >> +{
> >> +	if (range1_start <= range2_start &&
> >> +	    range1_start + count1 > range2_start) {
> >> +		*start_offset = range2_start - range1_start;
> >> +		*intersect_count = min_t(size_t, count2,
> >> +					 range1_start + count1 - range2_start);
> >> +		if (register_offset)
> >> +			*register_offset = 0;
> >> +		return true;
> >> +	}
> >> +
> >> +	if (range1_start > range2_start &&
> >> +	    range1_start < range2_start + count2) {
> >> +		*start_offset = range1_start;
> >> +		*intersect_count = min_t(size_t, count1,
> >> +					 range2_start + count2 - range1_start);
> >> +		if (register_offset)
> >> +			*register_offset = range1_start - range2_start;
> >> +		return true;
> >> +	}  
> > Seems like we're missing a case, and some documentation.
> >
> > The first test requires range1 to fully enclose range2 and provides the
> > offset of range2 within range1 and the length of the intersection.
> >
> > The second test requires range1 to start from a non-zero offset within
> > range2 and returns the absolute offset of range1 and the length of the
> > intersection.
> >
> > The register offset is then non-zero offset of range1 into range2.  So
> > does the caller use the zero value in the previous test to know range2
> > exists within range1?
> >
> > We miss the cases where range1_start is <= range2_start and range1
> > terminates within range2.  
> 
> The first test should cover this case as well, in addition to the case of
> fully enclosing.
> 
> It checks whether range1_start + count1 > range2_start, which can also
> terminate within range2.
> 
> Isn't it?

Hmm, maybe I read it wrong.  Let me try again...

The first test covers the cases where range1 starts at or below range2
and range1 extends into or through range2.  start_offset describes the
offset into range1 that range2 begins.  The intersect_count is the
extent of the intersection and it's not clear what register_offset
describes since it's zero.

The second test covers the cases where range1 starts within range2.
start_offset is the start of range1, which doesn't seem consistent with
the previous branch usage.  The intersect_count does look consistent
with the previous branch.  register_offset is then the offset of range1
into range2.

So I had some things wrong, but I'm still having trouble with a
consistent definition of start_offset and register_offset.


> I may add some documentation for that function as part of V2 as you asked.
> 
> >   I suppose we'll see below how this is used,
> > but it seems asymmetric and incomplete.
> >  
> >> +
> >> +	return false;
> >> +}
> >> +
> >> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> >> +					char __user *buf, size_t count,
> >> +					loff_t *ppos)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >> +	size_t register_offset;
> >> +	loff_t copy_offset;
> >> +	size_t copy_count;
> >> +	__le32 val32;
> >> +	__le16 val16;
> >> +	u8 val8;
> >> +	int ret;
> >> +
> >> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> >> +	if (ret < 0)
> >> +		return ret;
> >> +
> >> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> >> +				  &copy_offset, &copy_count, NULL)) {  
> > If a user does 'setpci -s x:00.0 2.b' (range1 <= range2, but terminates
> > within range2) they'll not enter this branch and see 41 rather than 00.

Yes, this does take the first branch per my second look, so copy_offset
is zero, copy_count is 1.  I think the copy_to_user() works correctly.

> > If a user does 'setpci -s x:00.0 3.b' (range1 > range2, range 1
> > contained within range 2), the above function returns a copy_offset of
> > range1_start (ie. 3).  But that offset is applied to the buffer, which
> > is out of bounds.  The function needs to have returned an offset of 1
> > and it should have applied to the val16 address.
> >
> > I don't think this works like it's intended.  
> 
> Is that because of the missing case ?
> Please see my note above.

No, I think my original evaluation of this second case still holds,
copy_offset is wrong.  I suspect what you're trying to do with
start_offset and register_offset is specify the output buffer offset,
ie. relative to range1 or buf, or the input offset, ie. range2 or our
local val variable.  But start_offset is incorrectly calculated in the
second branch above (should always be zero) and the caller didn't ask
for the register offset here, which it seems it always should.
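
For what it's worth, here is a sketch with the consistent semantics described
above (start_offset as the offset into range1, i.e. the user buf, and
register_offset as the offset into range2, i.e. the emulated register),
covering all the overlap cases.  Illustrative only, not the patch code:

static bool range_intersect_range(loff_t range1_start, size_t count1,
				  loff_t range2_start, size_t count2,
				  loff_t *start_offset,
				  size_t *intersect_count,
				  size_t *register_offset)
{
	loff_t overlap_start = max_t(loff_t, range1_start, range2_start);
	loff_t overlap_end = min_t(loff_t, range1_start + count1,
				   range2_start + count2);

	if (overlap_start >= overlap_end)
		return false;

	/* Offset into range1 (the user access/buf) where the overlap begins. */
	*start_offset = overlap_start - range1_start;
	/* Length of the overlap. */
	*intersect_count = overlap_end - overlap_start;
	/* Offset into range2 (the emulated register) where the overlap begins. */
	if (register_offset)
		*register_offset = overlap_start - range2_start;
	return true;
}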

> >> +		val16 = cpu_to_le16(0x1000);  
> > Please #define this somewhere rather than hiding a magic value here.  
> Sure, will just replace it with VIRTIO_TRANS_ID_NET.
> >> +		if (copy_to_user(buf + copy_offset, &val16, copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	if ((virtvdev->pci_cmd & PCI_COMMAND_IO) &&
> >> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
> >> +				  &copy_offset, &copy_count, &register_offset)) {
> >> +		if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
> >> +				   copy_count))
> >> +			return -EFAULT;
> >> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> >> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
> >> +				 copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8),
> >> +				  &copy_offset, &copy_count, NULL)) {
> >> +		/* Transitional needs to have revision 0 */
> >> +		val8 = 0;
> >> +		if (copy_to_user(buf + copy_offset, &val8, copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
> >> +				  &copy_offset, &copy_count, NULL)) {
> >> +		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);  
> > I'd still like to see the remainder of the BAR follow the semantics
> > vfio-pci does.  I think this requires a __le32 bar0 field on the
> > virtvdev struct to store writes and the read here would mask the lower
> > bits up to the BAR size and OR in the IO indicator bit.  
> 
> OK, will do.
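
For reference, a minimal sketch of that read path, assuming a new __le32
'bar0' field on virtvdev that latches guest writes (illustrative only):

		u32 io_size = roundup_pow_of_two(virtvdev->bar0_virtual_buf_size);
		u32 bar_val = le32_to_cpu(virtvdev->bar0);

		/* Mask the size bits like vfio-pci would and OR in the I/O
		 * space indicator. */
		val32 = cpu_to_le32((bar_val & ~(io_size - 1)) |
				    PCI_BASE_ADDRESS_SPACE_IO);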
> 
> >
> >  
> >> +		if (copy_to_user(buf + copy_offset, &val32, copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> >> +				  &copy_offset, &copy_count, NULL)) {
> >> +		/*
> >> +		 * Transitional devices use the PCI subsystem device id as
> >> +		 * virtio device id, same as legacy driver always did.  
> > Where did we require the subsystem vendor ID to be 0x1af4?  This
> > subsystem device ID really only makes sense given that subsystem
> > vendor ID, right?  Otherwise I don't see that non-transitional devices,
> > such as the VF, have a hard requirement per the spec for the subsystem
> > vendor ID.
> >
> > Do we want to make this only probe the correct subsystem vendor ID or do
> > we want to emulate the subsystem vendor ID as well?  I don't see this is
> > correct without one of those options.  
> 
> Looking in the 1.x spec we can see the below.
> 
> Legacy Interfaces: A Note on PCI Device Discovery
> 
> "Transitional devices MUST have the PCI Subsystem
> Device ID matching the Virtio Device ID, as indicated in section 5 ...
> This is to match legacy drivers."
> 
> However, there is no need to enforce Subsystem Vendor ID.
> 
> This is what we followed here.
> 
> Makes sense?

So do I understand correctly that virtio dictates the subsystem device
ID for all subsystem vendor IDs that implement a legacy virtio
interface?  Ok, but this device didn't actually implement a legacy
virtio interface.  The device itself is not transitional, we're imposing
an emulated transitional interface onto it.  So did the subsystem vendor
agree to have their subsystem device ID managed by the virtio committee
or might we create conflicts?  I imagine we know we don't have a
conflict if we also virtualize the subsystem vendor ID.


BTW, it would be a lot easier for all of the config space emulation here
if we could make use of the existing field virtualization in
vfio-pci-core.  In fact you'll see in vfio_config_init() that
PCI_DEVICE_ID is already virtualized for VFs, so it would be enough to
simply do the following to report the desired device ID:

	*(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(0x1000);

It appears everything in this function could be handled similarly by
vfio-pci-core if the right fields in the perm_bits.virt and .write
bits could be manipulated and vconfig modified appropriately.  I'd look
for a way that a variant driver could provide an alternate set of
permissions structures for various capabilities.  Thanks,

Alex
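
For illustration, roughly what the emulation above could collapse to if a
variant driver had such a hook (the hook itself is hypothetical; vconfig is
the core's virtual config space buffer):

static void virtiovf_virtualize_config(struct vfio_pci_core_device *vdev)
{
	u8 *vconfig = vdev->vconfig;

	/* Present the transitional network device. */
	*(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(0x1000);
	*(__le16 *)&vconfig[PCI_SUBSYSTEM_ID] = cpu_to_le16(VIRTIO_ID_NET);
	/* Optionally virtualize the subsystem vendor ID as well. */
	*(__le16 *)&vconfig[PCI_SUBSYSTEM_VENDOR_ID] =
		cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET);
	/* Transitional devices must report revision 0. */
	vconfig[PCI_REVISION_ID] = 0;
}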


> >> +		 */
> >> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
> >> +		if (copy_to_user(buf + copy_offset, &val16, copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	return count;
> >> +}
> >> +
> >> +static ssize_t
> >> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
> >> +		       size_t count, loff_t *ppos)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> >> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> >> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >> +	int ret;
> >> +
> >> +	if (!count)
> >> +		return 0;
> >> +
> >> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
> >> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
> >> +
> >> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> >> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
> >> +
> >> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> >> +	if (ret) {
> >> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
> >> +				     ret);
> >> +		return -EIO;
> >> +	}
> >> +
> >> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
> >> +	pm_runtime_put(&pdev->dev);
> >> +	return ret;
> >> +}
> >> +
> >> +static ssize_t
> >> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> >> +			size_t count, loff_t *ppos)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> >> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> >> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >> +	int ret;
> >> +
> >> +	if (!count)
> >> +		return 0;
> >> +
> >> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> >> +		size_t register_offset;
> >> +		loff_t copy_offset;
> >> +		size_t copy_count;
> >> +
> >> +		if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd),
> >> +					  &copy_offset, &copy_count,
> >> +					  &register_offset)) {
> >> +			if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
> >> +					   buf + copy_offset,
> >> +					   copy_count))
> >> +				return -EFAULT;
> >> +		}
> >> +
> >> +		if (range_intersect_range(pos, count, pdev->msix_cap + PCI_MSIX_FLAGS,
> >> +					  sizeof(virtvdev->msix_ctrl),
> >> +					  &copy_offset, &copy_count,
> >> +					  &register_offset)) {
> >> +			if (copy_from_user((void *)&virtvdev->msix_ctrl + register_offset,
> >> +					   buf + copy_offset,
> >> +					   copy_count))
> >> +				return -EFAULT;
> >> +		}  
> > MSI-X is setup via ioctl, so you're relying on a userspace that writes
> > through the control register bit even though it doesn't do anything.
> > Why not use vfio_pci_core_device.irq_type to track if MSI-X mode is
> > enabled?  
> OK, I may switch to your suggestion after testing it.
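
For reference, a sketch of that check in virtiovf_issue_legacy_rw_cmd, using
the interrupt mode already tracked by the core (illustrative only):

	bool msix_enabled =
		virtvdev->core_device.irq_type == VFIO_PCI_MSIX_IRQ_INDEX;

	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
		VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
		VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;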
> >  
> >> +	}
> >> +
> >> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> >> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
> >> +
> >> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> >> +	if (ret) {
> >> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
> >> +		return -EIO;
> >> +	}
> >> +
> >> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
> >> +	pm_runtime_put(&pdev->dev);
> >> +	return ret;
> >> +}
> >> +
> >> +static int
> >> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
> >> +				   unsigned int cmd, unsigned long arg)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
> >> +	void __user *uarg = (void __user *)arg;
> >> +	struct vfio_region_info info = {};
> >> +
> >> +	if (copy_from_user(&info, uarg, minsz))
> >> +		return -EFAULT;
> >> +
> >> +	if (info.argsz < minsz)
> >> +		return -EINVAL;
> >> +
> >> +	switch (info.index) {
> >> +	case VFIO_PCI_BAR0_REGION_INDEX:
> >> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
> >> +		info.size = virtvdev->bar0_virtual_buf_size;
> >> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
> >> +			     VFIO_REGION_INFO_FLAG_WRITE;
> >> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
> >> +	default:
> >> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> >> +	}
> >> +}
> >> +
> >> +static long
> >> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> >> +			     unsigned long arg)
> >> +{
> >> +	switch (cmd) {
> >> +	case VFIO_DEVICE_GET_REGION_INFO:
> >> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
> >> +	default:
> >> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> >> +	}
> >> +}
> >> +
> >> +static int
> >> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
> >> +{
> >> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> >> +	int ret;
> >> +
> >> +	/*
> >> +	 * Setup the BAR where the 'notify' exists to be used by vfio as well
> >> +	 * This will let us mmap it only once and use it when needed.
> >> +	 */
> >> +	ret = vfio_pci_core_setup_barmap(core_device,
> >> +					 virtvdev->notify_bar);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
> >> +			virtvdev->notify_offset;
> >> +	return 0;
> >> +}
> >> +
> >> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
> >> +	int ret;
> >> +
> >> +	ret = vfio_pci_core_enable(vdev);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	if (virtvdev->bar0_virtual_buf) {
> >> +		/*
> >> +		 * Upon close_device() the vfio_pci_core_disable() is called
> >> +		 * and will close all the previous mmaps, so it seems that the
> >> +		 * valid life cycle for the 'notify' addr is per open/close.
> >> +		 */
> >> +		ret = virtiovf_set_notify_addr(virtvdev);
> >> +		if (ret) {
> >> +			vfio_pci_core_disable(vdev);
> >> +			return ret;
> >> +		}
> >> +	}
> >> +
> >> +	vfio_pci_core_finish_enable(vdev);
> >> +	return 0;
> >> +}
> >> +
> >> +static int virtiovf_get_device_config_size(unsigned short device)
> >> +{
> >> +	/* Network card */
> >> +	return offsetofend(struct virtio_net_config, status);
> >> +}
> >> +
> >> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
> >> +{
> >> +	u64 offset;
> >> +	int ret;
> >> +	u8 bar;
> >> +
> >> +	ret = virtio_pci_admin_legacy_io_notify_info(virtvdev->core_device.pdev,
> >> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
> >> +				&bar, &offset);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	virtvdev->notify_bar = bar;
> >> +	virtvdev->notify_offset = offset;
> >> +	return 0;
> >> +}
> >> +
> >> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	struct pci_dev *pdev;
> >> +	int ret;
> >> +
> >> +	ret = vfio_pci_core_init_dev(core_vdev);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	pdev = virtvdev->core_device.pdev;
> >> +	ret = virtiovf_read_notify_info(virtvdev);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	/* Being ready with a buffer that supports MSIX */
> >> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
> >> +				virtiovf_get_device_config_size(pdev->device);
> >> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
> >> +					     GFP_KERNEL);
> >> +	if (!virtvdev->bar0_virtual_buf)
> >> +		return -ENOMEM;
> >> +	mutex_init(&virtvdev->bar_mutex);
> >> +	return 0;
> >> +}
> >> +
> >> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +
> >> +	kfree(virtvdev->bar0_virtual_buf);
> >> +	vfio_pci_core_release_dev(core_vdev);
> >> +}
> >> +
> >> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
> >> +	.name = "virtio-transitional-vfio-pci",
> >> +	.init = virtiovf_pci_init_device,
> >> +	.release = virtiovf_pci_core_release_dev,
> >> +	.open_device = virtiovf_pci_open_device,
> >> +	.close_device = vfio_pci_core_close_device,
> >> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
> >> +	.read = virtiovf_pci_core_read,
> >> +	.write = virtiovf_pci_core_write,
> >> +	.mmap = vfio_pci_core_mmap,
> >> +	.request = vfio_pci_core_request,
> >> +	.match = vfio_pci_core_match,
> >> +	.bind_iommufd = vfio_iommufd_physical_bind,
> >> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> >> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> >> +};
> >> +
> >> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
> >> +	.name = "virtio-acc-vfio-pci",
> >> +	.init = vfio_pci_core_init_dev,
> >> +	.release = vfio_pci_core_release_dev,
> >> +	.open_device = virtiovf_pci_open_device,
> >> +	.close_device = vfio_pci_core_close_device,
> >> +	.ioctl = vfio_pci_core_ioctl,
> >> +	.device_feature = vfio_pci_core_ioctl_feature,
> >> +	.read = vfio_pci_core_read,
> >> +	.write = vfio_pci_core_write,
> >> +	.mmap = vfio_pci_core_mmap,
> >> +	.request = vfio_pci_core_request,
> >> +	.match = vfio_pci_core_match,
> >> +	.bind_iommufd = vfio_iommufd_physical_bind,
> >> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> >> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> >> +};
> >> +
> >> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
> >> +{
> >> +	struct resource *res = pdev->resource;
> >> +
> >> +	return res->flags ? true : false;
> >> +}
> >> +
> >> +#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
> >> +	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
> >> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
> >> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
> >> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
> >> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
> >> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> >> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> >> +
> >> +static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
> >> +{
> >> +	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
> >> +	u8 *buf;
> >> +	int ret;
> >> +
> >> +	buf = kzalloc(buf_size, GFP_KERNEL);
> >> +	if (!buf)
> >> +		return false;
> >> +
> >> +	ret = virtio_pci_admin_list_query(pdev, buf, buf_size);
> >> +	if (ret)
> >> +		goto end;
> >> +
> >> +	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
> >> +		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
> >> +		ret = -EOPNOTSUPP;
> >> +		goto end;
> >> +	}
> >> +
> >> +	/* Confirm the used commands */
> >> +	memset(buf, 0, buf_size);
> >> +	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
> >> +	ret = virtio_pci_admin_list_use(pdev, buf, buf_size);
> >> +end:
> >> +	kfree(buf);
> >> +	return ret ? false : true;
> >> +}
> >> +
> >> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> >> +			      const struct pci_device_id *id)
> >> +{
> >> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> >> +	struct virtiovf_pci_core_device *virtvdev;
> >> +	int ret;
> >> +
> >> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> >> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)  
> >
> > All but the last test here are fairly evident requirements of the
> > driver.  Why do we require a device that supports MSI-X?  
> 
> Since we now check at run time whether MSI-X is enabled or disabled 
> to pick the correct opcode, there is no need for that check any more.
> 
> Will drop this MSI-X check from V2.
> 
> Thanks,
> Yishai
> 
> >
> > Thanks,
> > Alex
> >
> >  
> >> +		ops = &virtiovf_acc_vfio_pci_tran_ops;
> >> +
> >> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> >> +				     &pdev->dev, ops);
> >> +	if (IS_ERR(virtvdev))
> >> +		return PTR_ERR(virtvdev);
> >> +
> >> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> >> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
> >> +	if (ret)
> >> +		goto out;
> >> +	return 0;
> >> +out:
> >> +	vfio_put_device(&virtvdev->core_device.vdev);
> >> +	return ret;
> >> +}
> >> +
> >> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> >> +
> >> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> >> +	vfio_put_device(&virtvdev->core_device.vdev);
> >> +}
> >> +
> >> +static const struct pci_device_id virtiovf_pci_table[] = {
> >> +	/* Only virtio-net is supported/tested so far */
> >> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
> >> +	{}
> >> +};
> >> +
> >> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> >> +
> >> +static struct pci_driver virtiovf_pci_driver = {
> >> +	.name = KBUILD_MODNAME,
> >> +	.id_table = virtiovf_pci_table,
> >> +	.probe = virtiovf_pci_probe,
> >> +	.remove = virtiovf_pci_remove,
> >> +	.err_handler = &vfio_pci_core_err_handlers,
> >> +	.driver_managed_dma = true,
> >> +};
> >> +
> >> +module_pci_driver(virtiovf_pci_driver);
> >> +
> >> +MODULE_LICENSE("GPL");
> >> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
> >> +MODULE_DESCRIPTION(
> >> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");  
> 
> 


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-10-25 19:13         ` Alex Williamson
  0 siblings, 0 replies; 100+ messages in thread
From: Alex Williamson @ 2023-10-25 19:13 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, mst, maorg, virtualization, jgg, jiri, leonro

On Wed, 25 Oct 2023 17:35:51 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> On 24/10/2023 22:57, Alex Williamson wrote:
> > On Tue, 17 Oct 2023 16:42:17 +0300
> > Yishai Hadas <yishaih@nvidia.com> wrote:
> >  
> >> Introduce a vfio driver over virtio devices to support the legacy
> >> interface functionality for VFs.
> >>
> >> Background, from the virtio spec [1].
> >> --------------------------------------------------------------------
> >> In some systems, there is a need to support a virtio legacy driver with
> >> a device that does not directly support the legacy interface. In such
> >> scenarios, a group owner device can provide the legacy interface
> >> functionality for the group member devices. The driver of the owner
> >> device can then access the legacy interface of a member device on behalf
> >> of the legacy member device driver.
> >>
> >> For example, with the SR-IOV group type, group members (VFs) can not
> >> present the legacy interface in an I/O BAR in BAR0 as expected by the
> >> legacy pci driver. If the legacy driver is running inside a virtual
> >> machine, the hypervisor executing the virtual machine can present a
> >> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> >> legacy driver accesses to this I/O BAR and forwards them to the group
> >> owner device (PF) using group administration commands.
> >> --------------------------------------------------------------------
> >>
> >> Specifically, this driver adds support for a virtio-net VF to be exposed
> >> as a transitional device to a guest driver and allows the legacy IO BAR
> >> functionality on top.
> >>
> >> This allows a VM which uses a legacy virtio-net driver in the guest to
> >> work transparently over a VF whose host-side driver is this new
> >> driver.
> >>
> >> The driver can be extended easily to support some other types of virtio
> >> devices (e.g virtio-blk), by adding in a few places the specific type
> >> properties as was done for virtio-net.
> >>
> >> For now, only the virtio-net use case was tested and as such we introduce
> >> the support only for such a device.
> >>
> >> Practically,
> >> Upon probing a VF for a virtio-net device, in case its PF supports
> >> legacy access over the virtio admin commands and the VF doesn't have BAR
> >> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> >> transitional device with I/O BAR in BAR 0.
> >>
> >> The existence of the simulated I/O bar is reported later on by
> >> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> >> exposes itself as a transitional device by overwriting some properties
> >> upon reading its config space.
> >>
> >> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> >> guest may use it via read/write calls according to the virtio
> >> specification.
> >>
> >> Any read/write towards the control parts of the BAR will be captured by
> >> the new driver and will be translated into admin commands towards the
> >> device.
> >>
> >> Any data path read/write access (i.e. virtio driver notifications) will
> >> be forwarded to the physical BAR whose properties were supplied by
> >> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
> >> probing/init flow.
> >>
> >> With that code in place a legacy driver in the guest has the look and
> >> feel as if having a transitional device with legacy support for both its
> >> control and data path flows.
> >>
> >> [1]
> >> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> >>
> >> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> >> ---
> >>   MAINTAINERS                      |   7 +
> >>   drivers/vfio/pci/Kconfig         |   2 +
> >>   drivers/vfio/pci/Makefile        |   2 +
> >>   drivers/vfio/pci/virtio/Kconfig  |  15 +
> >>   drivers/vfio/pci/virtio/Makefile |   4 +
> >>   drivers/vfio/pci/virtio/main.c   | 577 +++++++++++++++++++++++++++++++
> >>   6 files changed, 607 insertions(+)
> >>   create mode 100644 drivers/vfio/pci/virtio/Kconfig
> >>   create mode 100644 drivers/vfio/pci/virtio/Makefile
> >>   create mode 100644 drivers/vfio/pci/virtio/main.c
> >>
> >> diff --git a/MAINTAINERS b/MAINTAINERS
> >> index 7a7bd8bd80e9..680a70063775 100644
> >> --- a/MAINTAINERS
> >> +++ b/MAINTAINERS
> >> @@ -22620,6 +22620,13 @@ L:	kvm@vger.kernel.org
> >>   S:	Maintained
> >>   F:	drivers/vfio/pci/mlx5/
> >>   
> >> +VFIO VIRTIO PCI DRIVER
> >> +M:	Yishai Hadas <yishaih@nvidia.com>
> >> +L:	kvm@vger.kernel.org
> >> +L:	virtualization@lists.linux-foundation.org
> >> +S:	Maintained
> >> +F:	drivers/vfio/pci/virtio
> >> +
> >>   VFIO PCI DEVICE SPECIFIC DRIVERS
> >>   R:	Jason Gunthorpe <jgg@nvidia.com>
> >>   R:	Yishai Hadas <yishaih@nvidia.com>
> >> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> >> index 8125e5f37832..18c397df566d 100644
> >> --- a/drivers/vfio/pci/Kconfig
> >> +++ b/drivers/vfio/pci/Kconfig
> >> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
> >>   
> >>   source "drivers/vfio/pci/pds/Kconfig"
> >>   
> >> +source "drivers/vfio/pci/virtio/Kconfig"
> >> +
> >>   endmenu
> >> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> >> index 45167be462d8..046139a4eca5 100644
> >> --- a/drivers/vfio/pci/Makefile
> >> +++ b/drivers/vfio/pci/Makefile
> >> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
> >>   obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
> >>   
> >>   obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> >> +
> >> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
> >> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
> >> new file mode 100644
> >> index 000000000000..89eddce8b1bd
> >> --- /dev/null
> >> +++ b/drivers/vfio/pci/virtio/Kconfig
> >> @@ -0,0 +1,15 @@
> >> +# SPDX-License-Identifier: GPL-2.0-only
> >> +config VIRTIO_VFIO_PCI
> >> +        tristate "VFIO support for VIRTIO PCI devices"
> >> +        depends on VIRTIO_PCI
> >> +        select VFIO_PCI_CORE
> >> +        help
> >> +          This provides support for exposing VIRTIO VF devices using the VFIO
> >> +          framework that can work with a legacy virtio driver in the guest.
> >> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
> >> +          not indicate I/O Space.
> >> +          As of that this driver emulated I/O BAR in software to let a VF be
> >> +          seen as a transitional device in the guest and let it work with
> >> +          a legacy driver.  
> > This description is a little bit subtle to the hard requirements on the
> > device.  Reading this, one might think that this should work for any
> > SR-IOV VF virtio device, when in reality it only support virtio-net
> > currently and places a number of additional requirements on the device
> > (ex. legacy access and MSI-X support).  
> 
> Sure, will change to refer only to virtio-net devices which are capable 
> of 'legacy access'.
> 
> No need to refer to MSI-X, please see below.
> 
> >  
> >> +
> >> +          If you don't know what to do here, say N.
> >> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
> >> new file mode 100644
> >> index 000000000000..2039b39fb723
> >> --- /dev/null
> >> +++ b/drivers/vfio/pci/virtio/Makefile
> >> @@ -0,0 +1,4 @@
> >> +# SPDX-License-Identifier: GPL-2.0-only
> >> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
> >> +virtio-vfio-pci-y := main.o
> >> +
> >> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
> >> new file mode 100644
> >> index 000000000000..3fef4b21f7e6
> >> --- /dev/null
> >> +++ b/drivers/vfio/pci/virtio/main.c
> >> @@ -0,0 +1,577 @@
> >> +// SPDX-License-Identifier: GPL-2.0-only
> >> +/*
> >> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> >> + */
> >> +
> >> +#include <linux/device.h>
> >> +#include <linux/module.h>
> >> +#include <linux/mutex.h>
> >> +#include <linux/pci.h>
> >> +#include <linux/pm_runtime.h>
> >> +#include <linux/types.h>
> >> +#include <linux/uaccess.h>
> >> +#include <linux/vfio.h>
> >> +#include <linux/vfio_pci_core.h>
> >> +#include <linux/virtio_pci.h>
> >> +#include <linux/virtio_net.h>
> >> +#include <linux/virtio_pci_admin.h>
> >> +
> >> +struct virtiovf_pci_core_device {
> >> +	struct vfio_pci_core_device core_device;
> >> +	u8 bar0_virtual_buf_size;
> >> +	u8 *bar0_virtual_buf;
> >> +	/* synchronize access to the virtual buf */
> >> +	struct mutex bar_mutex;
> >> +	void __iomem *notify_addr;
> >> +	u32 notify_offset;
> >> +	u8 notify_bar;  
> > Push the above u8 to the end of the structure for better packing.  
> OK
> >> +	u16 pci_cmd;
> >> +	u16 msix_ctrl;
> >> +};
> >> +
> >> +static int
> >> +virtiovf_issue_legacy_rw_cmd(struct virtiovf_pci_core_device *virtvdev,
> >> +			     loff_t pos, char __user *buf,
> >> +			     size_t count, bool read)
> >> +{
> >> +	bool msix_enabled = virtvdev->msix_ctrl & PCI_MSIX_FLAGS_ENABLE;
> >> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> >> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
> >> +	u16 opcode;
> >> +	int ret;
> >> +
> >> +	mutex_lock(&virtvdev->bar_mutex);
> >> +	if (read) {
> >> +		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
> >> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
> >> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
> >> +		ret = virtio_pci_admin_legacy_io_read(pdev, opcode, pos, count,
> >> +						      bar0_buf + pos);
> >> +		if (ret)
> >> +			goto out;
> >> +		if (copy_to_user(buf, bar0_buf + pos, count))
> >> +			ret = -EFAULT;
> >> +		goto out;
> >> +	}  
> > TBH, I think the symmetry of read vs write would be more apparent if
> > this were an else branch.  
> OK, will do.
> >> +
> >> +	if (copy_from_user(bar0_buf + pos, buf, count)) {
> >> +		ret = -EFAULT;
> >> +		goto out;
> >> +	}
> >> +
> >> +	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
> >> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
> >> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
> >> +	ret = virtio_pci_admin_legacy_io_write(pdev, opcode, pos, count,
> >> +					       bar0_buf + pos);
> >> +out:
> >> +	mutex_unlock(&virtvdev->bar_mutex);
> >> +	return ret;
> >> +}
> >> +
> >> +static int
> >> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
> >> +			    loff_t pos, char __user *buf,
> >> +			    size_t count, bool read)
> >> +{
> >> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> >> +	u16 queue_notify;
> >> +	int ret;
> >> +
> >> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
> >> +		return -EINVAL;
> >> +
> >> +	switch (pos) {
> >> +	case VIRTIO_PCI_QUEUE_NOTIFY:
> >> +		if (count != sizeof(queue_notify))
> >> +			return -EINVAL;
> >> +		if (read) {
> >> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
> >> +						virtvdev->notify_addr);
> >> +			if (ret)
> >> +				return ret;
> >> +			if (copy_to_user(buf, &queue_notify,
> >> +					 sizeof(queue_notify)))
> >> +				return -EFAULT;
> >> +			break;
> >> +		}  
> > Same.  
> OK
> >> +
> >> +		if (copy_from_user(&queue_notify, buf, count))
> >> +			return -EFAULT;
> >> +
> >> +		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
> >> +					 virtvdev->notify_addr);
> >> +		break;
> >> +	default:
> >> +		ret = virtiovf_issue_legacy_rw_cmd(virtvdev, pos, buf, count,
> >> +						   read);
> >> +	}
> >> +
> >> +	return ret ? ret : count;
> >> +}
> >> +
> >> +static bool range_intersect_range(loff_t range1_start, size_t count1,
> >> +				  loff_t range2_start, size_t count2,
> >> +				  loff_t *start_offset,
> >> +				  size_t *intersect_count,
> >> +				  size_t *register_offset)
> >> +{
> >> +	if (range1_start <= range2_start &&
> >> +	    range1_start + count1 > range2_start) {
> >> +		*start_offset = range2_start - range1_start;
> >> +		*intersect_count = min_t(size_t, count2,
> >> +					 range1_start + count1 - range2_start);
> >> +		if (register_offset)
> >> +			*register_offset = 0;
> >> +		return true;
> >> +	}
> >> +
> >> +	if (range1_start > range2_start &&
> >> +	    range1_start < range2_start + count2) {
> >> +		*start_offset = range1_start;
> >> +		*intersect_count = min_t(size_t, count1,
> >> +					 range2_start + count2 - range1_start);
> >> +		if (register_offset)
> >> +			*register_offset = range1_start - range2_start;
> >> +		return true;
> >> +	}  
> > Seems like we're missing a case, and some documentation.
> >
> > The first test requires range1 to fully enclose range2 and provides the
> > offset of range2 within range1 and the length of the intersection.
> >
> > The second test requires range1 to start from a non-zero offset within
> > range2 and returns the absolute offset of range1 and the length of the
> > intersection.
> >
> > The register offset is then non-zero offset of range1 into range2.  So
> > does the caller use the zero value in the previous test to know range2
> > exists within range1?
> >
> > We miss the cases where range1_start is <= range2_start and range1
> > terminates within range2.  
> 
> The first test should cover this case as well, in addition to the case of
> fully enclosing.
> 
> It checks whether range1_start + count1 > range2_start, which can also
> terminate within range2.
> 
> Isn't it?

Hmm, maybe I read it wrong.  Let me try again...

The first test covers the cases where range1 starts at or below range2
and range1 extends into or through range2.  start_offset describes the
offset into range1 that range2 begins.  The intersect_count is the
extent of the intersection and it's not clear what register_offset
describes since it's zero.

The second test covers the cases where range1 starts within range2.
start_offset is the start of range1, which doesn't seem consistent with
the previous branch usage.  The intersect_count does look consistent
with the previous branch.  register_offset is then the offset of range1
into range2.

So I had some things wrong, but I'm still having trouble with a
consistent definition of start_offset and register_offset.


> I may add some documentation for that function as part of V2 as you asked.
> 
> >   I suppose we'll see below how this is used,
> > but it seems asymmetric and incomplete.
> >  
> >> +
> >> +	return false;
> >> +}
> >> +
> >> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> >> +					char __user *buf, size_t count,
> >> +					loff_t *ppos)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >> +	size_t register_offset;
> >> +	loff_t copy_offset;
> >> +	size_t copy_count;
> >> +	__le32 val32;
> >> +	__le16 val16;
> >> +	u8 val8;
> >> +	int ret;
> >> +
> >> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> >> +	if (ret < 0)
> >> +		return ret;
> >> +
> >> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> >> +				  &copy_offset, &copy_count, NULL)) {  
> > If a user does 'setpci -s x:00.0 2.b' (range1 <= range2, but terminates
> > within range2) they'll not enter this branch and see 41 rather than 00.

Yes, this does take the first branch per my second look, so copy_offset
is zero, copy_count is 1.  I think the copy_to_user() works correctly.

> > If a user does 'setpci -s x:00.0 3.b' (range1 > range2, range 1
> > contained within range 2), the above function returns a copy_offset of
> > range1_start (ie. 3).  But that offset is applied to the buffer, which
> > is out of bounds.  The function needs to have returned an offset of 1
> > and it should have applied to the val16 address.
> >
> > I don't think this works like it's intended.  
> 
> Is that because of the missing case ?
> Please see my note above.

No, I think my original evaluation of this second case still holds,
copy_offset is wrong.  I suspect what you're trying to do with
start_offset and register_offset is specify the output buffer offset,
ie. relative to range1 or buf, or the input offset, ie. range2 or our
local val variable.  But start_offset is incorrectly calculated in the
second branch above (should always be zero) and the caller didn't ask
for the register offset here, which it seems it always should.

> >> +		val16 = cpu_to_le16(0x1000);  
> > Please #define this somewhere rather than hiding a magic value here.  
> Sure, will just replace it with VIRTIO_TRANS_ID_NET.
> >> +		if (copy_to_user(buf + copy_offset, &val16, copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	if ((virtvdev->pci_cmd & PCI_COMMAND_IO) &&
> >> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
> >> +				  &copy_offset, &copy_count, &register_offset)) {
> >> +		if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
> >> +				   copy_count))
> >> +			return -EFAULT;
> >> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
> >> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
> >> +				 copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8),
> >> +				  &copy_offset, &copy_count, NULL)) {
> >> +		/* Transitional needs to have revision 0 */
> >> +		val8 = 0;
> >> +		if (copy_to_user(buf + copy_offset, &val8, copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
> >> +				  &copy_offset, &copy_count, NULL)) {
> >> +		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);  
> > I'd still like to see the remainder of the BAR follow the semantics
> > vfio-pci does.  I think this requires a __le32 bar0 field on the
> > virtvdev struct to store writes and the read here would mask the lower
> > bits up to the BAR size and OR in the IO indicator bit.  
> 
> OK, will do.
> 
> >
> >  
> >> +		if (copy_to_user(buf + copy_offset, &val32, copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> >> +				  &copy_offset, &copy_count, NULL)) {
> >> +		/*
> >> +		 * Transitional devices use the PCI subsystem device id as
> >> +		 * virtio device id, same as legacy driver always did.  
> > Where did we require the subsystem vendor ID to be 0x1af4?  This
> > subsystem device ID really only makes sense given that subsystem
> > vendor ID, right?  Otherwise I don't see that non-transitional devices,
> > such as the VF, have a hard requirement per the spec for the subsystem
> > vendor ID.
> >
> > Do we want to make this only probe the correct subsystem vendor ID or do
> > we want to emulate the subsystem vendor ID as well?  I don't see this is
> > correct without one of those options.  
> 
> Looking in the 1.x spec we can see the below.
> 
> Legacy Interfaces: A Note on PCI Device Discovery
> 
> "Transitional devices MUST have the PCI Subsystem
> Device ID matching the Virtio Device ID, as indicated in section 5 ...
> This is to match legacy drivers."
> 
> However, there is no need to enforce Subsystem Vendor ID.
> 
> This is what we followed here.
> 
> Makes sense?

So do I understand correctly that virtio dictates the subsystem device
ID for all subsystem vendor IDs that implement a legacy virtio
interface?  Ok, but this device didn't actually implement a legacy
virtio interface.  The device itself is not transitional, we're imposing
an emulated transitional interface onto it.  So did the subsystem vendor
agree to have their subsystem device ID managed by the virtio committee
or might we create conflicts?  I imagine we know we don't have a
conflict if we also virtualize the subsystem vendor ID.


BTW, it would be a lot easier for all of the config space emulation here
if we could make use of the existing field virtualization in
vfio-pci-core.  In fact you'll see in vfio_config_init() that
PCI_DEVICE_ID is already virtualized for VFs, so it would be enough to
simply do the following to report the desired device ID:

	*(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(0x1000);

It appears everything in this function could be handled similarly by
vfio-pci-core if the right fields in the perm_bits.virt and .write
bits could be manipulated and vconfig modified appropriately.  I'd look
for a way that a variant driver could provide an alternate set of
permissions structures for various capabilities.  Thanks,

Alex


> >> +		 */
> >> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
> >> +		if (copy_to_user(buf + copy_offset, &val16, copy_count))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	return count;
> >> +}
> >> +
> >> +static ssize_t
> >> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
> >> +		       size_t count, loff_t *ppos)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> >> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> >> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >> +	int ret;
> >> +
> >> +	if (!count)
> >> +		return 0;
> >> +
> >> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
> >> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
> >> +
> >> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> >> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
> >> +
> >> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> >> +	if (ret) {
> >> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
> >> +				     ret);
> >> +		return -EIO;
> >> +	}
> >> +
> >> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
> >> +	pm_runtime_put(&pdev->dev);
> >> +	return ret;
> >> +}
> >> +
> >> +static ssize_t
> >> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> >> +			size_t count, loff_t *ppos)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
> >> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> >> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >> +	int ret;
> >> +
> >> +	if (!count)
> >> +		return 0;
> >> +
> >> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> >> +		size_t register_offset;
> >> +		loff_t copy_offset;
> >> +		size_t copy_count;
> >> +
> >> +		if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd),
> >> +					  &copy_offset, &copy_count,
> >> +					  &register_offset)) {
> >> +			if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
> >> +					   buf + copy_offset,
> >> +					   copy_count))
> >> +				return -EFAULT;
> >> +		}
> >> +
> >> +		if (range_intersect_range(pos, count, pdev->msix_cap + PCI_MSIX_FLAGS,
> >> +					  sizeof(virtvdev->msix_ctrl),
> >> +					  &copy_offset, &copy_count,
> >> +					  &register_offset)) {
> >> +			if (copy_from_user((void *)&virtvdev->msix_ctrl + register_offset,
> >> +					   buf + copy_offset,
> >> +					   copy_count))
> >> +				return -EFAULT;
> >> +		}  
> > MSI-X is setup via ioctl, so you're relying on a userspace that writes
> > through the control register bit even though it doesn't do anything.
> > Why not use vfio_pci_core_device.irq_type to track if MSI-X mode is
> > enabled?  
> OK, I may switch to your suggestion after testing it.
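
For reference, a minimal sketch of that alternative, relying on the
interrupt mode the core already tracks (the helper name is
illustrative):

	static bool virtiovf_msix_enabled(struct virtiovf_pci_core_device *virtvdev)
	{
		/*
		 * irq_type is maintained by vfio-pci-core when userspace
		 * configures interrupts via VFIO_DEVICE_SET_IRQS, so no
		 * shadow copy of the MSI-X control register is needed.
		 */
		return virtvdev->core_device.irq_type == VFIO_PCI_MSIX_IRQ_INDEX;
	}
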
> >  
> >> +	}
> >> +
> >> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
> >> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
> >> +
> >> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> >> +	if (ret) {
> >> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
> >> +		return -EIO;
> >> +	}
> >> +
> >> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
> >> +	pm_runtime_put(&pdev->dev);
> >> +	return ret;
> >> +}
> >> +
> >> +static int
> >> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
> >> +				   unsigned int cmd, unsigned long arg)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
> >> +	void __user *uarg = (void __user *)arg;
> >> +	struct vfio_region_info info = {};
> >> +
> >> +	if (copy_from_user(&info, uarg, minsz))
> >> +		return -EFAULT;
> >> +
> >> +	if (info.argsz < minsz)
> >> +		return -EINVAL;
> >> +
> >> +	switch (info.index) {
> >> +	case VFIO_PCI_BAR0_REGION_INDEX:
> >> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
> >> +		info.size = virtvdev->bar0_virtual_buf_size;
> >> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
> >> +			     VFIO_REGION_INFO_FLAG_WRITE;
> >> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
> >> +	default:
> >> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> >> +	}
> >> +}
> >> +
> >> +static long
> >> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> >> +			     unsigned long arg)
> >> +{
> >> +	switch (cmd) {
> >> +	case VFIO_DEVICE_GET_REGION_INFO:
> >> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
> >> +	default:
> >> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> >> +	}
> >> +}
> >> +
> >> +static int
> >> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
> >> +{
> >> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
> >> +	int ret;
> >> +
> >> +	/*
> >> +	 * Setup the BAR where the 'notify' exists to be used by vfio as well
> >> +	 * This will let us mmap it only once and use it when needed.
> >> +	 */
> >> +	ret = vfio_pci_core_setup_barmap(core_device,
> >> +					 virtvdev->notify_bar);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
> >> +			virtvdev->notify_offset;
> >> +	return 0;
> >> +}
> >> +
> >> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
> >> +	int ret;
> >> +
> >> +	ret = vfio_pci_core_enable(vdev);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	if (virtvdev->bar0_virtual_buf) {
> >> +		/*
> >> +		 * Upon close_device() the vfio_pci_core_disable() is called
> >> +		 * and will close all the previous mmaps, so it seems that the
> >> +		 * valid life cycle for the 'notify' addr is per open/close.
> >> +		 */
> >> +		ret = virtiovf_set_notify_addr(virtvdev);
> >> +		if (ret) {
> >> +			vfio_pci_core_disable(vdev);
> >> +			return ret;
> >> +		}
> >> +	}
> >> +
> >> +	vfio_pci_core_finish_enable(vdev);
> >> +	return 0;
> >> +}
> >> +
> >> +static int virtiovf_get_device_config_size(unsigned short device)
> >> +{
> >> +	/* Network card */
> >> +	return offsetofend(struct virtio_net_config, status);
> >> +}
> >> +
> >> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
> >> +{
> >> +	u64 offset;
> >> +	int ret;
> >> +	u8 bar;
> >> +
> >> +	ret = virtio_pci_admin_legacy_io_notify_info(virtvdev->core_device.pdev,
> >> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
> >> +				&bar, &offset);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	virtvdev->notify_bar = bar;
> >> +	virtvdev->notify_offset = offset;
> >> +	return 0;
> >> +}
> >> +
> >> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +	struct pci_dev *pdev;
> >> +	int ret;
> >> +
> >> +	ret = vfio_pci_core_init_dev(core_vdev);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	pdev = virtvdev->core_device.pdev;
> >> +	ret = virtiovf_read_notify_info(virtvdev);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	/* Being ready with a buffer that supports MSIX */
> >> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
> >> +				virtiovf_get_device_config_size(pdev->device);
> >> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
> >> +					     GFP_KERNEL);
> >> +	if (!virtvdev->bar0_virtual_buf)
> >> +		return -ENOMEM;
> >> +	mutex_init(&virtvdev->bar_mutex);
> >> +	return 0;
> >> +}
> >> +
> >> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = container_of(
> >> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> +
> >> +	kfree(virtvdev->bar0_virtual_buf);
> >> +	vfio_pci_core_release_dev(core_vdev);
> >> +}
> >> +
> >> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
> >> +	.name = "virtio-transitional-vfio-pci",
> >> +	.init = virtiovf_pci_init_device,
> >> +	.release = virtiovf_pci_core_release_dev,
> >> +	.open_device = virtiovf_pci_open_device,
> >> +	.close_device = vfio_pci_core_close_device,
> >> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
> >> +	.read = virtiovf_pci_core_read,
> >> +	.write = virtiovf_pci_core_write,
> >> +	.mmap = vfio_pci_core_mmap,
> >> +	.request = vfio_pci_core_request,
> >> +	.match = vfio_pci_core_match,
> >> +	.bind_iommufd = vfio_iommufd_physical_bind,
> >> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> >> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> >> +};
> >> +
> >> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
> >> +	.name = "virtio-acc-vfio-pci",
> >> +	.init = vfio_pci_core_init_dev,
> >> +	.release = vfio_pci_core_release_dev,
> >> +	.open_device = virtiovf_pci_open_device,
> >> +	.close_device = vfio_pci_core_close_device,
> >> +	.ioctl = vfio_pci_core_ioctl,
> >> +	.device_feature = vfio_pci_core_ioctl_feature,
> >> +	.read = vfio_pci_core_read,
> >> +	.write = vfio_pci_core_write,
> >> +	.mmap = vfio_pci_core_mmap,
> >> +	.request = vfio_pci_core_request,
> >> +	.match = vfio_pci_core_match,
> >> +	.bind_iommufd = vfio_iommufd_physical_bind,
> >> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
> >> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
> >> +};
> >> +
> >> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
> >> +{
> >> +	struct resource *res = pdev->resource;
> >> +
> >> +	return res->flags ? true : false;
> >> +}
> >> +
> >> +#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
> >> +	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
> >> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
> >> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
> >> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
> >> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
> >> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
> >> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
> >> +
> >> +static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
> >> +{
> >> +	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
> >> +	u8 *buf;
> >> +	int ret;
> >> +
> >> +	buf = kzalloc(buf_size, GFP_KERNEL);
> >> +	if (!buf)
> >> +		return false;
> >> +
> >> +	ret = virtio_pci_admin_list_query(pdev, buf, buf_size);
> >> +	if (ret)
> >> +		goto end;
> >> +
> >> +	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
> >> +		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
> >> +		ret = -EOPNOTSUPP;
> >> +		goto end;
> >> +	}
> >> +
> >> +	/* Confirm the used commands */
> >> +	memset(buf, 0, buf_size);
> >> +	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
> >> +	ret = virtio_pci_admin_list_use(pdev, buf, buf_size);
> >> +end:
> >> +	kfree(buf);
> >> +	return ret ? false : true;
> >> +}
> >> +
> >> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> >> +			      const struct pci_device_id *id)
> >> +{
> >> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> >> +	struct virtiovf_pci_core_device *virtvdev;
> >> +	int ret;
> >> +
> >> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> >> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)  
> >
> > All but the last test here are fairly evident requirements of the
> > driver.  Why do we require a device that supports MSI-X?  
> 
> Since we now check at run time whether MSI-X is enabled or disabled in
> order to pick the correct opcode, there is no need for that check any
> more.
> 
> Will drop this MSI-X check from V2.
> 
> Thanks,
> Yishai
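
In other words, the probe-time gate would reduce to something like:

	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
	    !virtiovf_bar0_exists(pdev))
		ops = &virtiovf_acc_vfio_pci_tran_ops;
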
> 
> >
> > Thanks,
> > Alex
> >
> >  
> >> +		ops = &virtiovf_acc_vfio_pci_tran_ops;
> >> +
> >> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> >> +				     &pdev->dev, ops);
> >> +	if (IS_ERR(virtvdev))
> >> +		return PTR_ERR(virtvdev);
> >> +
> >> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> >> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
> >> +	if (ret)
> >> +		goto out;
> >> +	return 0;
> >> +out:
> >> +	vfio_put_device(&virtvdev->core_device.vdev);
> >> +	return ret;
> >> +}
> >> +
> >> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> >> +
> >> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> >> +	vfio_put_device(&virtvdev->core_device.vdev);
> >> +}
> >> +
> >> +static const struct pci_device_id virtiovf_pci_table[] = {
> >> +	/* Only virtio-net is supported/tested so far */
> >> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
> >> +	{}
> >> +};
> >> +
> >> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> >> +
> >> +static struct pci_driver virtiovf_pci_driver = {
> >> +	.name = KBUILD_MODNAME,
> >> +	.id_table = virtiovf_pci_table,
> >> +	.probe = virtiovf_pci_probe,
> >> +	.remove = virtiovf_pci_remove,
> >> +	.err_handler = &vfio_pci_core_err_handlers,
> >> +	.driver_managed_dma = true,
> >> +};
> >> +
> >> +module_pci_driver(virtiovf_pci_driver);
> >> +
> >> +MODULE_LICENSE("GPL");
> >> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
> >> +MODULE_DESCRIPTION(
> >> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");  
> 
> 



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-25 19:13         ` Alex Williamson
@ 2023-10-26 12:08           ` Yishai Hadas via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-26 12:08 UTC (permalink / raw)
  To: Alex Williamson
  Cc: mst, jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg

On 25/10/2023 22:13, Alex Williamson wrote:
> On Wed, 25 Oct 2023 17:35:51 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
>
>> On 24/10/2023 22:57, Alex Williamson wrote:
>>> On Tue, 17 Oct 2023 16:42:17 +0300
>>> Yishai Hadas <yishaih@nvidia.com> wrote:
>>>   
>>>> Introduce a vfio driver over virtio devices to support the legacy
>>>> interface functionality for VFs.
>>>>
>>>> Background, from the virtio spec [1].
>>>> --------------------------------------------------------------------
>>>> In some systems, there is a need to support a virtio legacy driver with
>>>> a device that does not directly support the legacy interface. In such
>>>> scenarios, a group owner device can provide the legacy interface
>>>> functionality for the group member devices. The driver of the owner
>>>> device can then access the legacy interface of a member device on behalf
>>>> of the legacy member device driver.
>>>>
>>>> For example, with the SR-IOV group type, group members (VFs) can not
>>>> present the legacy interface in an I/O BAR in BAR0 as expected by the
>>>> legacy pci driver. If the legacy driver is running inside a virtual
>>>> machine, the hypervisor executing the virtual machine can present a
>>>> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
>>>> legacy driver accesses to this I/O BAR and forwards them to the group
>>>> owner device (PF) using group administration commands.
>>>> --------------------------------------------------------------------
>>>>
>>>> Specifically, this driver adds support for a virtio-net VF to be exposed
>>>> as a transitional device to a guest driver and allows the legacy IO BAR
>>>> functionality on top.
>>>>
>>>> This allows a VM which uses a legacy virtio-net driver in the guest to
>>>> work transparently over a VF whose driver in the host is this new
>>>> driver.
>>>>
>>>> The driver can be extended easily to support some other types of virtio
>>>> devices (e.g virtio-blk), by adding in a few places the specific type
>>>> properties as was done for virtio-net.
>>>>
>>>> For now, only the virtio-net use case was tested and as such we introduce
>>>> the support only for such a device.
>>>>
>>>> Practically,
>>>> Upon probing a VF for a virtio-net device, in case its PF supports
>>>> legacy access over the virtio admin commands and the VF doesn't have BAR
>>>> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
>>>> transitional device with I/O BAR in BAR 0.
>>>>
>>>> The existence of the simulated I/O bar is reported later on by
>>>> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
>>>> exposes itself as a transitional device by overwriting some properties
>>>> upon reading its config space.
>>>>
>>>> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
>>>> guest may use it via read/write calls according to the virtio
>>>> specification.
>>>>
>>>> Any read/write towards the control parts of the BAR will be captured by
>>>> the new driver and will be translated into admin commands towards the
>>>> device.
>>>>
>>>> Any data path read/write access (i.e. virtio driver notifications) will
>>>> be forwarded to the physical BAR whose properties were supplied by
>>>> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
>>>> probing/init flow.
>>>>
>>>> With that code in place a legacy driver in the guest has the look and
>>>> feel as if having a transitional device with legacy support for both its
>>>> control and data path flows.
>>>>
>>>> [1]
>>>> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
>>>>
>>>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>>>> ---
>>>>    MAINTAINERS                      |   7 +
>>>>    drivers/vfio/pci/Kconfig         |   2 +
>>>>    drivers/vfio/pci/Makefile        |   2 +
>>>>    drivers/vfio/pci/virtio/Kconfig  |  15 +
>>>>    drivers/vfio/pci/virtio/Makefile |   4 +
>>>>    drivers/vfio/pci/virtio/main.c   | 577 +++++++++++++++++++++++++++++++
>>>>    6 files changed, 607 insertions(+)
>>>>    create mode 100644 drivers/vfio/pci/virtio/Kconfig
>>>>    create mode 100644 drivers/vfio/pci/virtio/Makefile
>>>>    create mode 100644 drivers/vfio/pci/virtio/main.c
>>>>
>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> index 7a7bd8bd80e9..680a70063775 100644
>>>> --- a/MAINTAINERS
>>>> +++ b/MAINTAINERS
>>>> @@ -22620,6 +22620,13 @@ L:	kvm@vger.kernel.org
>>>>    S:	Maintained
>>>>    F:	drivers/vfio/pci/mlx5/
>>>>    
>>>> +VFIO VIRTIO PCI DRIVER
>>>> +M:	Yishai Hadas <yishaih@nvidia.com>
>>>> +L:	kvm@vger.kernel.org
>>>> +L:	virtualization@lists.linux-foundation.org
>>>> +S:	Maintained
>>>> +F:	drivers/vfio/pci/virtio
>>>> +
>>>>    VFIO PCI DEVICE SPECIFIC DRIVERS
>>>>    R:	Jason Gunthorpe <jgg@nvidia.com>
>>>>    R:	Yishai Hadas <yishaih@nvidia.com>
>>>> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
>>>> index 8125e5f37832..18c397df566d 100644
>>>> --- a/drivers/vfio/pci/Kconfig
>>>> +++ b/drivers/vfio/pci/Kconfig
>>>> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>>>>    
>>>>    source "drivers/vfio/pci/pds/Kconfig"
>>>>    
>>>> +source "drivers/vfio/pci/virtio/Kconfig"
>>>> +
>>>>    endmenu
>>>> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
>>>> index 45167be462d8..046139a4eca5 100644
>>>> --- a/drivers/vfio/pci/Makefile
>>>> +++ b/drivers/vfio/pci/Makefile
>>>> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>>>>    obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>>>>    
>>>>    obj-$(CONFIG_PDS_VFIO_PCI) += pds/
>>>> +
>>>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
>>>> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
>>>> new file mode 100644
>>>> index 000000000000..89eddce8b1bd
>>>> --- /dev/null
>>>> +++ b/drivers/vfio/pci/virtio/Kconfig
>>>> @@ -0,0 +1,15 @@
>>>> +# SPDX-License-Identifier: GPL-2.0-only
>>>> +config VIRTIO_VFIO_PCI
>>>> +        tristate "VFIO support for VIRTIO PCI devices"
>>>> +        depends on VIRTIO_PCI
>>>> +        select VFIO_PCI_CORE
>>>> +        help
>>>> +          This provides support for exposing VIRTIO VF devices using the VFIO
>>>> +          framework that can work with a legacy virtio driver in the guest.
>>>> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
>>>> +          not indicate I/O Space.
>>>> +          Because of that, this driver emulates an I/O BAR in software to let a VF be
>>>> +          seen as a transitional device in the guest and let it work with
>>>> +          a legacy driver.
>>> This description is a little bit subtle to the hard requirements on the
>>> device.  Reading this, one might think that this should work for any
>>> SR-IOV VF virtio device, when in reality it only support virtio-net
>>> currently and places a number of additional requirements on the device
>>> (ex. legacy access and MSI-X support).
>> Sure, will change it to refer only to virtio-net devices which are capable
>> of 'legacy access'.
>>
>> No need to refer to MSI-X, please see below.
>>
>>>   
>>>> +
>>>> +          If you don't know what to do here, say N.
>>>> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
>>>> new file mode 100644
>>>> index 000000000000..2039b39fb723
>>>> --- /dev/null
>>>> +++ b/drivers/vfio/pci/virtio/Makefile
>>>> @@ -0,0 +1,4 @@
>>>> +# SPDX-License-Identifier: GPL-2.0-only
>>>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
>>>> +virtio-vfio-pci-y := main.o
>>>> +
>>>> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
>>>> new file mode 100644
>>>> index 000000000000..3fef4b21f7e6
>>>> --- /dev/null
>>>> +++ b/drivers/vfio/pci/virtio/main.c
>>>> @@ -0,0 +1,577 @@
>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>> +/*
>>>> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
>>>> + */
>>>> +
>>>> +#include <linux/device.h>
>>>> +#include <linux/module.h>
>>>> +#include <linux/mutex.h>
>>>> +#include <linux/pci.h>
>>>> +#include <linux/pm_runtime.h>
>>>> +#include <linux/types.h>
>>>> +#include <linux/uaccess.h>
>>>> +#include <linux/vfio.h>
>>>> +#include <linux/vfio_pci_core.h>
>>>> +#include <linux/virtio_pci.h>
>>>> +#include <linux/virtio_net.h>
>>>> +#include <linux/virtio_pci_admin.h>
>>>> +
>>>> +struct virtiovf_pci_core_device {
>>>> +	struct vfio_pci_core_device core_device;
>>>> +	u8 bar0_virtual_buf_size;
>>>> +	u8 *bar0_virtual_buf;
>>>> +	/* synchronize access to the virtual buf */
>>>> +	struct mutex bar_mutex;
>>>> +	void __iomem *notify_addr;
>>>> +	u32 notify_offset;
>>>> +	u8 notify_bar;
>>> Push the above u8 to the end of the structure for better packing.
>> OK
>>>> +	u16 pci_cmd;
>>>> +	u16 msix_ctrl;
>>>> +};
>>>> +
>>>> +static int
>>>> +virtiovf_issue_legacy_rw_cmd(struct virtiovf_pci_core_device *virtvdev,
>>>> +			     loff_t pos, char __user *buf,
>>>> +			     size_t count, bool read)
>>>> +{
>>>> +	bool msix_enabled = virtvdev->msix_ctrl & PCI_MSIX_FLAGS_ENABLE;
>>>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>>>> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
>>>> +	u16 opcode;
>>>> +	int ret;
>>>> +
>>>> +	mutex_lock(&virtvdev->bar_mutex);
>>>> +	if (read) {
>>>> +		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
>>>> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
>>>> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
>>>> +		ret = virtio_pci_admin_legacy_io_read(pdev, opcode, pos, count,
>>>> +						      bar0_buf + pos);
>>>> +		if (ret)
>>>> +			goto out;
>>>> +		if (copy_to_user(buf, bar0_buf + pos, count))
>>>> +			ret = -EFAULT;
>>>> +		goto out;
>>>> +	}
>>> TBH, I think the symmetry of read vs write would be more apparent if
>>> this were an else branch.
>> OK, will do.
>>>> +
>>>> +	if (copy_from_user(bar0_buf + pos, buf, count)) {
>>>> +		ret = -EFAULT;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
>>>> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
>>>> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
>>>> +	ret = virtio_pci_admin_legacy_io_write(pdev, opcode, pos, count,
>>>> +					       bar0_buf + pos);
>>>> +out:
>>>> +	mutex_unlock(&virtvdev->bar_mutex);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int
>>>> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
>>>> +			    loff_t pos, char __user *buf,
>>>> +			    size_t count, bool read)
>>>> +{
>>>> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
>>>> +	u16 queue_notify;
>>>> +	int ret;
>>>> +
>>>> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
>>>> +		return -EINVAL;
>>>> +
>>>> +	switch (pos) {
>>>> +	case VIRTIO_PCI_QUEUE_NOTIFY:
>>>> +		if (count != sizeof(queue_notify))
>>>> +			return -EINVAL;
>>>> +		if (read) {
>>>> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
>>>> +						virtvdev->notify_addr);
>>>> +			if (ret)
>>>> +				return ret;
>>>> +			if (copy_to_user(buf, &queue_notify,
>>>> +					 sizeof(queue_notify)))
>>>> +				return -EFAULT;
>>>> +			break;
>>>> +		}
>>> Same.
>> OK
>>>> +
>>>> +		if (copy_from_user(&queue_notify, buf, count))
>>>> +			return -EFAULT;
>>>> +
>>>> +		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
>>>> +					 virtvdev->notify_addr);
>>>> +		break;
>>>> +	default:
>>>> +		ret = virtiovf_issue_legacy_rw_cmd(virtvdev, pos, buf, count,
>>>> +						   read);
>>>> +	}
>>>> +
>>>> +	return ret ? ret : count;
>>>> +}
>>>> +
>>>> +static bool range_intersect_range(loff_t range1_start, size_t count1,
>>>> +				  loff_t range2_start, size_t count2,
>>>> +				  loff_t *start_offset,
>>>> +				  size_t *intersect_count,
>>>> +				  size_t *register_offset)
>>>> +{
>>>> +	if (range1_start <= range2_start &&
>>>> +	    range1_start + count1 > range2_start) {
>>>> +		*start_offset = range2_start - range1_start;
>>>> +		*intersect_count = min_t(size_t, count2,
>>>> +					 range1_start + count1 - range2_start);
>>>> +		if (register_offset)
>>>> +			*register_offset = 0;
>>>> +		return true;
>>>> +	}
>>>> +
>>>> +	if (range1_start > range2_start &&
>>>> +	    range1_start < range2_start + count2) {
>>>> +		*start_offset = range1_start;
>>>> +		*intersect_count = min_t(size_t, count1,
>>>> +					 range2_start + count2 - range1_start);
>>>> +		if (register_offset)
>>>> +			*register_offset = range1_start - range2_start;
>>>> +		return true;
>>>> +	}
>>> Seems like we're missing a case, and some documentation.
>>>
>>> The first test requires range1 to fully enclose range2 and provides the
>>> offset of range2 within range1 and the length of the intersection.
>>>
>>> The second test requires range1 to start from a non-zero offset within
>>> range2 and returns the absolute offset of range1 and the length of the
>>> intersection.
>>>
>>> The register offset is then non-zero offset of range1 into range2.  So
>>> does the caller use the zero value in the previous test to know range2
>>> exists within range1?
>>>
>>> We miss the cases where range1_start is <= range2_start and range1
>>> terminates within range2.
>> The first test should cover this case as well of the case of fully
>> enclosing.
>>
>> It checks whether range1_start + count1 > range2_start which can
>> terminates also within range2.
>>
>> Isn't it ?
> Hmm, maybe I read it wrong.  Let me try again...
>
> The first test covers the cases where range1 starts at or below range2
> and range1 extends into or through range2.  start_offset describes the
> offset into range1 that range2 begins.  The intersect_count is the
> extent of the intersection and it's not clear what register_offset
> describes since it's zero.
>
> The second test covers the cases where range1 starts within range2.
> start_offset is the start of range1, which doesn't seem consistent with
> the previous branch usage.

Right, start_offset needs to be 0 in that second branch.


>   The intersect_count does look consistent
> with the previous branch.  register_offset is then the offset of range1
> into range2
>
> So I had some things wrong, but I'm still having trouble with a
> consistent definition of start_offset and register_offset.
>
>
>> I may add some documentation for that function as part of V2 as you asked.
>>
>>>    I suppose we'll see below how this is used,
>>> but it seems asymmetric and incomplete.
>>>   
>>>> +
>>>> +	return false;
>>>> +}
>>>> +
>>>> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
>>>> +					char __user *buf, size_t count,
>>>> +					loff_t *ppos)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>>>> +	size_t register_offset;
>>>> +	loff_t copy_offset;
>>>> +	size_t copy_count;
>>>> +	__le32 val32;
>>>> +	__le16 val16;
>>>> +	u8 val8;
>>>> +	int ret;
>>>> +
>>>> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
>>>> +	if (ret < 0)
>>>> +		return ret;
>>>> +
>>>> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
>>>> +				  &copy_offset, &copy_count, NULL)) {
>>> If a user does 'setpci -s x:00.0 2.b' (range1 <= range2, but terminates
>>> within range2) they'll not enter this branch and see 41 rather than 00.
> Yes, this does take the first branch per my second look, so copy_offset
> is zero, copy_count is 1.  I think the copy_to_user() works correctly
>
>>> If a user does 'setpci -s x:00.0 3.b' (range1 > range2, range 1
>>> contained within range 2), the above function returns a copy_offset of
>>> range1_start (ie. 3).  But that offset is applied to the buffer, which
>>> is out of bounds.  The function needs to have returned an offset of 1
>>> and it should have applied to the val16 address.
>>>
>>> I don't think this works like it's intended.
>> Is that because of the missing case ?
>> Please see my note above.
> No, I think my original evaluation of this second case still holds,
> copy_offset is wrong.  I suspect what you're trying to do with
> start_offset and register_offset is specify the output buffer offset,
> ie. relative to range1 or buf, or the input offset, ie. range2 or our
> local val variable.  But start_offset is incorrectly calculated in the
> second branch above (should always be zero) and the caller didn't ask
> for the register offset here, which is seems it always should.

OK, I got what you said.

Yes, start_offset should always be 0 in the second branch, and the
caller will always need to ask for the register_offset and use it.

As the current QEMU code doesn't read a partial register/field, we didn't
hit the second branch or the above issues.

I will fix this as part of V2; it should be a simple change.
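
For clarity, the fixed helper would look roughly like the below:
start_offset is always the offset into range1 (i.e. into the user
buffer), register_offset is the offset into range2 (the emulated
register), and callers always request register_offset:

	static bool range_intersect_range(loff_t range1_start, size_t count1,
					  loff_t range2_start, size_t count2,
					  loff_t *start_offset,
					  size_t *intersect_count,
					  size_t *register_offset)
	{
		if (range1_start <= range2_start &&
		    range1_start + count1 > range2_start) {
			*start_offset = range2_start - range1_start;
			*intersect_count = min_t(size_t, count2,
						 range1_start + count1 - range2_start);
			*register_offset = 0;
			return true;
		}

		if (range1_start > range2_start &&
		    range1_start < range2_start + count2) {
			*start_offset = 0;
			*intersect_count = min_t(size_t, count1,
						 range2_start + count2 - range1_start);
			*register_offset = range1_start - range2_start;
			return true;
		}

		return false;
	}

With that, the 'setpci -s x:00.0 3.b' example above gives start_offset 0,
register_offset 1 and intersect_count 1, so the caller copies one byte
from (void *)&val16 + register_offset to buf + start_offset.
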

>
>>>> +		val16 = cpu_to_le16(0x1000);
>>> Please #define this somewhere rather than hiding a magic value here.
>> Sure, I will just replace it with VIRTIO_TRANS_ID_NET.
>>>> +		if (copy_to_user(buf + copy_offset, &val16, copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	if ((virtvdev->pci_cmd & PCI_COMMAND_IO) &&
>>>> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
>>>> +				  &copy_offset, &copy_count, &register_offset)) {
>>>> +		if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
>>>> +				   copy_count))
>>>> +			return -EFAULT;
>>>> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
>>>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
>>>> +				 copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8),
>>>> +				  &copy_offset, &copy_count, NULL)) {
>>>> +		/* Transitional needs to have revision 0 */
>>>> +		val8 = 0;
>>>> +		if (copy_to_user(buf + copy_offset, &val8, copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
>>>> +				  &copy_offset, &copy_count, NULL)) {
>>>> +		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);
>>> I'd still like to see the remainder of the BAR follow the semantics
>>> vfio-pci does.  I think this requires a __le32 bar0 field on the
>>> virtvdev struct to store writes and the read here would mask the lower
>>> bits up to the BAR size and OR in the IO indicator bit.
>> OK, will do.
>>
>>>   
>>>> +		if (copy_to_user(buf + copy_offset, &val32, copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
>>>> +				  &copy_offset, &copy_count, NULL)) {
>>>> +		/*
>>>> +		 * Transitional devices use the PCI subsystem device id as
>>>> +		 * virtio device id, same as legacy driver always did.
>>> Where did we require the subsystem vendor ID to be 0x1af4?  This
>>> subsystem device ID really only makes sense given that subsystem
>>> vendor ID, right?  Otherwise I don't see that non-transitional devices,
>>> such as the VF, have a hard requirement per the spec for the subsystem
>>> vendor ID.
>>>
>>> Do we want to make this only probe the correct subsystem vendor ID or do
>>> we want to emulate the subsystem vendor ID as well?  I don't see this is
>>> correct without one of those options.
>> Looking in the 1.x spec we can see the below.
>>
>> Legacy Interfaces: A Note on PCI Device Discovery
>>
>> "Transitional devices MUST have the PCI Subsystem
>> Device ID matching the Virtio Device ID, as indicated in section 5 ...
>> This is to match legacy drivers."
>>
>> However, there is no need to enforce Subsystem Vendor ID.
>>
>> This is what we followed here.
>>
>> Makes sense ?
> So do I understand correctly that virtio dictates the subsystem device
> ID for all subsystem vendor IDs that implement a legacy virtio
> interface?  Ok, but this device didn't actually implement a legacy
> virtio interface.  The device itself is not transitional, we're imposing
> an emulated transitional interface onto it.  So did the subsystem vendor
> agree to have their subsystem device ID managed by the virtio committee
> or might we create conflicts?  I imagine we know we don't have a
> conflict if we also virtualize the subsystem vendor ID.
>
The non-transitional net device is defined in the virtio spec by the
following tuple.
T_A: VID=0x1AF4, DID=0x1040, Subsys_VID=FOO, Subsys_DID=0x40.

And the transitional net device in the virtio spec for a vendor FOO is
defined as:
T_B: VID=0x1AF4, DID=0x1000, Subsys_VID=FOO, Subsys_DID=0x1

This driver converts T_A to T_B, both of which are defined by the
virtio spec.
Hence, there is no conflict for the subsystem vendor; it is fine.
> BTW, it would be a lot easier for all of the config space emulation here
> if we could make use of the existing field virtualization in
> vfio-pci-core.  In fact you'll see in vfio_config_init() that
> PCI_DEVICE_ID is already virtualized for VFs, so it would be enough to
> simply do the following to report the desired device ID:
>
> 	*(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(0x1000);

I would prefer to keep things simple and have one place/flow that
handles all the fields, as we have now as part of the driver.

In any case, I'll look further at that option for managing the DEVICE_ID
towards V2.

> It appears everything in this function could be handled similarly by
> vfio-pci-core if the right fields in the perm_bits.virt and .write
> bits could be manipulated and vconfig modified appropriately.  I'd look
> for a way that a variant driver could provide an alternate set of
> permissions structures for various capabilities.  Thanks,

OK

However, let's not block V2 and the series acceptance on that.

It can always be done as future refactoring, as part of another series
that brings the infrastructure needed for it.

Yishai

>
> Alex
>
>
>>>> +		 */
>>>> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
>>>> +		if (copy_to_user(buf + copy_offset, &val16, copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	return count;
>>>> +}
>>>> +
>>>> +static ssize_t
>>>> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
>>>> +		       size_t count, loff_t *ppos)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>>>> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>>>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>>>> +	int ret;
>>>> +
>>>> +	if (!count)
>>>> +		return 0;
>>>> +
>>>> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
>>>> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
>>>> +
>>>> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
>>>> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
>>>> +
>>>> +	ret = pm_runtime_resume_and_get(&pdev->dev);
>>>> +	if (ret) {
>>>> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
>>>> +				     ret);
>>>> +		return -EIO;
>>>> +	}
>>>> +
>>>> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
>>>> +	pm_runtime_put(&pdev->dev);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static ssize_t
>>>> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
>>>> +			size_t count, loff_t *ppos)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>>>> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>>>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>>>> +	int ret;
>>>> +
>>>> +	if (!count)
>>>> +		return 0;
>>>> +
>>>> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
>>>> +		size_t register_offset;
>>>> +		loff_t copy_offset;
>>>> +		size_t copy_count;
>>>> +
>>>> +		if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd),
>>>> +					  &copy_offset, &copy_count,
>>>> +					  &register_offset)) {
>>>> +			if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
>>>> +					   buf + copy_offset,
>>>> +					   copy_count))
>>>> +				return -EFAULT;
>>>> +		}
>>>> +
>>>> +		if (range_intersect_range(pos, count, pdev->msix_cap + PCI_MSIX_FLAGS,
>>>> +					  sizeof(virtvdev->msix_ctrl),
>>>> +					  &copy_offset, &copy_count,
>>>> +					  &register_offset)) {
>>>> +			if (copy_from_user((void *)&virtvdev->msix_ctrl + register_offset,
>>>> +					   buf + copy_offset,
>>>> +					   copy_count))
>>>> +				return -EFAULT;
>>>> +		}
>>> MSI-X is setup via ioctl, so you're relying on a userspace that writes
>>> through the control register bit even though it doesn't do anything.
>>> Why not use vfio_pci_core_device.irq_type to track if MSI-X mode is
>>> enabled?
>> OK, I may switch to your suggestion after testing it.
>>>   
>>>> +	}
>>>> +
>>>> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
>>>> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
>>>> +
>>>> +	ret = pm_runtime_resume_and_get(&pdev->dev);
>>>> +	if (ret) {
>>>> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
>>>> +		return -EIO;
>>>> +	}
>>>> +
>>>> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
>>>> +	pm_runtime_put(&pdev->dev);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int
>>>> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
>>>> +				   unsigned int cmd, unsigned long arg)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
>>>> +	void __user *uarg = (void __user *)arg;
>>>> +	struct vfio_region_info info = {};
>>>> +
>>>> +	if (copy_from_user(&info, uarg, minsz))
>>>> +		return -EFAULT;
>>>> +
>>>> +	if (info.argsz < minsz)
>>>> +		return -EINVAL;
>>>> +
>>>> +	switch (info.index) {
>>>> +	case VFIO_PCI_BAR0_REGION_INDEX:
>>>> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
>>>> +		info.size = virtvdev->bar0_virtual_buf_size;
>>>> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
>>>> +			     VFIO_REGION_INFO_FLAG_WRITE;
>>>> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
>>>> +	default:
>>>> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
>>>> +	}
>>>> +}
>>>> +
>>>> +static long
>>>> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
>>>> +			     unsigned long arg)
>>>> +{
>>>> +	switch (cmd) {
>>>> +	case VFIO_DEVICE_GET_REGION_INFO:
>>>> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
>>>> +	default:
>>>> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
>>>> +	}
>>>> +}
>>>> +
>>>> +static int
>>>> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
>>>> +{
>>>> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
>>>> +	int ret;
>>>> +
>>>> +	/*
>>>> +	 * Setup the BAR where the 'notify' exists to be used by vfio as well
>>>> +	 * This will let us mmap it only once and use it when needed.
>>>> +	 */
>>>> +	ret = vfio_pci_core_setup_barmap(core_device,
>>>> +					 virtvdev->notify_bar);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
>>>> +			virtvdev->notify_offset;
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
>>>> +	int ret;
>>>> +
>>>> +	ret = vfio_pci_core_enable(vdev);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	if (virtvdev->bar0_virtual_buf) {
>>>> +		/*
>>>> +		 * Upon close_device() the vfio_pci_core_disable() is called
>>>> +		 * and will close all the previous mmaps, so it seems that the
>>>> +		 * valid life cycle for the 'notify' addr is per open/close.
>>>> +		 */
>>>> +		ret = virtiovf_set_notify_addr(virtvdev);
>>>> +		if (ret) {
>>>> +			vfio_pci_core_disable(vdev);
>>>> +			return ret;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	vfio_pci_core_finish_enable(vdev);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int virtiovf_get_device_config_size(unsigned short device)
>>>> +{
>>>> +	/* Network card */
>>>> +	return offsetofend(struct virtio_net_config, status);
>>>> +}
>>>> +
>>>> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
>>>> +{
>>>> +	u64 offset;
>>>> +	int ret;
>>>> +	u8 bar;
>>>> +
>>>> +	ret = virtio_pci_admin_legacy_io_notify_info(virtvdev->core_device.pdev,
>>>> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
>>>> +				&bar, &offset);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	virtvdev->notify_bar = bar;
>>>> +	virtvdev->notify_offset = offset;
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	struct pci_dev *pdev;
>>>> +	int ret;
>>>> +
>>>> +	ret = vfio_pci_core_init_dev(core_vdev);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	pdev = virtvdev->core_device.pdev;
>>>> +	ret = virtiovf_read_notify_info(virtvdev);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	/* Being ready with a buffer that supports MSIX */
>>>> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
>>>> +				virtiovf_get_device_config_size(pdev->device);
>>>> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
>>>> +					     GFP_KERNEL);
>>>> +	if (!virtvdev->bar0_virtual_buf)
>>>> +		return -ENOMEM;
>>>> +	mutex_init(&virtvdev->bar_mutex);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +
>>>> +	kfree(virtvdev->bar0_virtual_buf);
>>>> +	vfio_pci_core_release_dev(core_vdev);
>>>> +}
>>>> +
>>>> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
>>>> +	.name = "virtio-transitional-vfio-pci",
>>>> +	.init = virtiovf_pci_init_device,
>>>> +	.release = virtiovf_pci_core_release_dev,
>>>> +	.open_device = virtiovf_pci_open_device,
>>>> +	.close_device = vfio_pci_core_close_device,
>>>> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
>>>> +	.read = virtiovf_pci_core_read,
>>>> +	.write = virtiovf_pci_core_write,
>>>> +	.mmap = vfio_pci_core_mmap,
>>>> +	.request = vfio_pci_core_request,
>>>> +	.match = vfio_pci_core_match,
>>>> +	.bind_iommufd = vfio_iommufd_physical_bind,
>>>> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
>>>> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
>>>> +};
>>>> +
>>>> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
>>>> +	.name = "virtio-acc-vfio-pci",
>>>> +	.init = vfio_pci_core_init_dev,
>>>> +	.release = vfio_pci_core_release_dev,
>>>> +	.open_device = virtiovf_pci_open_device,
>>>> +	.close_device = vfio_pci_core_close_device,
>>>> +	.ioctl = vfio_pci_core_ioctl,
>>>> +	.device_feature = vfio_pci_core_ioctl_feature,
>>>> +	.read = vfio_pci_core_read,
>>>> +	.write = vfio_pci_core_write,
>>>> +	.mmap = vfio_pci_core_mmap,
>>>> +	.request = vfio_pci_core_request,
>>>> +	.match = vfio_pci_core_match,
>>>> +	.bind_iommufd = vfio_iommufd_physical_bind,
>>>> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
>>>> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
>>>> +};
>>>> +
>>>> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
>>>> +{
>>>> +	struct resource *res = pdev->resource;
>>>> +
>>>> +	return res->flags ? true : false;
>>>> +}
>>>> +
>>>> +#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
>>>> +	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
>>>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
>>>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
>>>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
>>>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
>>>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
>>>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
>>>> +
>>>> +static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
>>>> +{
>>>> +	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
>>>> +	u8 *buf;
>>>> +	int ret;
>>>> +
>>>> +	buf = kzalloc(buf_size, GFP_KERNEL);
>>>> +	if (!buf)
>>>> +		return false;
>>>> +
>>>> +	ret = virtio_pci_admin_list_query(pdev, buf, buf_size);
>>>> +	if (ret)
>>>> +		goto end;
>>>> +
>>>> +	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
>>>> +		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
>>>> +		ret = -EOPNOTSUPP;
>>>> +		goto end;
>>>> +	}
>>>> +
>>>> +	/* Confirm the used commands */
>>>> +	memset(buf, 0, buf_size);
>>>> +	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
>>>> +	ret = virtio_pci_admin_list_use(pdev, buf, buf_size);
>>>> +end:
>>>> +	kfree(buf);
>>>> +	return ret ? false : true;
>>>> +}
>>>> +
>>>> +static int virtiovf_pci_probe(struct pci_dev *pdev,
>>>> +			      const struct pci_device_id *id)
>>>> +{
>>>> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
>>>> +	struct virtiovf_pci_core_device *virtvdev;
>>>> +	int ret;
>>>> +
>>>> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
>>>> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
>>> All but the last test here are fairly evident requirements of the
>>> driver.  Why do we require a device that supports MSI-X?
>> Since we now check at run time whether MSI-X is enabled or disabled in
>> order to pick the correct opcode, there is no need for that check any
>> more.
>>
>> Will drop this MSI-X check from V2.
>>
>> Thanks,
>> Yishai
>>
>>> Thanks,
>>> Alex
>>>
>>>   
>>>> +		ops = &virtiovf_acc_vfio_pci_tran_ops;
>>>> +
>>>> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
>>>> +				     &pdev->dev, ops);
>>>> +	if (IS_ERR(virtvdev))
>>>> +		return PTR_ERR(virtvdev);
>>>> +
>>>> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
>>>> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
>>>> +	if (ret)
>>>> +		goto out;
>>>> +	return 0;
>>>> +out:
>>>> +	vfio_put_device(&virtvdev->core_device.vdev);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static void virtiovf_pci_remove(struct pci_dev *pdev)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
>>>> +
>>>> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
>>>> +	vfio_put_device(&virtvdev->core_device.vdev);
>>>> +}
>>>> +
>>>> +static const struct pci_device_id virtiovf_pci_table[] = {
>>>> +	/* Only virtio-net is supported/tested so far */
>>>> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
>>>> +	{}
>>>> +};
>>>> +
>>>> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
>>>> +
>>>> +static struct pci_driver virtiovf_pci_driver = {
>>>> +	.name = KBUILD_MODNAME,
>>>> +	.id_table = virtiovf_pci_table,
>>>> +	.probe = virtiovf_pci_probe,
>>>> +	.remove = virtiovf_pci_remove,
>>>> +	.err_handler = &vfio_pci_core_err_handlers,
>>>> +	.driver_managed_dma = true,
>>>> +};
>>>> +
>>>> +module_pci_driver(virtiovf_pci_driver);
>>>> +
>>>> +MODULE_LICENSE("GPL");
>>>> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
>>>> +MODULE_DESCRIPTION(
>>>> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");
>>


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-10-26 12:08           ` Yishai Hadas via Virtualization
  0 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas via Virtualization @ 2023-10-26 12:08 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, mst, maorg, virtualization, jgg, jiri, leonro

On 25/10/2023 22:13, Alex Williamson wrote:
> On Wed, 25 Oct 2023 17:35:51 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
>
>> On 24/10/2023 22:57, Alex Williamson wrote:
>>> On Tue, 17 Oct 2023 16:42:17 +0300
>>> Yishai Hadas <yishaih@nvidia.com> wrote:
>>>   
>>>> Introduce a vfio driver over virtio devices to support the legacy
>>>> interface functionality for VFs.
>>>>
>>>> Background, from the virtio spec [1].
>>>> --------------------------------------------------------------------
>>>> In some systems, there is a need to support a virtio legacy driver with
>>>> a device that does not directly support the legacy interface. In such
>>>> scenarios, a group owner device can provide the legacy interface
>>>> functionality for the group member devices. The driver of the owner
>>>> device can then access the legacy interface of a member device on behalf
>>>> of the legacy member device driver.
>>>>
>>>> For example, with the SR-IOV group type, group members (VFs) can not
>>>> present the legacy interface in an I/O BAR in BAR0 as expected by the
>>>> legacy pci driver. If the legacy driver is running inside a virtual
>>>> machine, the hypervisor executing the virtual machine can present a
>>>> virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
>>>> legacy driver accesses to this I/O BAR and forwards them to the group
>>>> owner device (PF) using group administration commands.
>>>> --------------------------------------------------------------------
>>>>
>>>> Specifically, this driver adds support for a virtio-net VF to be exposed
>>>> as a transitional device to a guest driver and allows the legacy IO BAR
>>>> functionality on top.
>>>>
>>>> This allows a VM which uses a legacy virtio-net driver in the guest to
>>>> work transparently over a VF whose driver in the host is this new
>>>> driver.
>>>>
>>>> The driver can be extended easily to support some other types of virtio
>>>> devices (e.g virtio-blk), by adding in a few places the specific type
>>>> properties as was done for virtio-net.
>>>>
>>>> For now, only the virtio-net use case was tested and as such we introduce
>>>> the support only for such a device.
>>>>
>>>> Practically,
>>>> Upon probing a VF for a virtio-net device, in case its PF supports
>>>> legacy access over the virtio admin commands and the VF doesn't have BAR
>>>> 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
>>>> transitional device with I/O BAR in BAR 0.
>>>>
>>>> The existence of the simulated I/O bar is reported later on by
>>>> overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
>>>> exposes itself as a transitional device by overwriting some properties
>>>> upon reading its config space.
>>>>
>>>> Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
>>>> guest may use it via read/write calls according to the virtio
>>>> specification.
>>>>
>>>> Any read/write towards the control parts of the BAR will be captured by
>>>> the new driver and will be translated into admin commands towards the
>>>> device.
>>>>
>>>> Any data path read/write access (i.e. virtio driver notifications) will
>>>> be forwarded to the physical BAR whose properties were supplied by
>>>> the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO upon the
>>>> probing/init flow.
>>>>
>>>> With that code in place a legacy driver in the guest has the look and
>>>> feel as if having a transitional device with legacy support for both its
>>>> control and data path flows.
>>>>
>>>> [1]
>>>> https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
>>>>
>>>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>>>> ---
>>>>    MAINTAINERS                      |   7 +
>>>>    drivers/vfio/pci/Kconfig         |   2 +
>>>>    drivers/vfio/pci/Makefile        |   2 +
>>>>    drivers/vfio/pci/virtio/Kconfig  |  15 +
>>>>    drivers/vfio/pci/virtio/Makefile |   4 +
>>>>    drivers/vfio/pci/virtio/main.c   | 577 +++++++++++++++++++++++++++++++
>>>>    6 files changed, 607 insertions(+)
>>>>    create mode 100644 drivers/vfio/pci/virtio/Kconfig
>>>>    create mode 100644 drivers/vfio/pci/virtio/Makefile
>>>>    create mode 100644 drivers/vfio/pci/virtio/main.c
>>>>
>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> index 7a7bd8bd80e9..680a70063775 100644
>>>> --- a/MAINTAINERS
>>>> +++ b/MAINTAINERS
>>>> @@ -22620,6 +22620,13 @@ L:	kvm@vger.kernel.org
>>>>    S:	Maintained
>>>>    F:	drivers/vfio/pci/mlx5/
>>>>    
>>>> +VFIO VIRTIO PCI DRIVER
>>>> +M:	Yishai Hadas <yishaih@nvidia.com>
>>>> +L:	kvm@vger.kernel.org
>>>> +L:	virtualization@lists.linux-foundation.org
>>>> +S:	Maintained
>>>> +F:	drivers/vfio/pci/virtio
>>>> +
>>>>    VFIO PCI DEVICE SPECIFIC DRIVERS
>>>>    R:	Jason Gunthorpe <jgg@nvidia.com>
>>>>    R:	Yishai Hadas <yishaih@nvidia.com>
>>>> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
>>>> index 8125e5f37832..18c397df566d 100644
>>>> --- a/drivers/vfio/pci/Kconfig
>>>> +++ b/drivers/vfio/pci/Kconfig
>>>> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>>>>    
>>>>    source "drivers/vfio/pci/pds/Kconfig"
>>>>    
>>>> +source "drivers/vfio/pci/virtio/Kconfig"
>>>> +
>>>>    endmenu
>>>> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
>>>> index 45167be462d8..046139a4eca5 100644
>>>> --- a/drivers/vfio/pci/Makefile
>>>> +++ b/drivers/vfio/pci/Makefile
>>>> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
>>>>    obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>>>>    
>>>>    obj-$(CONFIG_PDS_VFIO_PCI) += pds/
>>>> +
>>>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
>>>> diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
>>>> new file mode 100644
>>>> index 000000000000..89eddce8b1bd
>>>> --- /dev/null
>>>> +++ b/drivers/vfio/pci/virtio/Kconfig
>>>> @@ -0,0 +1,15 @@
>>>> +# SPDX-License-Identifier: GPL-2.0-only
>>>> +config VIRTIO_VFIO_PCI
>>>> +        tristate "VFIO support for VIRTIO PCI devices"
>>>> +        depends on VIRTIO_PCI
>>>> +        select VFIO_PCI_CORE
>>>> +        help
>>>> +          This provides support for exposing VIRTIO VF devices using the VFIO
>>>> +          framework that can work with a legacy virtio driver in the guest.
>>>> +          Based on PCIe spec, VFs do not support I/O Space; thus, VF BARs shall
>>>> +          not indicate I/O Space.
>>>> +          Because of that, this driver emulates an I/O BAR in software to let a VF be
>>>> +          seen as a transitional device in the guest and let it work with
>>>> +          a legacy driver.
>>> This description is a little bit subtle to the hard requirements on the
>>> device.  Reading this, one might think that this should work for any
>>> SR-IOV VF virtio device, when in reality it only support virtio-net
>>> currently and places a number of additional requirements on the device
>>> (ex. legacy access and MSI-X support).
>> Sure, will change it to refer only to virtio-net devices which are capable
>> of 'legacy access'.
>>
>> No need to refer to MSI-X, please see below.
>>
>>>   
>>>> +
>>>> +          If you don't know what to do here, say N.
>>>> diff --git a/drivers/vfio/pci/virtio/Makefile b/drivers/vfio/pci/virtio/Makefile
>>>> new file mode 100644
>>>> index 000000000000..2039b39fb723
>>>> --- /dev/null
>>>> +++ b/drivers/vfio/pci/virtio/Makefile
>>>> @@ -0,0 +1,4 @@
>>>> +# SPDX-License-Identifier: GPL-2.0-only
>>>> +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
>>>> +virtio-vfio-pci-y := main.o
>>>> +
>>>> diff --git a/drivers/vfio/pci/virtio/main.c b/drivers/vfio/pci/virtio/main.c
>>>> new file mode 100644
>>>> index 000000000000..3fef4b21f7e6
>>>> --- /dev/null
>>>> +++ b/drivers/vfio/pci/virtio/main.c
>>>> @@ -0,0 +1,577 @@
>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>> +/*
>>>> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
>>>> + */
>>>> +
>>>> +#include <linux/device.h>
>>>> +#include <linux/module.h>
>>>> +#include <linux/mutex.h>
>>>> +#include <linux/pci.h>
>>>> +#include <linux/pm_runtime.h>
>>>> +#include <linux/types.h>
>>>> +#include <linux/uaccess.h>
>>>> +#include <linux/vfio.h>
>>>> +#include <linux/vfio_pci_core.h>
>>>> +#include <linux/virtio_pci.h>
>>>> +#include <linux/virtio_net.h>
>>>> +#include <linux/virtio_pci_admin.h>
>>>> +
>>>> +struct virtiovf_pci_core_device {
>>>> +	struct vfio_pci_core_device core_device;
>>>> +	u8 bar0_virtual_buf_size;
>>>> +	u8 *bar0_virtual_buf;
>>>> +	/* synchronize access to the virtual buf */
>>>> +	struct mutex bar_mutex;
>>>> +	void __iomem *notify_addr;
>>>> +	u32 notify_offset;
>>>> +	u8 notify_bar;
>>> Push the above u8 to the end of the structure for better packing.
>> OK
>>>> +	u16 pci_cmd;
>>>> +	u16 msix_ctrl;
>>>> +};
>>>> +
>>>> +static int
>>>> +virtiovf_issue_legacy_rw_cmd(struct virtiovf_pci_core_device *virtvdev,
>>>> +			     loff_t pos, char __user *buf,
>>>> +			     size_t count, bool read)
>>>> +{
>>>> +	bool msix_enabled = virtvdev->msix_ctrl & PCI_MSIX_FLAGS_ENABLE;
>>>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>>>> +	u8 *bar0_buf = virtvdev->bar0_virtual_buf;
>>>> +	u16 opcode;
>>>> +	int ret;
>>>> +
>>>> +	mutex_lock(&virtvdev->bar_mutex);
>>>> +	if (read) {
>>>> +		opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
>>>> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ :
>>>> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ;
>>>> +		ret = virtio_pci_admin_legacy_io_read(pdev, opcode, pos, count,
>>>> +						      bar0_buf + pos);
>>>> +		if (ret)
>>>> +			goto out;
>>>> +		if (copy_to_user(buf, bar0_buf + pos, count))
>>>> +			ret = -EFAULT;
>>>> +		goto out;
>>>> +	}
>>> TBH, I think the symmetry of read vs write would be more apparent if
>>> this were an else branch.
>> OK, will do.
>>>> +
>>>> +	if (copy_from_user(bar0_buf + pos, buf, count)) {
>>>> +		ret = -EFAULT;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	opcode = (pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled)) ?
>>>> +			VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE :
>>>> +			VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE;
>>>> +	ret = virtio_pci_admin_legacy_io_write(pdev, opcode, pos, count,
>>>> +					       bar0_buf + pos);
>>>> +out:
>>>> +	mutex_unlock(&virtvdev->bar_mutex);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int
>>>> +translate_io_bar_to_mem_bar(struct virtiovf_pci_core_device *virtvdev,
>>>> +			    loff_t pos, char __user *buf,
>>>> +			    size_t count, bool read)
>>>> +{
>>>> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
>>>> +	u16 queue_notify;
>>>> +	int ret;
>>>> +
>>>> +	if (pos + count > virtvdev->bar0_virtual_buf_size)
>>>> +		return -EINVAL;
>>>> +
>>>> +	switch (pos) {
>>>> +	case VIRTIO_PCI_QUEUE_NOTIFY:
>>>> +		if (count != sizeof(queue_notify))
>>>> +			return -EINVAL;
>>>> +		if (read) {
>>>> +			ret = vfio_pci_ioread16(core_device, true, &queue_notify,
>>>> +						virtvdev->notify_addr);
>>>> +			if (ret)
>>>> +				return ret;
>>>> +			if (copy_to_user(buf, &queue_notify,
>>>> +					 sizeof(queue_notify)))
>>>> +				return -EFAULT;
>>>> +			break;
>>>> +		}
>>> Same.
>> OK
>>>> +
>>>> +		if (copy_from_user(&queue_notify, buf, count))
>>>> +			return -EFAULT;
>>>> +
>>>> +		ret = vfio_pci_iowrite16(core_device, true, queue_notify,
>>>> +					 virtvdev->notify_addr);
>>>> +		break;
>>>> +	default:
>>>> +		ret = virtiovf_issue_legacy_rw_cmd(virtvdev, pos, buf, count,
>>>> +						   read);
>>>> +	}
>>>> +
>>>> +	return ret ? ret : count;
>>>> +}
>>>> +
>>>> +static bool range_intersect_range(loff_t range1_start, size_t count1,
>>>> +				  loff_t range2_start, size_t count2,
>>>> +				  loff_t *start_offset,
>>>> +				  size_t *intersect_count,
>>>> +				  size_t *register_offset)
>>>> +{
>>>> +	if (range1_start <= range2_start &&
>>>> +	    range1_start + count1 > range2_start) {
>>>> +		*start_offset = range2_start - range1_start;
>>>> +		*intersect_count = min_t(size_t, count2,
>>>> +					 range1_start + count1 - range2_start);
>>>> +		if (register_offset)
>>>> +			*register_offset = 0;
>>>> +		return true;
>>>> +	}
>>>> +
>>>> +	if (range1_start > range2_start &&
>>>> +	    range1_start < range2_start + count2) {
>>>> +		*start_offset = range1_start;
>>>> +		*intersect_count = min_t(size_t, count1,
>>>> +					 range2_start + count2 - range1_start);
>>>> +		if (register_offset)
>>>> +			*register_offset = range1_start - range2_start;
>>>> +		return true;
>>>> +	}
>>> Seems like we're missing a case, and some documentation.
>>>
>>> The first test requires range1 to fully enclose range2 and provides the
>>> offset of range2 within range1 and the length of the intersection.
>>>
>>> The second test requires range1 to start from a non-zero offset within
>>> range2 and returns the absolute offset of range1 and the length of the
>>> intersection.
>>>
>>> The register offset is then non-zero offset of range1 into range2.  So
>>> does the caller use the zero value in the previous test to know range2
>>> exists within range1?
>>>
>>> We miss the cases where range1_start is <= range2_start and range1
>>> terminates within range2.
>> The first test should cover this case as well of the case of fully
>> enclosing.
>>
>> It checks whether range1_start + count1 > range2_start which can
>> terminates also within range2.
>>
>> Isn't it ?
> Hmm, maybe I read it wrong.  Let me try again...
>
> The first test covers the cases where range1 starts at or below range2
> and range1 extends into or through range2.  start_offset describes the
> offset into range1 that range2 begins.  The intersect_count is the
> extent of the intersection and it's not clear what register_offset
> describes since it's zero.
>
> The second test covers the cases where range1 starts within range2.
> start_offset is the start of range1, which doesn't seem consistent with
> the previous branch usage.

Right, start_offset needs to be 0 in that second branch.


>   The intersect_count does look consistent
> with the previous branch.  register_offset is then the offset of range1
> into range2
>
> So I had some things wrong, but I'm still having trouble with a
> consistent definition of start_offset and register_offset.
>
>
>> I may add some documentation for that function as part of V2 as you asked.
>>
>>>    I suppose we'll see below how this is used,
>>> but it seems asymmetric and incomplete.
>>>   
>>>> +
>>>> +	return false;
>>>> +}
>>>> +
>>>> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
>>>> +					char __user *buf, size_t count,
>>>> +					loff_t *ppos)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>>>> +	size_t register_offset;
>>>> +	loff_t copy_offset;
>>>> +	size_t copy_count;
>>>> +	__le32 val32;
>>>> +	__le16 val16;
>>>> +	u8 val8;
>>>> +	int ret;
>>>> +
>>>> +	ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
>>>> +	if (ret < 0)
>>>> +		return ret;
>>>> +
>>>> +	if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
>>>> +				  &copy_offset, &copy_count, NULL)) {
>>> If a user does 'setpci -s x:00.0 2.b' (range1 <= range2, but terminates
>>> within range2) they'll not enter this branch and see 41 rather than 00.
> Yes, this does take the first branch per my second look, so copy_offset
> is zero, copy_count is 1.  I think the copy_to_user() works correctly
>
>>> If a user does 'setpci -s x:00.0 3.b' (range1 > range2, range 1
>>> contained within range 2), the above function returns a copy_offset of
>>> range1_start (ie. 3).  But that offset is applied to the buffer, which
>>> is out of bounds.  The function needs to have returned an offset of 1
>>> and it should have applied to the val16 address.
>>>
>>> I don't think this works like it's intended.
>> Is that because of the missing case ?
>> Please see my note above.
> No, I think my original evaluation of this second case still holds,
> copy_offset is wrong.  I suspect what you're trying to do with
> start_offset and register_offset is specify the output buffer offset,
> ie. relative to range1 or buf, or the input offset, ie. range2 or our
> local val variable.  But start_offset is incorrectly calculated in the
> second branch above (should always be zero) and the caller didn't ask
> for the register offset here, which is seems it always should.

OK, I got what you said.

Yes, start_offset should always be 0 in the second branch, and the
caller may always need to ask for the register_offset and use it.

As the current QEMU code doesn't read a partial register/field, we never
hit the second branch and the issues above.

I will fix this as part of V2; it should be a simple change.
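
To make the intended semantics concrete, here is a minimal sketch of how the
fixed helper could look (illustration only, not the posted patch): start_offset
is always the offset into the user access (range1) at which the intersection
begins, register_offset is always the offset into the emulated register
(range2), and register_offset becomes mandatory:

static bool range_intersect_range(loff_t range1_start, size_t count1,
				  loff_t range2_start, size_t count2,
				  loff_t *start_offset,
				  size_t *intersect_count,
				  size_t *register_offset)
{
	if (range1_start <= range2_start &&
	    range1_start + count1 > range2_start) {
		/* Access starts at or before the register and reaches into it */
		*start_offset = range2_start - range1_start;
		*intersect_count = min_t(size_t, count2,
					 range1_start + count1 - range2_start);
		*register_offset = 0;
		return true;
	}

	if (range1_start > range2_start &&
	    range1_start < range2_start + count2) {
		/* Access starts inside the register */
		*start_offset = 0;
		*intersect_count = min_t(size_t, count1,
					 range2_start + count2 - range1_start);
		*register_offset = range1_start - range2_start;
		return true;
	}

	return false;
}

Callers would then always apply copy_offset to the user buffer (buf) and
register_offset to the local register variable, e.g.
(void *)&val16 + register_offset, for both the full and the partial cases.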

>
>>>> +		val16 = cpu_to_le16(0x1000);
>>> Please #define this somewhere rather than hiding a magic value here.
>> Sure, will just replace to VIRTIO_TRANS_ID_NET.
>>>> +		if (copy_to_user(buf + copy_offset, &val16, copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	if ((virtvdev->pci_cmd & PCI_COMMAND_IO) &&
>>>> +	    range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
>>>> +				  &copy_offset, &copy_count, &register_offset)) {
>>>> +		if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset,
>>>> +				   copy_count))
>>>> +			return -EFAULT;
>>>> +		val16 |= cpu_to_le16(PCI_COMMAND_IO);
>>>> +		if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset,
>>>> +				 copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8),
>>>> +				  &copy_offset, &copy_count, NULL)) {
>>>> +		/* Transitional needs to have revision 0 */
>>>> +		val8 = 0;
>>>> +		if (copy_to_user(buf + copy_offset, &val8, copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
>>>> +				  &copy_offset, &copy_count, NULL)) {
>>>> +		val32 = cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO);
>>> I'd still like to see the remainder of the BAR follow the semantics
>>> vfio-pci does.  I think this requires a __le32 bar0 field on the
>>> virtvdev struct to store writes and the read here would mask the lower
>>> bits up to the BAR size and OR in the IO indicator bit.
>> OK, will do.
>>
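
For reference, a rough sketch of that suggested BAR0 virtualization in the
config read path, assuming a new shadow field (say __le32 pci_base_addr_0) in
struct virtiovf_pci_core_device that the config write path fills on
PCI_BASE_ADDRESS_0 writes (not part of the posted patch):

	if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32),
				  &copy_offset, &copy_count, &register_offset)) {
		u32 bar_mask = ~(virtvdev->bar0_virtual_buf_size - 1);
		u32 pci_base_addr_0 = le32_to_cpu(virtvdev->pci_base_addr_0);

		/* Reflect the last guest-written address, masked to the
		 * emulated BAR size, with the I/O space indicator set.
		 */
		val32 = cpu_to_le32((pci_base_addr_0 & bar_mask) |
				    PCI_BASE_ADDRESS_SPACE_IO);
		if (copy_to_user(buf + copy_offset,
				 (void *)&val32 + register_offset, copy_count))
			return -EFAULT;
	}
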
>>>   
>>>> +		if (copy_to_user(buf + copy_offset, &val32, copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
>>>> +				  &copy_offset, &copy_count, NULL)) {
>>>> +		/*
>>>> +		 * Transitional devices use the PCI subsystem device id as
>>>> +		 * virtio device id, same as legacy driver always did.
>>> Where did we require the subsystem vendor ID to be 0x1af4?  This
>>> subsystem device ID really only makes since given that subsystem
>>> vendor ID, right?  Otherwise I don't see that non-transitional devices,
>>> such as the VF, have a hard requirement per the spec for the subsystem
>>> vendor ID.
>>>
>>> Do we want to make this only probe the correct subsystem vendor ID or do
>>> we want to emulate the subsystem vendor ID as well?  I don't see this is
>>> correct without one of those options.
>> Looking in the 1.x spec we can see the below.
>>
>> Legacy Interfaces: A Note on PCI Device Discovery
>>
>> "Transitional devices MUST have the PCI Subsystem
>> Device ID matching the Virtio Device ID, as indicated in section 5 ...
>> This is to match legacy drivers."
>>
>> However, there is no need to enforce Subsystem Vendor ID.
>>
>> This is what we followed here.
>>
>> Makes sense ?
> So do I understand correctly that virtio dictates the subsystem device
> ID for all subsystem vendor IDs that implement a legacy virtio
> interface?  Ok, but this device didn't actually implement a legacy
> virtio interface.  The device itself is not tranistional, we're imposing
> an emulated transitional interface onto it.  So did the subsystem vendor
> agree to have their subsystem device ID managed by the virtio committee
> or might we create conflicts?  I imagine we know we don't have a
> conflict if we also virtualize the subsystem vendor ID.
>
The non-transitional net device in the virtio spec is defined as the
tuple below.
T_A: VID=0x1AF4, DID=0x1040, Subsys_VID=FOO, Subsys_DID=0x40.

And a transitional net device in the virtio spec for a vendor FOO is
defined as:
T_B: VID=0x1AF4, DID=0x1000, Subsys_VID=FOO, Subsys_DID=0x1

This driver is converting T_A to T_B, both of which are defined by the
virtio spec.
Hence, there is no conflict for the subsystem vendor; it is fine.
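
Spelled out against the patch above, the emulation performs roughly this
conversion for the VF (sketch; the remaining config space is read from the
device as-is):

	Register             VF (T_A)         Emulated view (T_B)
	PCI_DEVICE_ID        0x1040           0x1000
	PCI_SUBSYSTEM_ID     0x0040           0x0001 (VIRTIO_ID_NET)
	PCI_REVISION_ID      device value     0x00
	PCI_BASE_ADDRESS_0   no I/O BAR       emulated I/O BAR
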
> BTW, it would be a lot easier for all of the config space emulation here
> if we could make use of the existing field virtualization in
> vfio-pci-core.  In fact you'll see in vfio_config_init() that
> PCI_DEVICE_ID is already virtualized for VFs, so it would be enough to
> simply do the following to report the desired device ID:
>
> 	*(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(0x1000);

I would prefer keeping things simple and having one place/flow in the
driver that handles all the fields, as we have now.

In any case, I'll look further at that option for managing the DEVICE_ID
towards V2.

> It appears everything in this function could be handled similarly by
> vfio-pci-core if the right fields in the perm_bits.virt and .write
> bits could be manipulated and vconfig modified appropriately.  I'd look
> for a way that a variant driver could provide an alternate set of
> permissions structures for various capabilities.  Thanks,

OK

However, let's not block V2 and the series acceptance on that.

It can always be done as a future refactoring, as part of another series
that will bring the infrastructure needed for it.

Yishai

>
> Alex
>
>
>>>> +		 */
>>>> +		val16 = cpu_to_le16(VIRTIO_ID_NET);
>>>> +		if (copy_to_user(buf + copy_offset, &val16, copy_count))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	return count;
>>>> +}
>>>> +
>>>> +static ssize_t
>>>> +virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
>>>> +		       size_t count, loff_t *ppos)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>>>> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>>>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>>>> +	int ret;
>>>> +
>>>> +	if (!count)
>>>> +		return 0;
>>>> +
>>>> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX)
>>>> +		return virtiovf_pci_read_config(core_vdev, buf, count, ppos);
>>>> +
>>>> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
>>>> +		return vfio_pci_core_read(core_vdev, buf, count, ppos);
>>>> +
>>>> +	ret = pm_runtime_resume_and_get(&pdev->dev);
>>>> +	if (ret) {
>>>> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n",
>>>> +				     ret);
>>>> +		return -EIO;
>>>> +	}
>>>> +
>>>> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, buf, count, true);
>>>> +	pm_runtime_put(&pdev->dev);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static ssize_t
>>>> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
>>>> +			size_t count, loff_t *ppos)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	struct pci_dev *pdev = virtvdev->core_device.pdev;
>>>> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
>>>> +	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>>>> +	int ret;
>>>> +
>>>> +	if (!count)
>>>> +		return 0;
>>>> +
>>>> +	if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
>>>> +		size_t register_offset;
>>>> +		loff_t copy_offset;
>>>> +		size_t copy_count;
>>>> +
>>>> +		if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd),
>>>> +					  &copy_offset, &copy_count,
>>>> +					  &register_offset)) {
>>>> +			if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset,
>>>> +					   buf + copy_offset,
>>>> +					   copy_count))
>>>> +				return -EFAULT;
>>>> +		}
>>>> +
>>>> +		if (range_intersect_range(pos, count, pdev->msix_cap + PCI_MSIX_FLAGS,
>>>> +					  sizeof(virtvdev->msix_ctrl),
>>>> +					  &copy_offset, &copy_count,
>>>> +					  &register_offset)) {
>>>> +			if (copy_from_user((void *)&virtvdev->msix_ctrl + register_offset,
>>>> +					   buf + copy_offset,
>>>> +					   copy_count))
>>>> +				return -EFAULT;
>>>> +		}
>>> MSI-X is setup via ioctl, so you're relying on a userspace that writes
>>> through the control register bit even though it doesn't do anything.
>>> Why not use vfio_pci_core_device.irq_type to track if MSI-X mode is
>>> enabled?
>> OK, may switch to your suggestion post of testing it.
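
A one-line sketch of that alternative, relying on the irq_type field that
vfio-pci core already maintains, in place of the shadowed msix_ctrl check in
virtiovf_issue_legacy_rw_cmd() (not part of the posted patch):

	/* MSI-X is enabled iff the core driver has MSI-X interrupts set up */
	bool msix_enabled = virtvdev->core_device.irq_type ==
			    VFIO_PCI_MSIX_IRQ_INDEX;
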
>>>   
>>>> +	}
>>>> +
>>>> +	if (index != VFIO_PCI_BAR0_REGION_INDEX)
>>>> +		return vfio_pci_core_write(core_vdev, buf, count, ppos);
>>>> +
>>>> +	ret = pm_runtime_resume_and_get(&pdev->dev);
>>>> +	if (ret) {
>>>> +		pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret);
>>>> +		return -EIO;
>>>> +	}
>>>> +
>>>> +	ret = translate_io_bar_to_mem_bar(virtvdev, pos, (char __user *)buf, count, false);
>>>> +	pm_runtime_put(&pdev->dev);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int
>>>> +virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
>>>> +				   unsigned int cmd, unsigned long arg)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
>>>> +	void __user *uarg = (void __user *)arg;
>>>> +	struct vfio_region_info info = {};
>>>> +
>>>> +	if (copy_from_user(&info, uarg, minsz))
>>>> +		return -EFAULT;
>>>> +
>>>> +	if (info.argsz < minsz)
>>>> +		return -EINVAL;
>>>> +
>>>> +	switch (info.index) {
>>>> +	case VFIO_PCI_BAR0_REGION_INDEX:
>>>> +		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
>>>> +		info.size = virtvdev->bar0_virtual_buf_size;
>>>> +		info.flags = VFIO_REGION_INFO_FLAG_READ |
>>>> +			     VFIO_REGION_INFO_FLAG_WRITE;
>>>> +		return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
>>>> +	default:
>>>> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
>>>> +	}
>>>> +}
>>>> +
>>>> +static long
>>>> +virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
>>>> +			     unsigned long arg)
>>>> +{
>>>> +	switch (cmd) {
>>>> +	case VFIO_DEVICE_GET_REGION_INFO:
>>>> +		return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg);
>>>> +	default:
>>>> +		return vfio_pci_core_ioctl(core_vdev, cmd, arg);
>>>> +	}
>>>> +}
>>>> +
>>>> +static int
>>>> +virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
>>>> +{
>>>> +	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
>>>> +	int ret;
>>>> +
>>>> +	/*
>>>> +	 * Setup the BAR where the 'notify' exists to be used by vfio as well
>>>> +	 * This will let us mmap it only once and use it when needed.
>>>> +	 */
>>>> +	ret = vfio_pci_core_setup_barmap(core_device,
>>>> +					 virtvdev->notify_bar);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
>>>> +			virtvdev->notify_offset;
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int virtiovf_pci_open_device(struct vfio_device *core_vdev)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	struct vfio_pci_core_device *vdev = &virtvdev->core_device;
>>>> +	int ret;
>>>> +
>>>> +	ret = vfio_pci_core_enable(vdev);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	if (virtvdev->bar0_virtual_buf) {
>>>> +		/*
>>>> +		 * Upon close_device() the vfio_pci_core_disable() is called
>>>> +		 * and will close all the previous mmaps, so it seems that the
>>>> +		 * valid life cycle for the 'notify' addr is per open/close.
>>>> +		 */
>>>> +		ret = virtiovf_set_notify_addr(virtvdev);
>>>> +		if (ret) {
>>>> +			vfio_pci_core_disable(vdev);
>>>> +			return ret;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	vfio_pci_core_finish_enable(vdev);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int virtiovf_get_device_config_size(unsigned short device)
>>>> +{
>>>> +	/* Network card */
>>>> +	return offsetofend(struct virtio_net_config, status);
>>>> +}
>>>> +
>>>> +static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev)
>>>> +{
>>>> +	u64 offset;
>>>> +	int ret;
>>>> +	u8 bar;
>>>> +
>>>> +	ret = virtio_pci_admin_legacy_io_notify_info(virtvdev->core_device.pdev,
>>>> +				VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM,
>>>> +				&bar, &offset);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	virtvdev->notify_bar = bar;
>>>> +	virtvdev->notify_offset = offset;
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +	struct pci_dev *pdev;
>>>> +	int ret;
>>>> +
>>>> +	ret = vfio_pci_core_init_dev(core_vdev);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	pdev = virtvdev->core_device.pdev;
>>>> +	ret = virtiovf_read_notify_info(virtvdev);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	/* Being ready with a buffer that supports MSIX */
>>>> +	virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
>>>> +				virtiovf_get_device_config_size(pdev->device);
>>>> +	virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size,
>>>> +					     GFP_KERNEL);
>>>> +	if (!virtvdev->bar0_virtual_buf)
>>>> +		return -ENOMEM;
>>>> +	mutex_init(&virtvdev->bar_mutex);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = container_of(
>>>> +		core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
>>>> +
>>>> +	kfree(virtvdev->bar0_virtual_buf);
>>>> +	vfio_pci_core_release_dev(core_vdev);
>>>> +}
>>>> +
>>>> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_tran_ops = {
>>>> +	.name = "virtio-transitional-vfio-pci",
>>>> +	.init = virtiovf_pci_init_device,
>>>> +	.release = virtiovf_pci_core_release_dev,
>>>> +	.open_device = virtiovf_pci_open_device,
>>>> +	.close_device = vfio_pci_core_close_device,
>>>> +	.ioctl = virtiovf_vfio_pci_core_ioctl,
>>>> +	.read = virtiovf_pci_core_read,
>>>> +	.write = virtiovf_pci_core_write,
>>>> +	.mmap = vfio_pci_core_mmap,
>>>> +	.request = vfio_pci_core_request,
>>>> +	.match = vfio_pci_core_match,
>>>> +	.bind_iommufd = vfio_iommufd_physical_bind,
>>>> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
>>>> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
>>>> +};
>>>> +
>>>> +static const struct vfio_device_ops virtiovf_acc_vfio_pci_ops = {
>>>> +	.name = "virtio-acc-vfio-pci",
>>>> +	.init = vfio_pci_core_init_dev,
>>>> +	.release = vfio_pci_core_release_dev,
>>>> +	.open_device = virtiovf_pci_open_device,
>>>> +	.close_device = vfio_pci_core_close_device,
>>>> +	.ioctl = vfio_pci_core_ioctl,
>>>> +	.device_feature = vfio_pci_core_ioctl_feature,
>>>> +	.read = vfio_pci_core_read,
>>>> +	.write = vfio_pci_core_write,
>>>> +	.mmap = vfio_pci_core_mmap,
>>>> +	.request = vfio_pci_core_request,
>>>> +	.match = vfio_pci_core_match,
>>>> +	.bind_iommufd = vfio_iommufd_physical_bind,
>>>> +	.unbind_iommufd = vfio_iommufd_physical_unbind,
>>>> +	.attach_ioas = vfio_iommufd_physical_attach_ioas,
>>>> +};
>>>> +
>>>> +static bool virtiovf_bar0_exists(struct pci_dev *pdev)
>>>> +{
>>>> +	struct resource *res = pdev->resource;
>>>> +
>>>> +	return res->flags ? true : false;
>>>> +}
>>>> +
>>>> +#define VIRTIOVF_USE_ADMIN_CMD_BITMAP \
>>>> +	(BIT_ULL(VIRTIO_ADMIN_CMD_LIST_QUERY) | \
>>>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LIST_USE) | \
>>>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
>>>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
>>>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
>>>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
>>>> +	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
>>>> +
>>>> +static bool virtiovf_support_legacy_access(struct pci_dev *pdev)
>>>> +{
>>>> +	int buf_size = DIV_ROUND_UP(VIRTIO_ADMIN_MAX_CMD_OPCODE, 64) * 8;
>>>> +	u8 *buf;
>>>> +	int ret;
>>>> +
>>>> +	buf = kzalloc(buf_size, GFP_KERNEL);
>>>> +	if (!buf)
>>>> +		return false;
>>>> +
>>>> +	ret = virtio_pci_admin_list_query(pdev, buf, buf_size);
>>>> +	if (ret)
>>>> +		goto end;
>>>> +
>>>> +	if ((le64_to_cpup((__le64 *)buf) & VIRTIOVF_USE_ADMIN_CMD_BITMAP) !=
>>>> +		VIRTIOVF_USE_ADMIN_CMD_BITMAP) {
>>>> +		ret = -EOPNOTSUPP;
>>>> +		goto end;
>>>> +	}
>>>> +
>>>> +	/* Confirm the used commands */
>>>> +	memset(buf, 0, buf_size);
>>>> +	*(__le64 *)buf = cpu_to_le64(VIRTIOVF_USE_ADMIN_CMD_BITMAP);
>>>> +	ret = virtio_pci_admin_list_use(pdev, buf, buf_size);
>>>> +end:
>>>> +	kfree(buf);
>>>> +	return ret ? false : true;
>>>> +}
>>>> +
>>>> +static int virtiovf_pci_probe(struct pci_dev *pdev,
>>>> +			      const struct pci_device_id *id)
>>>> +{
>>>> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
>>>> +	struct virtiovf_pci_core_device *virtvdev;
>>>> +	int ret;
>>>> +
>>>> +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
>>>> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
>>> All but the last test here are fairly evident requirements of the
>>> driver.  Why do we require a device that supports MSI-X?
>> As now we check at run time to decide whether MSI-X is enabled/disabled
>> to pick-up the correct op code, no need for that any more.
>>
>> Will drop this MSI-X check from V2.
>>
>> Thanks,
>> Yishai
>>
>>> Thanks,
>>> Alex
>>>
>>>   
>>>> +		ops = &virtiovf_acc_vfio_pci_tran_ops;
>>>> +
>>>> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
>>>> +				     &pdev->dev, ops);
>>>> +	if (IS_ERR(virtvdev))
>>>> +		return PTR_ERR(virtvdev);
>>>> +
>>>> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
>>>> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
>>>> +	if (ret)
>>>> +		goto out;
>>>> +	return 0;
>>>> +out:
>>>> +	vfio_put_device(&virtvdev->core_device.vdev);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static void virtiovf_pci_remove(struct pci_dev *pdev)
>>>> +{
>>>> +	struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
>>>> +
>>>> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
>>>> +	vfio_put_device(&virtvdev->core_device.vdev);
>>>> +}
>>>> +
>>>> +static const struct pci_device_id virtiovf_pci_table[] = {
>>>> +	/* Only virtio-net is supported/tested so far */
>>>> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) },
>>>> +	{}
>>>> +};
>>>> +
>>>> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
>>>> +
>>>> +static struct pci_driver virtiovf_pci_driver = {
>>>> +	.name = KBUILD_MODNAME,
>>>> +	.id_table = virtiovf_pci_table,
>>>> +	.probe = virtiovf_pci_probe,
>>>> +	.remove = virtiovf_pci_remove,
>>>> +	.err_handler = &vfio_pci_core_err_handlers,
>>>> +	.driver_managed_dma = true,
>>>> +};
>>>> +
>>>> +module_pci_driver(virtiovf_pci_driver);
>>>> +
>>>> +MODULE_LICENSE("GPL");
>>>> +MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
>>>> +MODULE_DESCRIPTION(
>>>> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");
>>

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-26 12:08           ` Yishai Hadas via Virtualization
@ 2023-10-26 12:12             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael S. Tsirkin @ 2023-10-26 12:12 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Thu, Oct 26, 2023 at 03:08:12PM +0300, Yishai Hadas wrote:
> > > Makes sense ?
> > So do I understand correctly that virtio dictates the subsystem device
> > ID for all subsystem vendor IDs that implement a legacy virtio
> > interface?  Ok, but this device didn't actually implement a legacy
> > virtio interface.  The device itself is not tranistional, we're imposing
> > an emulated transitional interface onto it.  So did the subsystem vendor
> > agree to have their subsystem device ID managed by the virtio committee
> > or might we create conflicts?  I imagine we know we don't have a
> > conflict if we also virtualize the subsystem vendor ID.
> > 
> The non transitional net device in the virtio spec defined as the below
> tuple.
> T_A: VID=0x1AF4, DID=0x1040, Subsys_VID=FOO, Subsys_DID=0x40.
> 
> And transitional net device in the virtio spec for a vendor FOO is defined
> as:
> T_B: VID=0x1AF4,DID=0x1000,Subsys_VID=FOO, subsys_DID=0x1
> 
> This driver is converting T_A to T_B, which both are defined by the virtio
> spec.
> Hence, it does not conflict for the subsystem vendor, it is fine.

You are talking about legacy guests; what the 1.X spec says about them
is much less important than what guests actually do.
Check the INF of the open source Windows drivers and the Linux code, at least.

-- 
MST

* RE: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-26 12:12             ` Michael S. Tsirkin
@ 2023-10-26 12:40               ` Parav Pandit via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Parav Pandit @ 2023-10-26 12:40 UTC (permalink / raw)
  To: Michael S. Tsirkin, Yishai Hadas
  Cc: Alex Williamson, jasowang, Jason Gunthorpe, kvm, virtualization,
	Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins, si-wei.liu,
	Leon Romanovsky, Maor Gottlieb

> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, October 26, 2023 5:42 PM
> 
> On Thu, Oct 26, 2023 at 03:08:12PM +0300, Yishai Hadas wrote:
> > > > Makes sense ?
> > > So do I understand correctly that virtio dictates the subsystem
> > > device ID for all subsystem vendor IDs that implement a legacy
> > > virtio interface?  Ok, but this device didn't actually implement a
> > > legacy virtio interface.  The device itself is not tranistional,
> > > we're imposing an emulated transitional interface onto it.  So did
> > > the subsystem vendor agree to have their subsystem device ID managed
> > > by the virtio committee or might we create conflicts?  I imagine we
> > > know we don't have a conflict if we also virtualize the subsystem vendor ID.
> > >
> > The non transitional net device in the virtio spec defined as the
> > below tuple.
> > T_A: VID=0x1AF4, DID=0x1040, Subsys_VID=FOO, Subsys_DID=0x40.
> >
> > And transitional net device in the virtio spec for a vendor FOO is
> > defined
> > as:
> > T_B: VID=0x1AF4,DID=0x1000,Subsys_VID=FOO, subsys_DID=0x1
> >
> > This driver is converting T_A to T_B, which both are defined by the
> > virtio spec.
> > Hence, it does not conflict for the subsystem vendor, it is fine.
> 
> You are talking about legacy guests, what 1.X spec says about them is much less
> important than what guests actually do.
> Check the INF of the open source windows drivers and linux code, at least.

The Linux legacy guest driver has:

static struct pci_device_id virtio_pci_id_table[] = {
        { 0x1af4, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 },
        { 0 },
};

This is followed by an open-coded check in the driver for the 0x1000 to 0x103f
device ID range (see the sketch below).
Do you mean the Windows driver expects a specific subsystem vendor ID of 0x1af4?
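
For reference, that check looks roughly like this in the legacy virtio-pci
probe path (paraphrased sketch):

	/* Only claim device IDs in the transitional range; the subsystem
	 * IDs are not part of the match.
	 */
	if (pci_dev->device < 0x1000 || pci_dev->device > 0x103f)
		return -ENODEV;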

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-26 12:40               ` Parav Pandit via Virtualization
@ 2023-10-26 13:15                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael S. Tsirkin @ 2023-10-26 13:15 UTC (permalink / raw)
  To: Parav Pandit
  Cc: kvm, Maor Gottlieb, virtualization, Jason Gunthorpe, Jiri Pirko,
	Leon Romanovsky

On Thu, Oct 26, 2023 at 12:40:04PM +0000, Parav Pandit wrote:
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, October 26, 2023 5:42 PM
> > 
> > On Thu, Oct 26, 2023 at 03:08:12PM +0300, Yishai Hadas wrote:
> > > > > Makes sense ?
> > > > So do I understand correctly that virtio dictates the subsystem
> > > > device ID for all subsystem vendor IDs that implement a legacy
> > > > virtio interface?  Ok, but this device didn't actually implement a
> > > > legacy virtio interface.  The device itself is not tranistional,
> > > > we're imposing an emulated transitional interface onto it.  So did
> > > > the subsystem vendor agree to have their subsystem device ID managed
> > > > by the virtio committee or might we create conflicts?  I imagine we
> > > > know we don't have a conflict if we also virtualize the subsystem vendor ID.
> > > >
> > > The non transitional net device in the virtio spec defined as the
> > > below tuple.
> > > T_A: VID=0x1AF4, DID=0x1040, Subsys_VID=FOO, Subsys_DID=0x40.
> > >
> > > And transitional net device in the virtio spec for a vendor FOO is
> > > defined
> > > as:
> > > T_B: VID=0x1AF4,DID=0x1000,Subsys_VID=FOO, subsys_DID=0x1
> > >
> > > This driver is converting T_A to T_B, which both are defined by the
> > > virtio spec.
> > > Hence, it does not conflict for the subsystem vendor, it is fine.
> > 
> > You are talking about legacy guests, what 1.X spec says about them is much less
> > important than what guests actually do.
> > Check the INF of the open source windows drivers and linux code, at least.
> 
> Linux legacy guest has,
> 
> static struct pci_device_id virtio_pci_id_table[] = {
>         { 0x1af4, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 },
>         { 0 },
> };
> Followed by an open coded driver check for 0x1000 to 0x103f range.
> Do you mean windows driver expects specific subsystem vendor id of 0x1af4?

Look it up, it's open source.

* RE: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-26 13:15                 ` Michael S. Tsirkin
@ 2023-10-26 13:28                   ` Parav Pandit via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Parav Pandit @ 2023-10-26 13:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yishai Hadas, Alex Williamson, jasowang, Jason Gunthorpe, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	si-wei.liu, Leon Romanovsky, Maor Gottlieb


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, October 26, 2023 6:45 PM

> > Followed by an open coded driver check for 0x1000 to 0x103f range.
> > Do you mean windows driver expects specific subsystem vendor id of 0x1af4?
> 
> Look it up, it's open source.

Those are not OS inbox drivers anyway.
:)
The current vfio driver follows the virtio spec, i.e. the legacy spec and the transitional device sections of the 1.x spec.
There is no need to do something out of spec at this point.

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-26 13:28                   ` Parav Pandit via Virtualization
@ 2023-10-26 15:06                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael S. Tsirkin @ 2023-10-26 15:06 UTC (permalink / raw)
  To: Parav Pandit
  Cc: kvm, Maor Gottlieb, virtualization, Jason Gunthorpe, Jiri Pirko,
	Leon Romanovsky

On Thu, Oct 26, 2023 at 01:28:18PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, October 26, 2023 6:45 PM
> 
> > > Followed by an open coded driver check for 0x1000 to 0x103f range.
> > > Do you mean windows driver expects specific subsystem vendor id of 0x1af4?
> > 
> > Look it up, it's open source.
> 
> Those are not OS inbox drivers anyway.
> :)

It does not matter at all if the guest has drivers installed.
Either you worry about legacy guests or not.


> The current vfio driver is following the virtio spec based on legacy spec, 1.x spec following the transitional device sections.
> There is no need to do something out of spec at this point.

The legacy spec wasn't maintained properly; drivers sometimes diverged
significantly. What matters is the installed base.

-- 
MST

* RE: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-26 15:06                     ` Michael S. Tsirkin
@ 2023-10-26 15:09                       ` Parav Pandit via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Parav Pandit @ 2023-10-26 15:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yishai Hadas, Alex Williamson, jasowang, Jason Gunthorpe, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	si-wei.liu, Leon Romanovsky, Maor Gottlieb


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, October 26, 2023 8:36 PM
> 
> On Thu, Oct 26, 2023 at 01:28:18PM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Thursday, October 26, 2023 6:45 PM
> >
> > > > Followed by an open coded driver check for 0x1000 to 0x103f range.
> > > > Do you mean windows driver expects specific subsystem vendor id of
> 0x1af4?
> > >
> > > Look it up, it's open source.
> >
> > Those are not OS inbox drivers anyway.
> > :)
> 
> Does not matter at all if guest has drivers installed.
> Either you worry about legacy guests or not.
> 
So, Linux guests have inbox drivers that we care about, and they seem to be covered, right?

> 
> > The current vfio driver is following the virtio spec based on legacy spec, 1.x
> spec following the transitional device sections.
> > There is no need to do something out of spec at this point.
> 
> legacy spec wasn't maintained properly, drivers diverged sometimes
> significantly. what matters is installed base.

So if you know the subsystem vendor ID that Windows expects, please share it, so we can avoid playing a puzzle game. :)
It can anyway be reported by the device itself.

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-26 15:09                       ` Parav Pandit via Virtualization
@ 2023-10-26 15:46                         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael S. Tsirkin @ 2023-10-26 15:46 UTC (permalink / raw)
  To: Parav Pandit
  Cc: kvm, Maor Gottlieb, virtualization, Jason Gunthorpe, Jiri Pirko,
	Leon Romanovsky

On Thu, Oct 26, 2023 at 03:09:13PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, October 26, 2023 8:36 PM
> > 
> > On Thu, Oct 26, 2023 at 01:28:18PM +0000, Parav Pandit wrote:
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Thursday, October 26, 2023 6:45 PM
> > >
> > > > > Followed by an open coded driver check for 0x1000 to 0x103f range.
> > > > > Do you mean windows driver expects specific subsystem vendor id of
> > 0x1af4?
> > > >
> > > > Look it up, it's open source.
> > >
> > > Those are not OS inbox drivers anyway.
> > > :)
> > 
> > Does not matter at all if guest has drivers installed.
> > Either you worry about legacy guests or not.
> > 
> So, Linux guests have inbox drivers, that we care about and they seems to be covered, right?
> 
> > 
> > > The current vfio driver is following the virtio spec based on legacy spec, 1.x
> > spec following the transitional device sections.
> > > There is no need to do something out of spec at this point.
> > 
> > legacy spec wasn't maintained properly, drivers diverged sometimes
> > significantly. what matters is installed base.
> 
> So if you know the subsystem vendor id that Windows expects, please share, so we can avoid playing puzzle game. :)
> It anyway can be reported by the device itself.

I don't know myself offhand. I just know it's not so simple. Looking at the source
for network drivers I see:

%kvmnet6.DeviceDesc%    = kvmnet6.ndi, PCI\VEN_1AF4&DEV_1000&SUBSYS_0001_INX_SUBSYS_VENDOR_ID&REV_00, PCI\VEN_1AF4&DEV_1000


So the drivers will:
A. bind with high priority to the subsystem vendor ID used when the drivers were built.
   Popular drivers built and distributed for free by Red Hat have 1AF4.
B. bind with low priority to any subsystem device/vendor id as long as
   vendor is 1af4 and device is 1000


My conclusions:
- you probably need a way to tweak the subsystem vendor ID in software
- the default should probably be 1AF4, not whatever the actual device uses
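
A minimal sketch of what tweaking it in software (the first point above) could
look like in the variant driver's config read path, reusing the
range_intersect_range() helper from the patch; PCI_SUBSYSTEM_VENDOR_ID
emulation is not in the posted series:

	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_VENDOR_ID,
				  sizeof(val16), &copy_offset,
				  &copy_count, NULL)) {
		/* Report the ID that prebuilt legacy guest drivers expect */
		val16 = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET); /* 0x1af4 */
		if (copy_to_user(buf + copy_offset, &val16, copy_count))
			return -EFAULT;
	}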


-- 
MST

* RE: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-26 15:46                         ` Michael S. Tsirkin
@ 2023-10-26 15:56                           ` Parav Pandit via Virtualization
  -1 siblings, 0 replies; 100+ messages in thread
From: Parav Pandit @ 2023-10-26 15:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yishai Hadas, Alex Williamson, jasowang, Jason Gunthorpe, kvm,
	virtualization, Feng Liu, Jiri Pirko, kevin.tian, joao.m.martins,
	si-wei.liu, Leon Romanovsky, Maor Gottlieb


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, October 26, 2023 9:16 PM

> On Thu, Oct 26, 2023 at 03:09:13PM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Thursday, October 26, 2023 8:36 PM
> > >
> > > On Thu, Oct 26, 2023 at 01:28:18PM +0000, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Thursday, October 26, 2023 6:45 PM
> > > >
> > > > > > Followed by an open coded driver check for 0x1000 to 0x103f range.
> > > > > > Do you mean windows driver expects specific subsystem vendor
> > > > > > id of
> > > 0x1af4?
> > > > >
> > > > > Look it up, it's open source.
> > > >
> > > > Those are not OS inbox drivers anyway.
> > > > :)
> > >
> > > Does not matter at all if guest has drivers installed.
> > > Either you worry about legacy guests or not.
> > >
> > So, Linux guests have inbox drivers that we care about, and they seem to be
> > covered, right?
> >
> > >
> > > > The current vfio driver follows the virtio spec: the legacy spec plus
> > > > the 1.x spec's transitional device sections.
> > > > There is no need to do something out of spec at this point.
> > >
> > > The legacy spec wasn't maintained properly, and drivers sometimes diverged
> > > significantly. What matters is the installed base.
> >
> > So if you know the subsystem vendor ID that Windows expects, please
> > share, so we can avoid playing a puzzle game. :) In any case, it can be
> > reported by the device itself.
> 
> I don't know myself offhand. I just know it's not so simple. Looking at the source
> for network drivers I see:
> 
> %kvmnet6.DeviceDesc%    = kvmnet6.ndi,
> PCI\VEN_1AF4&DEV_1000&SUBSYS_0001_INX_SUBSYS_VENDOR_ID&REV_00,
> PCI\VEN_1AF4&DEV_1000
> 
Yeah, I was checking the cryptic notation at https://github.com/virtio-win/kvm-guest-drivers-windows/tree/master

> 
> So the drivers will:
> A. bind with high priority to the subsystem vendor ID used when the drivers were built.
>    Popular drivers built and distributed for free by Red Hat have 1AF4.
> B. bind with low priority to any subsystem device/vendor ID, as long as the
>    vendor is 1af4 and the device is 1000.
> 
> 
> My conclusions:
> - you probably need a way to tweak the subsystem vendor ID in software
> - the default should probably be 1AF4, not whatever the actual device uses
OK.
It is not mandatory, but if you and Alex are OK with also tweaking the subsystem vendor ID, it seems fine to me. It does not hurt anything.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-10-26 15:56                           ` Parav Pandit via Virtualization
  0 siblings, 0 replies; 100+ messages in thread
From: Parav Pandit via Virtualization @ 2023-10-26 15:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, Maor Gottlieb, virtualization, Jason Gunthorpe, Jiri Pirko,
	Leon Romanovsky


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, October 26, 2023 9:16 PM

> On Thu, Oct 26, 2023 at 03:09:13PM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Thursday, October 26, 2023 8:36 PM
> > >
> > > On Thu, Oct 26, 2023 at 01:28:18PM +0000, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Thursday, October 26, 2023 6:45 PM
> > > >
> > > > > > Followed by an open coded driver check for 0x1000 to 0x103f range.
> > > > > > Do you mean windows driver expects specific subsystem vendor
> > > > > > id of
> > > 0x1af4?
> > > > >
> > > > > Look it up, it's open source.
> > > >
> > > > Those are not OS inbox drivers anyway.
> > > > :)
> > >
> > > Does not matter at all if guest has drivers installed.
> > > Either you worry about legacy guests or not.
> > >
> > So, Linux guests have inbox drivers that we care about, and they seem to be
> > covered, right?
> >
> > >
> > > > The current vfio driver follows the virtio spec: the legacy spec plus
> > > > the 1.x spec's transitional device sections.
> > > > There is no need to do something out of spec at this point.
> > >
> > > The legacy spec wasn't maintained properly, and drivers sometimes diverged
> > > significantly. What matters is the installed base.
> >
> > So if you know the subsystem vendor ID that Windows expects, please
> > share, so we can avoid playing a puzzle game. :) In any case, it can be
> > reported by the device itself.
> 
> I don't know myself offhand. I just know it's not so simple. Looking at the source
> for network drivers I see:
> 
> %kvmnet6.DeviceDesc%    = kvmnet6.ndi,
> PCI\VEN_1AF4&DEV_1000&SUBSYS_0001_INX_SUBSYS_VENDOR_ID&REV_00,
> PCI\VEN_1AF4&DEV_1000
> 
Yeah, I was checking the cryptic notation at https://github.com/virtio-win/kvm-guest-drivers-windows/tree/master

> 
> So the drivers will:
> A. bind with high priority to the subsystem vendor ID used when the drivers were built.
>    Popular drivers built and distributed for free by Red Hat have 1AF4.
> B. bind with low priority to any subsystem device/vendor ID, as long as the
>    vendor is 1af4 and the device is 1000.
> 
> 
> My conclusions:
> - you probably need a way to tweak the subsystem vendor ID in software
> - the default should probably be 1AF4, not whatever the actual device uses
OK.
It is not mandatory, but if you and Alex are OK with also tweaking the subsystem vendor ID, it seems fine to me. It does not hurt anything.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-26 12:08           ` Yishai Hadas via Virtualization
@ 2023-10-26 17:55             ` Alex Williamson
  -1 siblings, 0 replies; 100+ messages in thread
From: Alex Williamson @ 2023-10-26 17:55 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: kvm, mst, maorg, virtualization, jgg, jiri, leonro

On Thu, 26 Oct 2023 15:08:12 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> On 25/10/2023 22:13, Alex Williamson wrote:
> > On Wed, 25 Oct 2023 17:35:51 +0300
> > Yishai Hadas <yishaih@nvidia.com> wrote:
> >  
> >> On 24/10/2023 22:57, Alex Williamson wrote:  
> >>> On Tue, 17 Oct 2023 16:42:17 +0300
> >>> Yishai Hadas <yishaih@nvidia.com> wrote:
   
> >>>> +		if (copy_to_user(buf + copy_offset, &val32, copy_count))
> >>>> +			return -EFAULT;
> >>>> +	}
> >>>> +
> >>>> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> >>>> +				  &copy_offset, &copy_count, NULL)) {
> >>>> +		/*
> >>>> +		 * Transitional devices use the PCI subsystem device id as
> >>>> +		 * virtio device id, same as legacy driver always did.  
> >>> Where did we require the subsystem vendor ID to be 0x1af4?  This
> >>> subsystem device ID really only makes sense given that subsystem
> >>> vendor ID, right?  Otherwise I don't see that non-transitional devices,
> >>> such as the VF, have a hard requirement per the spec for the subsystem
> >>> vendor ID.
> >>>
> >>> Do we want to make this only probe the correct subsystem vendor ID or do
> >>> we want to emulate the subsystem vendor ID as well?  I don't see this is
> >>> correct without one of those options.  
> >> Looking in the 1.x spec we can see the below.
> >>
> >> Legacy Interfaces: A Note on PCI Device Discovery
> >>
> >> "Transitional devices MUST have the PCI Subsystem
> >> Device ID matching the Virtio Device ID, as indicated in section 5 ...
> >> This is to match legacy drivers."
> >>
> >> However, there is no need to enforce Subsystem Vendor ID.
> >>
> >> This is what we followed here.
> >>
> >> Makes sense ?  
> > So do I understand correctly that virtio dictates the subsystem device
> > ID for all subsystem vendor IDs that implement a legacy virtio
> > interface?  Ok, but this device didn't actually implement a legacy
> > virtio interface.  The device itself is not tranistional, we're imposing
> > an emulated transitional interface onto it.  So did the subsystem vendor
> > agree to have their subsystem device ID managed by the virtio committee
> > or might we create conflicts?  I imagine we know we don't have a
> > conflict if we also virtualize the subsystem vendor ID.
> >  
> The non-transitional net device in the virtio spec is defined as the tuple
> below:
> T_A: VID=0x1AF4, DID=0x1040, Subsys_VID=FOO, Subsys_DID=0x40.
> 
> And the transitional net device in the virtio spec for a vendor FOO is
> defined as:
> T_B: VID=0x1AF4, DID=0x1000, Subsys_VID=FOO, Subsys_DID=0x1
> 
> This driver converts T_A to T_B, both of which are defined by the
> virtio spec.
> Hence, it does not conflict for the subsystem vendor ID; it is fine.

Surprising to me that the virtio spec dictates subsystem device ID in
all cases.  The further discussion in this thread seems to indicate we
need to virtualize subsystem vendor ID for broader driver compatibility
anyway.

> > BTW, it would be a lot easier for all of the config space emulation here
> > if we could make use of the existing field virtualization in
> > vfio-pci-core.  In fact you'll see in vfio_config_init() that
> > PCI_DEVICE_ID is already virtualized for VFs, so it would be enough to
> > simply do the following to report the desired device ID:
> >
> > 	*(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(0x1000);  
> 
> I would prefer to keep things simple and have one place/flow that
> handles all the fields, as we have now as part of the driver.

That's the same argument I'd make for re-using the core code: we don't
need multiple implementations that merge physical and virtual
bits within config space.

> In any case, I'll further look at that option for managing the DEVICE_ID 
> towards V2.
> 
> > It appears everything in this function could be handled similarly by
> > vfio-pci-core if the right fields in the perm_bits.virt and .write
> > bits could be manipulated and vconfig modified appropriately.  I'd look
> > for a way that a variant driver could provide an alternate set of
> > permissions structures for various capabilities.  Thanks,  
> 
> OK
> 
> However, let's not block V2 and the series' acceptance on that.
> 
> It can always be done as future refactoring, as part of another series that
> brings in the infrastructure needed for that.

We're already on the verge of the v6.7 merge window, so this looks like
v6.8 material anyway.  We have time.  Thanks,

Alex
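
To make the quoted vconfig one-liner concrete, here is a minimal sketch of the
vfio-pci-core re-use described above, assuming the variant driver embeds
struct vfio_pci_core_device directly and does the write from its open_device
callback after vfio_pci_core_enable() has run vfio_config_init(). The callback
name and the VIRTIO_TRANS_NET_DEVICE_ID constant are illustrative, not from the
series, and this only has effect for PCI_DEVICE_ID because that field is
already virtualized for VFs; doing the same for the subsystem IDs would
additionally need the perm_bits changes mentioned in the message:

#include <linux/container_of.h>
#include <linux/pci_regs.h>
#include <linux/vfio_pci_core.h>

#define VIRTIO_TRANS_NET_DEVICE_ID 0x1000	/* transitional virtio-net */

static int virtvdev_open_device(struct vfio_device *core_vdev)
{
	struct vfio_pci_core_device *vdev =
		container_of(core_vdev, struct vfio_pci_core_device, vdev);
	int ret;

	ret = vfio_pci_core_enable(vdev);
	if (ret)
		return ret;

	/*
	 * vfio_config_init() already virtualizes PCI_DEVICE_ID for VFs,
	 * so config reads of this field are served from vconfig.
	 */
	*(__le16 *)&vdev->vconfig[PCI_DEVICE_ID] =
		cpu_to_le16(VIRTIO_TRANS_NET_DEVICE_ID);

	vfio_pci_core_finish_enable(vdev);
	return 0;
}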


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-10-26 17:55             ` Alex Williamson
  0 siblings, 0 replies; 100+ messages in thread
From: Alex Williamson @ 2023-10-26 17:55 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: mst, jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg

On Thu, 26 Oct 2023 15:08:12 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> On 25/10/2023 22:13, Alex Williamson wrote:
> > On Wed, 25 Oct 2023 17:35:51 +0300
> > Yishai Hadas <yishaih@nvidia.com> wrote:
> >  
> >> On 24/10/2023 22:57, Alex Williamson wrote:  
> >>> On Tue, 17 Oct 2023 16:42:17 +0300
> >>> Yishai Hadas <yishaih@nvidia.com> wrote:
   
> >>>> +		if (copy_to_user(buf + copy_offset, &val32, copy_count))
> >>>> +			return -EFAULT;
> >>>> +	}
> >>>> +
> >>>> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> >>>> +				  &copy_offset, &copy_count, NULL)) {
> >>>> +		/*
> >>>> +		 * Transitional devices use the PCI subsystem device id as
> >>>> +		 * virtio device id, same as legacy driver always did.  
> >>> Where did we require the subsystem vendor ID to be 0x1af4?  This
> >>> subsystem device ID really only makes sense given that subsystem
> >>> vendor ID, right?  Otherwise I don't see that non-transitional devices,
> >>> such as the VF, have a hard requirement per the spec for the subsystem
> >>> vendor ID.
> >>>
> >>> Do we want to make this only probe the correct subsystem vendor ID or do
> >>> we want to emulate the subsystem vendor ID as well?  I don't see this is
> >>> correct without one of those options.  
> >> Looking in the 1.x spec we can see the below.
> >>
> >> Legacy Interfaces: A Note on PCI Device Discovery
> >>
> >> "Transitional devices MUST have the PCI Subsystem
> >> Device ID matching the Virtio Device ID, as indicated in section 5 ...
> >> This is to match legacy drivers."
> >>
> >> However, there is no need to enforce Subsystem Vendor ID.
> >>
> >> This is what we followed here.
> >>
> >> Makes sense ?  
> > So do I understand correctly that virtio dictates the subsystem device
> > ID for all subsystem vendor IDs that implement a legacy virtio
> > interface?  Ok, but this device didn't actually implement a legacy
> > virtio interface.  The device itself is not tranistional, we're imposing
> > an emulated transitional interface onto it.  So did the subsystem vendor
> > agree to have their subsystem device ID managed by the virtio committee
> > or might we create conflicts?  I imagine we know we don't have a
> > conflict if we also virtualize the subsystem vendor ID.
> >  
> The non-transitional net device in the virtio spec is defined as the tuple
> below:
> T_A: VID=0x1AF4, DID=0x1040, Subsys_VID=FOO, Subsys_DID=0x40.
> 
> And the transitional net device in the virtio spec for a vendor FOO is
> defined as:
> T_B: VID=0x1AF4, DID=0x1000, Subsys_VID=FOO, Subsys_DID=0x1
> 
> This driver converts T_A to T_B, both of which are defined by the
> virtio spec.
> Hence, it does not conflict for the subsystem vendor ID; it is fine.

Surprising to me that the virtio spec dictates subsystem device ID in
all cases.  The further discussion in this thread seems to indicate we
need to virtualize subsystem vendor ID for broader driver compatibility
anyway.

> > BTW, it would be a lot easier for all of the config space emulation here
> > if we could make use of the existing field virtualization in
> > vfio-pci-core.  In fact you'll see in vfio_config_init() that
> > PCI_DEVICE_ID is already virtualized for VFs, so it would be enough to
> > simply do the following to report the desired device ID:
> >
> > 	*(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(0x1000);  
> 
> I would prefer to keep things simple and have one place/flow that
> handles all the fields, as we have now as part of the driver.

That's the same argument I'd make for re-using the core code: we don't
need multiple implementations that merge physical and virtual
bits within config space.

> In any case, I'll further look at that option for managing the DEVICE_ID 
> towards V2.
> 
> > It appears everything in this function could be handled similarly by
> > vfio-pci-core if the right fields in the perm_bits.virt and .write
> > bits could be manipulated and vconfig modified appropriately.  I'd look
> > for a way that a variant driver could provide an alternate set of
> > permissions structures for various capabilities.  Thanks,  
> 
> OK
> 
> However, let's not block V2 and the series' acceptance on that.
> 
> It can always be done as future refactoring, as part of another series that
> brings in the infrastructure needed for that.

We're already on the verge of the v6.7 merge window, so this looks like
v6.8 material anyway.  We have time.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-26 17:55             ` Alex Williamson
@ 2023-10-26 19:49               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 100+ messages in thread
From: Michael S. Tsirkin @ 2023-10-26 19:49 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, maorg, virtualization, jgg, jiri, leonro

On Thu, Oct 26, 2023 at 11:55:39AM -0600, Alex Williamson wrote:
> On Thu, 26 Oct 2023 15:08:12 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
> 
> > On 25/10/2023 22:13, Alex Williamson wrote:
> > > On Wed, 25 Oct 2023 17:35:51 +0300
> > > Yishai Hadas <yishaih@nvidia.com> wrote:
> > >  
> > >> On 24/10/2023 22:57, Alex Williamson wrote:  
> > >>> On Tue, 17 Oct 2023 16:42:17 +0300
> > >>> Yishai Hadas <yishaih@nvidia.com> wrote:
>    
> > >>>> +		if (copy_to_user(buf + copy_offset, &val32, copy_count))
> > >>>> +			return -EFAULT;
> > >>>> +	}
> > >>>> +
> > >>>> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> > >>>> +				  &copy_offset, &copy_count, NULL)) {
> > >>>> +		/*
> > >>>> +		 * Transitional devices use the PCI subsystem device id as
> > >>>> +		 * virtio device id, same as legacy driver always did.  
> > >>> Where did we require the subsystem vendor ID to be 0x1af4?  This
> > >>> subsystem device ID really only makes sense given that subsystem
> > >>> vendor ID, right?  Otherwise I don't see that non-transitional devices,
> > >>> such as the VF, have a hard requirement per the spec for the subsystem
> > >>> vendor ID.
> > >>>
> > >>> Do we want to make this only probe the correct subsystem vendor ID or do
> > >>> we want to emulate the subsystem vendor ID as well?  I don't see this is
> > >>> correct without one of those options.  
> > >> Looking in the 1.x spec we can see the below.
> > >>
> > >> Legacy Interfaces: A Note on PCI Device Discovery
> > >>
> > >> "Transitional devices MUST have the PCI Subsystem
> > >> Device ID matching the Virtio Device ID, as indicated in section 5 ...
> > >> This is to match legacy drivers."
> > >>
> > >> However, there is no need to enforce Subsystem Vendor ID.
> > >>
> > >> This is what we followed here.
> > >>
> > >> Makes sense ?  
> > > So do I understand correctly that virtio dictates the subsystem device
> > > ID for all subsystem vendor IDs that implement a legacy virtio
> > > interface?  Ok, but this device didn't actually implement a legacy
> > > virtio interface.  The device itself is not tranistional, we're imposing
> > > an emulated transitional interface onto it.  So did the subsystem vendor
> > > agree to have their subsystem device ID managed by the virtio committee
> > > or might we create conflicts?  I imagine we know we don't have a
> > > conflict if we also virtualize the subsystem vendor ID.
> > >  
> > The non-transitional net device in the virtio spec is defined as the tuple
> > below:
> > T_A: VID=0x1AF4, DID=0x1040, Subsys_VID=FOO, Subsys_DID=0x40.
> > 
> > And the transitional net device in the virtio spec for a vendor FOO is
> > defined as:
> > T_B: VID=0x1AF4, DID=0x1000, Subsys_VID=FOO, Subsys_DID=0x1
> > 
> > This driver converts T_A to T_B, both of which are defined by the
> > virtio spec.
> > Hence, it does not conflict for the subsystem vendor ID; it is fine.
> 
> Surprising to me that the virtio spec dictates subsystem device ID in
> all cases.

Modern virtio spec doesn't. Legacy spec did.

-- 
MST
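
For background on why the legacy spec's rule matters to guests: the Linux
legacy virtio-pci probe derives the virtio identity from the PCI subsystem
IDs, roughly as sketched below. This is a paraphrase for illustration, not a
verbatim quote of drivers/virtio/virtio_pci_legacy.c; the function name is
made up, and the 0x1000-0x103f check is the "open coded driver check"
mentioned earlier in the thread:

#include <linux/errno.h>
#include <linux/pci.h>
#include <linux/virtio.h>

/* Sketch: how a legacy guest driver identifies a transitional device. */
static int legacy_virtio_identity(struct pci_dev *pci_dev,
				  struct virtio_device *vdev)
{
	/* Legacy/transitional devices use PCI device IDs 0x1000..0x103f. */
	if (pci_dev->device < 0x1000 || pci_dev->device > 0x103f)
		return -ENODEV;

	/*
	 * The virtio device type and vendor come from the PCI subsystem
	 * IDs, which is why the emulated subsystem device ID (and, for
	 * some guest drivers, the subsystem vendor ID) matters here.
	 */
	vdev->id.vendor = pci_dev->subsystem_vendor;
	vdev->id.device = pci_dev->subsystem_device;

	return 0;
}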


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-10-26 19:49               ` Michael S. Tsirkin
  0 siblings, 0 replies; 100+ messages in thread
From: Michael S. Tsirkin @ 2023-10-26 19:49 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yishai Hadas, jasowang, jgg, kvm, virtualization, parav, feliu,
	jiri, kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg

On Thu, Oct 26, 2023 at 11:55:39AM -0600, Alex Williamson wrote:
> On Thu, 26 Oct 2023 15:08:12 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
> 
> > On 25/10/2023 22:13, Alex Williamson wrote:
> > > On Wed, 25 Oct 2023 17:35:51 +0300
> > > Yishai Hadas <yishaih@nvidia.com> wrote:
> > >  
> > >> On 24/10/2023 22:57, Alex Williamson wrote:  
> > >>> On Tue, 17 Oct 2023 16:42:17 +0300
> > >>> Yishai Hadas <yishaih@nvidia.com> wrote:
>    
> > >>>> +		if (copy_to_user(buf + copy_offset, &val32, copy_count))
> > >>>> +			return -EFAULT;
> > >>>> +	}
> > >>>> +
> > >>>> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
> > >>>> +				  &copy_offset, &copy_count, NULL)) {
> > >>>> +		/*
> > >>>> +		 * Transitional devices use the PCI subsystem device id as
> > >>>> +		 * virtio device id, same as legacy driver always did.  
> > >>> Where did we require the subsystem vendor ID to be 0x1af4?  This
> > >>> subsystem device ID really only makes sense given that subsystem
> > >>> vendor ID, right?  Otherwise I don't see that non-transitional devices,
> > >>> such as the VF, have a hard requirement per the spec for the subsystem
> > >>> vendor ID.
> > >>>
> > >>> Do we want to make this only probe the correct subsystem vendor ID or do
> > >>> we want to emulate the subsystem vendor ID as well?  I don't see this is
> > >>> correct without one of those options.  
> > >> Looking in the 1.x spec we can see the below.
> > >>
> > >> Legacy Interfaces: A Note on PCI Device Discovery
> > >>
> > >> "Transitional devices MUST have the PCI Subsystem
> > >> Device ID matching the Virtio Device ID, as indicated in section 5 ...
> > >> This is to match legacy drivers."
> > >>
> > >> However, there is no need to enforce Subsystem Vendor ID.
> > >>
> > >> This is what we followed here.
> > >>
> > >> Makes sense ?  
> > > So do I understand correctly that virtio dictates the subsystem device
> > > ID for all subsystem vendor IDs that implement a legacy virtio
> > > interface?  Ok, but this device didn't actually implement a legacy
> > > virtio interface.  The device itself is not tranistional, we're imposing
> > > an emulated transitional interface onto it.  So did the subsystem vendor
> > > agree to have their subsystem device ID managed by the virtio committee
> > > or might we create conflicts?  I imagine we know we don't have a
> > > conflict if we also virtualize the subsystem vendor ID.
> > >  
> > The non-transitional net device in the virtio spec is defined as the tuple
> > below:
> > T_A: VID=0x1AF4, DID=0x1040, Subsys_VID=FOO, Subsys_DID=0x40.
> > 
> > And the transitional net device in the virtio spec for a vendor FOO is
> > defined as:
> > T_B: VID=0x1AF4, DID=0x1000, Subsys_VID=FOO, Subsys_DID=0x1
> > 
> > This driver converts T_A to T_B, both of which are defined by the
> > virtio spec.
> > Hence, it does not conflict for the subsystem vendor ID; it is fine.
> 
> Surprising to me that the virtio spec dictates subsystem device ID in
> all cases.

Modern virtio spec doesn't. Legacy spec did.

-- 
MST


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
  2023-10-26 17:55             ` Alex Williamson
@ 2023-10-29 16:13               ` Yishai Hadas
  -1 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas via Virtualization @ 2023-10-29 16:13 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, mst, maorg, virtualization, jgg, jiri, leonro

On 26/10/2023 20:55, Alex Williamson wrote:
> On Thu, 26 Oct 2023 15:08:12 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
>
>> On 25/10/2023 22:13, Alex Williamson wrote:
>>> On Wed, 25 Oct 2023 17:35:51 +0300
>>> Yishai Hadas <yishaih@nvidia.com> wrote:
>>>   
>>>> On 24/10/2023 22:57, Alex Williamson wrote:
>>>>> On Tue, 17 Oct 2023 16:42:17 +0300
>>>>> Yishai Hadas <yishaih@nvidia.com> wrote:
>     
>>>>>> +		if (copy_to_user(buf + copy_offset, &val32, copy_count))
>>>>>> +			return -EFAULT;
>>>>>> +	}
>>>>>> +
>>>>>> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
>>>>>> +				  &copy_offset, &copy_count, NULL)) {
>>>>>> +		/*
>>>>>> +		 * Transitional devices use the PCI subsystem device id as
>>>>>> +		 * virtio device id, same as legacy driver always did.
>>>>> Where did we require the subsystem vendor ID to be 0x1af4?  This
>>>>> subsystem device ID really only makes sense given that subsystem
>>>>> vendor ID, right?  Otherwise I don't see that non-transitional devices,
>>>>> such as the VF, have a hard requirement per the spec for the subsystem
>>>>> vendor ID.
>>>>>
>>>>> Do we want to make this only probe the correct subsystem vendor ID or do
>>>>> we want to emulate the subsystem vendor ID as well?  I don't see this is
>>>>> correct without one of those options.
>>>> Looking in the 1.x spec we can see the below.
>>>>
>>>> Legacy Interfaces: A Note on PCI Device Discovery
>>>>
>>>> "Transitional devices MUST have the PCI Subsystem
>>>> Device ID matching the Virtio Device ID, as indicated in section 5 ...
>>>> This is to match legacy drivers."
>>>>
>>>> However, there is no need to enforce Subsystem Vendor ID.
>>>>
>>>> This is what we followed here.
>>>>
>>>> Makes sense ?
>>> So do I understand correctly that virtio dictates the subsystem device
>>> ID for all subsystem vendor IDs that implement a legacy virtio
>>> interface?  Ok, but this device didn't actually implement a legacy
>>> virtio interface.  The device itself is not tranistional, we're imposing
>>> an emulated transitional interface onto it.  So did the subsystem vendor
>>> agree to have their subsystem device ID managed by the virtio committee
>>> or might we create conflicts?  I imagine we know we don't have a
>>> conflict if we also virtualize the subsystem vendor ID.
>>>   
>> The non-transitional net device in the virtio spec is defined as the tuple
>> below:
>> T_A: VID=0x1AF4, DID=0x1040, Subsys_VID=FOO, Subsys_DID=0x40.
>>
>> And the transitional net device in the virtio spec for a vendor FOO is
>> defined as:
>> T_B: VID=0x1AF4, DID=0x1000, Subsys_VID=FOO, Subsys_DID=0x1
>>
>> This driver converts T_A to T_B, both of which are defined by the
>> virtio spec.
>> Hence, it does not conflict for the subsystem vendor ID; it is fine.
> Surprising to me that the virtio spec dictates subsystem device ID in
> all cases.  The further discussion in this thread seems to indicate we
> need to virtualize subsystem vendor ID for broader driver compatibility
> anyway.
>
>>> BTW, it would be a lot easier for all of the config space emulation here
>>> if we could make use of the existing field virtualization in
>>> vfio-pci-core.  In fact you'll see in vfio_config_init() that
>>> PCI_DEVICE_ID is already virtualized for VFs, so it would be enough to
>>> simply do the following to report the desired device ID:
>>>
>>> 	*(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(0x1000);
>> I would prefer to keep things simple and have one place/flow that
>> handles all the fields, as we have now as part of the driver.
> That's the same argument I'd make for re-using the core code, we don't
> need multiple implementations handling merging physical and virtual
> bits within config space.
>
>> In any case, I'll further look at that option for managing the DEVICE_ID
>> towards V2.
>>
>>> It appears everything in this function could be handled similarly by
>>> vfio-pci-core if the right fields in the perm_bits.virt and .write
>>> bits could be manipulated and vconfig modified appropriately.  I'd look
>>> for a way that a variant driver could provide an alternate set of
>>> permissions structures for various capabilities.  Thanks,
>> OK
>>
>> However, let's not block V2 and the series' acceptance on that.
>>
>> It can always be done as future refactoring, as part of another series that
>> brings in the infrastructure needed for that.
> We're already on the verge of the v6.7 merge window, so this looks like
> v6.8 material anyway.  We have time.  Thanks,

OK

I sent V2 with all the other notes handled, to share and get feedback
from both you and Michael.

Let's continue from there to see what is needed towards v6.8.

Thanks,
Yishai

>
> Alex
>


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
@ 2023-10-29 16:13               ` Yishai Hadas
  0 siblings, 0 replies; 100+ messages in thread
From: Yishai Hadas @ 2023-10-29 16:13 UTC (permalink / raw)
  To: Alex Williamson
  Cc: mst, jasowang, jgg, kvm, virtualization, parav, feliu, jiri,
	kevin.tian, joao.m.martins, si-wei.liu, leonro, maorg

On 26/10/2023 20:55, Alex Williamson wrote:
> On Thu, 26 Oct 2023 15:08:12 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
>
>> On 25/10/2023 22:13, Alex Williamson wrote:
>>> On Wed, 25 Oct 2023 17:35:51 +0300
>>> Yishai Hadas <yishaih@nvidia.com> wrote:
>>>   
>>>> On 24/10/2023 22:57, Alex Williamson wrote:
>>>>> On Tue, 17 Oct 2023 16:42:17 +0300
>>>>> Yishai Hadas <yishaih@nvidia.com> wrote:
>     
>>>>>> +		if (copy_to_user(buf + copy_offset, &val32, copy_count))
>>>>>> +			return -EFAULT;
>>>>>> +	}
>>>>>> +
>>>>>> +	if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16),
>>>>>> +				  &copy_offset, &copy_count, NULL)) {
>>>>>> +		/*
>>>>>> +		 * Transitional devices use the PCI subsystem device id as
>>>>>> +		 * virtio device id, same as legacy driver always did.
>>>>> Where did we require the subsystem vendor ID to be 0x1af4?  This
>>>>> subsystem device ID really only makes sense given that subsystem
>>>>> vendor ID, right?  Otherwise I don't see that non-transitional devices,
>>>>> such as the VF, have a hard requirement per the spec for the subsystem
>>>>> vendor ID.
>>>>>
>>>>> Do we want to make this only probe the correct subsystem vendor ID or do
>>>>> we want to emulate the subsystem vendor ID as well?  I don't see this is
>>>>> correct without one of those options.
>>>> Looking in the 1.x spec we can see the below.
>>>>
>>>> Legacy Interfaces: A Note on PCI Device Discovery
>>>>
>>>> "Transitional devices MUST have the PCI Subsystem
>>>> Device ID matching the Virtio Device ID, as indicated in section 5 ...
>>>> This is to match legacy drivers."
>>>>
>>>> However, there is no need to enforce Subsystem Vendor ID.
>>>>
>>>> This is what we followed here.
>>>>
>>>> Makes sense ?
>>> So do I understand correctly that virtio dictates the subsystem device
>>> ID for all subsystem vendor IDs that implement a legacy virtio
>>> interface?  Ok, but this device didn't actually implement a legacy
>>> virtio interface.  The device itself is not tranistional, we're imposing
>>> an emulated transitional interface onto it.  So did the subsystem vendor
>>> agree to have their subsystem device ID managed by the virtio committee
>>> or might we create conflicts?  I imagine we know we don't have a
>>> conflict if we also virtualize the subsystem vendor ID.
>>>   
>> The non-transitional net device in the virtio spec is defined as the tuple
>> below:
>> T_A: VID=0x1AF4, DID=0x1040, Subsys_VID=FOO, Subsys_DID=0x40.
>>
>> And the transitional net device in the virtio spec for a vendor FOO is
>> defined as:
>> T_B: VID=0x1AF4, DID=0x1000, Subsys_VID=FOO, Subsys_DID=0x1
>>
>> This driver converts T_A to T_B, both of which are defined by the
>> virtio spec.
>> Hence, it does not conflict for the subsystem vendor ID; it is fine.
> Surprising to me that the virtio spec dictates subsystem device ID in
> all cases.  The further discussion in this thread seems to indicate we
> need to virtualize subsystem vendor ID for broader driver compatibility
> anyway.
>
>>> BTW, it would be a lot easier for all of the config space emulation here
>>> if we could make use of the existing field virtualization in
>>> vfio-pci-core.  In fact you'll see in vfio_config_init() that
>>> PCI_DEVICE_ID is already virtualized for VFs, so it would be enough to
>>> simply do the following to report the desired device ID:
>>>
>>> 	*(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(0x1000);
>> I would prefer to keep things simple and have one place/flow that
>> handles all the fields, as we have now as part of the driver.
> That's the same argument I'd make for re-using the core code, we don't
> need multiple implementations handling merging physical and virtual
> bits within config space.
>
>> In any case, I'll further look at that option for managing the DEVICE_ID
>> towards V2.
>>
>>> It appears everything in this function could be handled similarly by
>>> vfio-pci-core if the right fields in the perm_bits.virt and .write
>>> bits could be manipulated and vconfig modified appropriately.  I'd look
>>> for a way that a variant driver could provide an alternate set of
>>> permissions structures for various capabilities.  Thanks,
>> OK
>>
>> However, let's not block V2 and the series' acceptance on that.
>>
>> It can always be done as future refactoring, as part of another series that
>> brings in the infrastructure needed for that.
> We're already on the verge of the v6.7 merge window, so this looks like
> v6.8 material anyway.  We have time.  Thanks,

OK

I sent V2 with all the other notes handled, to share and get feedback
from both you and Michael.

Let's continue from there to see what is needed towards v6.8.

Thanks,
Yishai

>
> Alex
>


^ permalink raw reply	[flat|nested] 100+ messages in thread

end of thread, other threads:[~2023-10-29 16:14 UTC | newest]

Thread overview: 100+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-17 13:42 [PATCH V1 vfio 0/9] Introduce a vfio driver over virtio devices Yishai Hadas
2023-10-17 13:42 ` Yishai Hadas via Virtualization
2023-10-17 13:42 ` [PATCH V1 vfio 1/9] virtio-pci: Fix common config map for modern device Yishai Hadas
2023-10-17 13:42   ` Yishai Hadas via Virtualization
2023-10-17 13:42 ` [PATCH V1 vfio 2/9] virtio: Define feature bit for administration virtqueue Yishai Hadas
2023-10-17 13:42   ` Yishai Hadas via Virtualization
2023-10-17 13:42 ` [PATCH V1 vfio 3/9] virtio-pci: Introduce admin virtqueue Yishai Hadas
2023-10-17 13:42   ` Yishai Hadas via Virtualization
2023-10-17 13:42 ` [PATCH V1 vfio 4/9] virtio-pci: Introduce admin command sending function Yishai Hadas
2023-10-17 13:42   ` Yishai Hadas via Virtualization
2023-10-17 13:42 ` [PATCH V1 vfio 5/9] virtio-pci: Introduce admin commands Yishai Hadas
2023-10-17 13:42   ` Yishai Hadas via Virtualization
2023-10-17 13:42 ` [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO " Yishai Hadas
2023-10-17 13:42   ` Yishai Hadas via Virtualization
2023-10-17 20:33   ` kernel test robot
2023-10-17 20:33     ` kernel test robot
2023-10-22  1:14   ` kernel test robot
2023-10-22  1:14     ` kernel test robot
2023-10-24 21:01   ` Michael S. Tsirkin
2023-10-24 21:01     ` Michael S. Tsirkin
2023-10-25  9:18     ` Yishai Hadas via Virtualization
2023-10-25 10:17       ` Michael S. Tsirkin
2023-10-25 10:17         ` Michael S. Tsirkin
2023-10-25 13:00         ` Yishai Hadas
2023-10-25 13:00           ` Yishai Hadas via Virtualization
2023-10-25 13:04           ` Michael S. Tsirkin
2023-10-25 13:04             ` Michael S. Tsirkin
2023-10-25 13:44           ` Michael S. Tsirkin
2023-10-25 13:44             ` Michael S. Tsirkin
2023-10-25 14:03             ` Yishai Hadas
2023-10-25 14:03               ` Yishai Hadas via Virtualization
2023-10-25 16:31               ` Michael S. Tsirkin
2023-10-25 16:31                 ` Michael S. Tsirkin
2023-10-25  9:36     ` Yishai Hadas
2023-10-25  9:36       ` Yishai Hadas via Virtualization
2023-10-17 13:42 ` [PATCH V1 vfio 7/9] vfio/pci: Expose vfio_pci_core_setup_barmap() Yishai Hadas
2023-10-17 13:42   ` Yishai Hadas via Virtualization
2023-10-17 13:42 ` [PATCH V1 vfio 8/9] vfio/pci: Expose vfio_pci_iowrite/read##size() Yishai Hadas
2023-10-17 13:42   ` Yishai Hadas via Virtualization
2023-10-17 13:42 ` [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices Yishai Hadas
2023-10-17 13:42   ` Yishai Hadas via Virtualization
2023-10-17 20:24   ` Alex Williamson
2023-10-17 20:24     ` Alex Williamson
2023-10-18  9:01     ` Yishai Hadas
2023-10-18  9:01       ` Yishai Hadas via Virtualization
2023-10-18 12:51       ` Alex Williamson
2023-10-18 12:51         ` Alex Williamson
2023-10-18 13:06         ` Parav Pandit
2023-10-18 13:06           ` Parav Pandit via Virtualization
2023-10-18 16:33     ` Jason Gunthorpe
2023-10-18 18:29       ` Alex Williamson
2023-10-18 18:29         ` Alex Williamson
2023-10-18 19:28         ` Jason Gunthorpe
2023-10-24 19:57   ` Alex Williamson
2023-10-24 19:57     ` Alex Williamson
2023-10-25 14:35     ` Yishai Hadas
2023-10-25 14:35       ` Yishai Hadas via Virtualization
2023-10-25 16:28       ` Michael S. Tsirkin
2023-10-25 16:28         ` Michael S. Tsirkin
2023-10-25 19:13       ` Alex Williamson
2023-10-25 19:13         ` Alex Williamson
2023-10-26 12:08         ` Yishai Hadas
2023-10-26 12:08           ` Yishai Hadas via Virtualization
2023-10-26 12:12           ` Michael S. Tsirkin
2023-10-26 12:12             ` Michael S. Tsirkin
2023-10-26 12:40             ` Parav Pandit
2023-10-26 12:40               ` Parav Pandit via Virtualization
2023-10-26 13:15               ` Michael S. Tsirkin
2023-10-26 13:15                 ` Michael S. Tsirkin
2023-10-26 13:28                 ` Parav Pandit
2023-10-26 13:28                   ` Parav Pandit via Virtualization
2023-10-26 15:06                   ` Michael S. Tsirkin
2023-10-26 15:06                     ` Michael S. Tsirkin
2023-10-26 15:09                     ` Parav Pandit
2023-10-26 15:09                       ` Parav Pandit via Virtualization
2023-10-26 15:46                       ` Michael S. Tsirkin
2023-10-26 15:46                         ` Michael S. Tsirkin
2023-10-26 15:56                         ` Parav Pandit
2023-10-26 15:56                           ` Parav Pandit via Virtualization
2023-10-26 17:55           ` Alex Williamson
2023-10-26 17:55             ` Alex Williamson
2023-10-26 19:49             ` Michael S. Tsirkin
2023-10-26 19:49               ` Michael S. Tsirkin
2023-10-29 16:13             ` Yishai Hadas via Virtualization
2023-10-29 16:13               ` Yishai Hadas
2023-10-22  8:20 ` [PATCH V1 vfio 0/9] " Yishai Hadas
2023-10-22  8:20   ` Yishai Hadas via Virtualization
2023-10-22  9:12   ` Michael S. Tsirkin
2023-10-22  9:12     ` Michael S. Tsirkin
2023-10-23 15:33   ` Alex Williamson
2023-10-23 15:33     ` Alex Williamson
2023-10-23 15:42     ` Jason Gunthorpe
2023-10-23 16:09       ` Alex Williamson
2023-10-23 16:09         ` Alex Williamson
2023-10-23 16:20         ` Jason Gunthorpe
2023-10-23 16:45           ` Alex Williamson
2023-10-23 16:45             ` Alex Williamson
2023-10-23 17:27             ` Jason Gunthorpe
2023-10-25  8:34       ` Tian, Kevin
2023-10-25  8:34         ` Tian, Kevin
