* [RFC PATCH v5 00/26] vDPA shadow virtqueue
@ 2021-10-29 18:34 Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 01/26] util: Make some iova_tree parameters const Eugenio Pérez
                   ` (27 more replies)
  0 siblings, 28 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

This series enables shadow virtqueue (SVQ) for vhost-vdpa devices. It
is intended as a new method of tracking the memory the devices touch
during a migration process: instead of relying on the vhost device's
dirty logging capability, SVQ intercepts the VQ dataplane, forwarding
the descriptors between VM and device. This way qemu is the effective
writer of the guest's memory, just like in qemu's virtio device operation.

When SVQ is enabled, qemu offers a new virtual address space to the
device to read and write into, and it maps the new vrings and the guest
memory in it. SVQ also intercepts kicks and calls between the device
and the guest. Relaying used buffers would cause the dirty memory to be
tracked, but in this RFC SVQ is not enabled automatically on migration.

Since it is a buffer relay system, SVQ can also be used to bridge
devices and drivers with different capabilities, such as a device that
only supports packed vrings (not split) together with an old guest
whose driver has no packed support.

It is based on the ideas of DPDK SW assisted live migration, in the
DPDK series https://patchwork.dpdk.org/cover/48370/ . However, this
series does not map the shadow vq in the guest's VA, but in qemu's.

For qemu to use shadow virtqueues, the guest virtio driver must not use
features like event_idx.

SVQ needs to be enabled with the QMP command:

{ "execute": "x-vhost-set-shadow-vq",
      "arguments": { "name": "vhost-vdpa0", "set": true } }

This series includes some patches, to be deleted in the final version,
that help with its testing. The first two of the series have been sent
separately but have not been included in qemu's main branch yet.

The two after them add the feature to stop the device and to set and
get its status. They are intended to be used with the vp_vdpa driver in
a nested environment, so they are also external to this series. The
vp_vdpa driver also needs modifications to forward the new status bit;
those will be proposed separately.

Patches 5-12 prepare the SVQ and the QMP command to support guest to
host notification forwarding. If SVQ is enabled with these applied and
the device supports it, that part can be tested in isolation (for
example, with networking), hopping through SVQ.

The same is true of patches 13-17, but for device to guest
notifications.

Based on them, patches 18 to 22 implement the actual buffer forwarding,
using some features already introduced in the previous ones. However,
they need a host device with no IOMMU, something that is not available
at the moment.

The last part of the series makes proper use of the host IOMMU, so the
driver can access this new virtual address space.

Comments are welcome.

TODO:
* Event index, indirect descriptors, packed vring, and other virtio
  features.
* Separate buffer forwarding into its own AIO context, so we can throw
  more threads at that task and we don't need to stop the main event
  loop.
* Support multiqueue virtio-net vdpa.
* Proper documentation.

Changes from v4 RFC:
* Support for allocating / freeing iova ranges in the IOVA tree,
  extending the already present iova-tree for that.
* Proper validation of guest features. Now SVQ can negotiate a
  different set of features with the device when enabled.
* Support for host notifier memory regions.
* Handling of a full SVQ in case the guest's descriptors span
  different memory regions (qemu's VA chunks).
* Flush pending used buffers at end of SVQ operation.
* The QMP command now looks up the device by NetClientState name. Other
  devices will need to implement their own way to enable vdpa.
* Rename the QMP command to "set", so it reads more like selecting a
  mode of operation.
* Better use of qemu error system
* Turn a few assertions into proper error-handling paths.
* Add more documentation
* Less coupling of virtio / vhost, since that coupling could cause
  friction on changes.
* Addressed many other small comments and small fixes.

Changes from v3 RFC:
  * Move everything to the vhost-vdpa backend. A big change; it allowed
    some cleanup, but more code has been added in other places.
  * More use of glib utilities, especially to manage memory.
v3 link:
https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html

Changes from v2 RFC:
  * Adding vhost-vdpa devices support
  * Fixed some memory leaks pointed out in various review comments
v2 link:
https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html

Changes from v1 RFC:
  * Use QMP instead of migration to start SVQ mode.
  * Only accepting IOMMU devices, closer to the behavior of the target
    devices (vDPA)
  * Fix invalid masking/unmasking of vhost call fd.
  * Use of proper methods for synchronization.
  * No need to modify VirtIO device code, all of the changes are
    contained in vhost code.
  * Delete superfluous code.
  * An intermediate RFC was sent with only the notifications forwarding
    changes. It can be seen in
    https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
v1 link:
https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html

Eugenio Pérez (20):
      virtio: Add VIRTIO_F_QUEUE_STATE
      virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
      virtio: Add virtio_queue_is_host_notifier_enabled
      vhost: Make vhost_virtqueue_{start,stop} public
      vhost: Add x-vhost-enable-shadow-vq qmp
      vhost: Add VhostShadowVirtqueue
      vdpa: Register vdpa devices in a list
      vhost: Route guest->host notification through shadow virtqueue
      Add vhost_svq_get_svq_call_notifier
      Add vhost_svq_set_guest_call_notifier
      vdpa: Save call_fd in vhost-vdpa
      vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
      vhost: Route host->guest notification through shadow virtqueue
      virtio: Add vhost_shadow_vq_get_vring_addr
      vdpa: Save host and guest features
      vhost: Add vhost_svq_valid_device_features to shadow vq
      vhost: Shadow virtqueue buffers forwarding
      vhost: Add VhostIOVATree
      vhost: Use a tree to store memory mappings
      vdpa: Add custom IOTLB translations to SVQ

Eugenio Pérez (26):
  util: Make some iova_tree parameters const
  vhost: Fix last queue index of devices with no cvq
  virtio: Add VIRTIO_F_QUEUE_STATE
  virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
  vhost: Add x-vhost-set-shadow-vq qmp
  vhost: Add VhostShadowVirtqueue
  vdpa: Save kick_fd in vhost-vdpa
  vdpa: Add vhost_svq_get_dev_kick_notifier
  vdpa: Add vhost_svq_set_svq_kick_fd
  vhost: Add Shadow VirtQueue kick forwarding capabilities
  vhost: Handle host notifiers in SVQ
  vhost: Route guest->host notification through shadow virtqueue
  Add vhost_svq_get_svq_call_notifier
  Add vhost_svq_set_guest_call_notifier
  vdpa: Save call_fd in vhost-vdpa
  vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
  vhost: Route host->guest notification through shadow virtqueue
  virtio: Add vhost_shadow_vq_get_vring_addr
  vdpa: ack VIRTIO_F_QUEUE_STATE if device supports it
  vhost: Add vhost_svq_valid_device_features to shadow vq
  vhost: Add vhost_svq_valid_guest_features to shadow vq
  vhost: Shadow virtqueue buffers forwarding
  util: Add iova_tree_alloc
  vhost: Add VhostIOVATree
  vhost: Use a tree to store memory mappings
  vdpa: Add custom IOTLB translations to SVQ

 qapi/net.json                                 |  20 +
 hw/virtio/vhost-iova-tree.h                   |  27 +
 hw/virtio/vhost-shadow-virtqueue.h            |  44 ++
 hw/virtio/virtio-pci.h                        |   1 +
 include/hw/virtio/vhost-vdpa.h                |  12 +
 include/hw/virtio/virtio.h                    |   4 +-
 include/qemu/iova-tree.h                      |  25 +-
 .../standard-headers/linux/virtio_config.h    |   5 +
 include/standard-headers/linux/virtio_pci.h   |   2 +
 hw/i386/intel_iommu.c                         |   2 +-
 hw/net/vhost_net.c                            |   2 +-
 hw/net/virtio-net.c                           |   6 +-
 hw/virtio/vhost-iova-tree.c                   | 157 ++++
 hw/virtio/vhost-shadow-virtqueue.c            | 746 ++++++++++++++++++
 hw/virtio/vhost-vdpa.c                        | 437 +++++++++-
 hw/virtio/virtio-pci.c                        |  16 +-
 net/vhost-vdpa.c                              |  28 +
 util/iova-tree.c                              | 151 +++-
 hw/virtio/meson.build                         |   2 +-
 19 files changed, 1664 insertions(+), 23 deletions(-)
 create mode 100644 hw/virtio/vhost-iova-tree.h
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
 create mode 100644 hw/virtio/vhost-iova-tree.c
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.c

-- 
2.27.0





* [RFC PATCH v5 01/26] util: Make some iova_tree parameters const
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-31 18:59     ` Juan Quintela
  2021-10-29 18:35 ` [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq Eugenio Pérez
                   ` (26 subsequent siblings)
  27 siblings, 1 reply; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

As per qemu guidelines:
Unless a pointer is used to modify the pointed-to storage, give it the
"const" attribute.

In the particular case of iova_tree_find, this enforces what its
comment requests, since the compiler will complain about modifying or
freeing the const-qualified returned pointer.
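
As an illustration (not part of this patch), with the const-qualified
prototype a misuse like the following is rejected; "tree" and "needle"
are assumed to be a valid IOVATree and DMAMap:

    const DMAMap *found = iova_tree_find(tree, &needle);
    if (found) {
        found->perm = IOMMU_NONE; /* error: assignment to member of const object */
        g_free(found);            /* warning: passing discards 'const' qualifier */
    }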

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 include/qemu/iova-tree.h |  8 ++++----
 hw/i386/intel_iommu.c    |  2 +-
 util/iova-tree.c         | 12 ++++++------
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
index b66cf93c4b..8249edd764 100644
--- a/include/qemu/iova-tree.h
+++ b/include/qemu/iova-tree.h
@@ -59,7 +59,7 @@ IOVATree *iova_tree_new(void);
  *
  * Return: 0 if succeeded, or <0 if error.
  */
-int iova_tree_insert(IOVATree *tree, DMAMap *map);
+int iova_tree_insert(IOVATree *tree, const DMAMap *map);
 
 /**
  * iova_tree_remove:
@@ -74,7 +74,7 @@ int iova_tree_insert(IOVATree *tree, DMAMap *map);
  *
  * Return: 0 if succeeded, or <0 if error.
  */
-int iova_tree_remove(IOVATree *tree, DMAMap *map);
+int iova_tree_remove(IOVATree *tree, const DMAMap *map);
 
 /**
  * iova_tree_find:
@@ -92,7 +92,7 @@ int iova_tree_remove(IOVATree *tree, DMAMap *map);
  * user is responsible to make sure the pointer is valid (say, no
  * concurrent deletion in progress).
  */
-DMAMap *iova_tree_find(IOVATree *tree, DMAMap *map);
+const DMAMap *iova_tree_find(const IOVATree *tree, const DMAMap *map);
 
 /**
  * iova_tree_find_address:
@@ -105,7 +105,7 @@ DMAMap *iova_tree_find(IOVATree *tree, DMAMap *map);
  *
  * Return: same as iova_tree_find().
  */
-DMAMap *iova_tree_find_address(IOVATree *tree, hwaddr iova);
+const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
 
 /**
  * iova_tree_foreach:
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 75f075547f..33a8af9191 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1105,7 +1105,7 @@ static int vtd_page_walk_one(IOMMUTLBEvent *event, vtd_page_walk_info *info)
         .translated_addr = entry->translated_addr,
         .perm = entry->perm,
     };
-    DMAMap *mapped = iova_tree_find(as->iova_tree, &target);
+    const DMAMap *mapped = iova_tree_find(as->iova_tree, &target);
 
     if (event->type == IOMMU_NOTIFIER_UNMAP && !info->notify_unmap) {
         trace_vtd_page_walk_one_skip_unmap(entry->iova, entry->addr_mask);
diff --git a/util/iova-tree.c b/util/iova-tree.c
index 7990692cbd..23ea35b7a4 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -42,14 +42,14 @@ IOVATree *iova_tree_new(void)
     return iova_tree;
 }
 
-DMAMap *iova_tree_find(IOVATree *tree, DMAMap *map)
+const DMAMap *iova_tree_find(const IOVATree *tree, const DMAMap *map)
 {
     return g_tree_lookup(tree->tree, map);
 }
 
-DMAMap *iova_tree_find_address(IOVATree *tree, hwaddr iova)
+const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova)
 {
-    DMAMap map = { .iova = iova, .size = 0 };
+    const DMAMap map = { .iova = iova, .size = 0 };
 
     return iova_tree_find(tree, &map);
 }
@@ -60,7 +60,7 @@ static inline void iova_tree_insert_internal(GTree *gtree, DMAMap *range)
     g_tree_insert(gtree, range, range);
 }
 
-int iova_tree_insert(IOVATree *tree, DMAMap *map)
+int iova_tree_insert(IOVATree *tree, const DMAMap *map)
 {
     DMAMap *new;
 
@@ -96,9 +96,9 @@ void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator)
     g_tree_foreach(tree->tree, iova_tree_traverse, iterator);
 }
 
-int iova_tree_remove(IOVATree *tree, DMAMap *map)
+int iova_tree_remove(IOVATree *tree, const DMAMap *map)
 {
-    DMAMap *overlap;
+    const DMAMap *overlap;
 
     while ((overlap = iova_tree_find(tree, map))) {
         g_tree_remove(tree->tree, overlap);
-- 
2.27.0




* [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 01/26] util: Make some iova_tree parameters const Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-11-02  7:25     ` Juan Quintela
  2021-11-02  7:40     ` Juan Quintela
  2021-10-29 18:35 ` [RFC PATCH v5 03/26] virtio: Add VIRTIO_F_QUEUE_STATE Eugenio Pérez
                   ` (25 subsequent siblings)
  27 siblings, 2 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

The -1 assumes that all devices with no cvq have a spare vq allocated
for them, but without offering VIRTIO_NET_F_CTRL_VQ. This may not be the
case, and the device may have an even number of queues.

To fix this, just round down to the lower even number of queues.
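
For illustration (assuming last_index is 3 when a spare cvq slot is
allocated and 2 when it is not):

    /* spare slot present:  3 - 1 == 2,  3 & ~1ULL == 2  (both correct)   */
    /* no spare slot:       2 - 1 == 1 (wrong),  2 & ~1ULL == 2 (correct) */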

Fixes: 049eb15b5fc9 ("vhost: record the last virtqueue index for the virtio device")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/net/vhost_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 0d888f29a6..edf56a597f 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -330,7 +330,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
     NetClientState *peer;
 
     if (!cvq) {
-        last_index -= 1;
+        last_index &= ~1ULL;
     }
 
     if (!k->set_guest_notifiers) {
-- 
2.27.0




* [RFC PATCH v5 03/26] virtio: Add VIRTIO_F_QUEUE_STATE
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 01/26] util: Make some iova_tree parameters const Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-11-02  4:57     ` Jason Wang
  2021-10-29 18:35 ` [RFC PATCH v5 04/26] virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED Eugenio Pérez
                   ` (24 subsequent siblings)
  27 siblings, 1 reply; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

Implementation of the device state capability RFC:
https://lists.oasis-open.org/archives/virtio-comment/202012/msg00005.html

With this capability, the vdpa device can reset its index, so it can
start consuming from the guest after disabling shadow virtqueue (SVQ),
even with a state different from 0.

The use case is to test SVQ with virtio-pci vdpa (vp_vdpa) under nested
virtualization: spawn an L0 qemu with a virtio-net device, use the
vp_vdpa driver to handle it in the guest, and then spawn an L1 qemu
using that vdpa device. When the L1 qemu asks the device to set a new
state through the vdpa ioctl, vp_vdpa should set each queue state
through the virtio VIRTIO_PCI_COMMON_Q_AVAIL_STATE register.
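
A minimal sketch of the vp_vdpa side, only for illustration (it assumes a
mapped modern common config region and kernel MMIO helpers; it is not the
proposed driver code):

    static void vp_vdpa_set_vq_state(void __iomem *common_cfg, u16 qid,
                                     u16 avail_idx)
    {
        /* Select the queue, then program its last avail index */
        iowrite16(qid, common_cfg + VIRTIO_PCI_COMMON_Q_SELECT);
        iowrite16(avail_idx, common_cfg + VIRTIO_PCI_COMMON_Q_AVAIL_STATE);
    }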

Since this is only for testing vhost-vdpa, it is added here before even
being proposed to kernel code. No effort is made to check that the
device can actually change its state or its layout, or whether the
device supports changing state at all. These checks will be added in
the future.

Also, a modified version of vp_vdpa that allows setting these in the
PCI config is needed.

TODO: Check for feature enabled and split in virtio pci config

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/virtio-pci.h                         | 1 +
 include/hw/virtio/virtio.h                     | 4 +++-
 include/standard-headers/linux/virtio_config.h | 3 +++
 include/standard-headers/linux/virtio_pci.h    | 2 ++
 hw/virtio/virtio-pci.c                         | 9 +++++++++
 5 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 2446dcd9ae..019badbd7c 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -120,6 +120,7 @@ typedef struct VirtIOPCIQueue {
   uint32_t desc[2];
   uint32_t avail[2];
   uint32_t used[2];
+  uint16_t state;
 } VirtIOPCIQueue;
 
 struct VirtIOPCIProxy {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 8bab9cfb75..5fe575b8f0 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -289,7 +289,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
     DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
                       VIRTIO_F_IOMMU_PLATFORM, false), \
     DEFINE_PROP_BIT64("packed", _state, _field, \
-                      VIRTIO_F_RING_PACKED, false)
+                      VIRTIO_F_RING_PACKED, false), \
+    DEFINE_PROP_BIT64("save_restore_q_state", _state, _field, \
+                      VIRTIO_F_QUEUE_STATE, true)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
diff --git a/include/standard-headers/linux/virtio_config.h b/include/standard-headers/linux/virtio_config.h
index 22e3a85f67..59fad3eb45 100644
--- a/include/standard-headers/linux/virtio_config.h
+++ b/include/standard-headers/linux/virtio_config.h
@@ -90,4 +90,7 @@
  * Does the device support Single Root I/O Virtualization?
  */
 #define VIRTIO_F_SR_IOV			37
+
+/* Device support save and restore virtqueue state */
+#define VIRTIO_F_QUEUE_STATE            40
 #endif /* _LINUX_VIRTIO_CONFIG_H */
diff --git a/include/standard-headers/linux/virtio_pci.h b/include/standard-headers/linux/virtio_pci.h
index db7a8e2fcb..c8d9802a87 100644
--- a/include/standard-headers/linux/virtio_pci.h
+++ b/include/standard-headers/linux/virtio_pci.h
@@ -164,6 +164,7 @@ struct virtio_pci_common_cfg {
 	uint32_t queue_avail_hi;		/* read-write */
 	uint32_t queue_used_lo;		/* read-write */
 	uint32_t queue_used_hi;		/* read-write */
+	uint16_t queue_avail_state;     /* read-write */
 };
 
 /* Fields in VIRTIO_PCI_CAP_PCI_CFG: */
@@ -202,6 +203,7 @@ struct virtio_pci_cfg_cap {
 #define VIRTIO_PCI_COMMON_Q_AVAILHI	44
 #define VIRTIO_PCI_COMMON_Q_USEDLO	48
 #define VIRTIO_PCI_COMMON_Q_USEDHI	52
+#define VIRTIO_PCI_COMMON_Q_AVAIL_STATE	56
 
 #endif /* VIRTIO_PCI_NO_MODERN */
 
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 750aa47ec1..d7bb549033 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1244,6 +1244,9 @@ static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
     case VIRTIO_PCI_COMMON_Q_USEDHI:
         val = proxy->vqs[vdev->queue_sel].used[1];
         break;
+    case VIRTIO_PCI_COMMON_Q_AVAIL_STATE:
+        val = virtio_queue_get_last_avail_idx(vdev, vdev->queue_sel);
+        break;
     default:
         val = 0;
     }
@@ -1330,6 +1333,8 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
                        proxy->vqs[vdev->queue_sel].avail[0],
                        ((uint64_t)proxy->vqs[vdev->queue_sel].used[1]) << 32 |
                        proxy->vqs[vdev->queue_sel].used[0]);
+            virtio_queue_set_last_avail_idx(vdev, vdev->queue_sel,
+                        proxy->vqs[vdev->queue_sel].state);
             proxy->vqs[vdev->queue_sel].enabled = 1;
         } else {
             virtio_error(vdev, "wrong value for queue_enable %"PRIx64, val);
@@ -1353,6 +1358,9 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
     case VIRTIO_PCI_COMMON_Q_USEDHI:
         proxy->vqs[vdev->queue_sel].used[1] = val;
         break;
+    case VIRTIO_PCI_COMMON_Q_AVAIL_STATE:
+        proxy->vqs[vdev->queue_sel].state = val;
+        break;
     default:
         break;
     }
@@ -1951,6 +1959,7 @@ static void virtio_pci_reset(DeviceState *qdev)
         proxy->vqs[i].desc[0] = proxy->vqs[i].desc[1] = 0;
         proxy->vqs[i].avail[0] = proxy->vqs[i].avail[1] = 0;
         proxy->vqs[i].used[0] = proxy->vqs[i].used[1] = 0;
+        proxy->vqs[i].state = 0;
     }
 
     if (pci_is_express(dev)) {
-- 
2.27.0




* [RFC PATCH v5 04/26] virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (2 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 03/26] virtio: Add VIRTIO_F_QUEUE_STATE Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 05/26] vhost: Add x-vhost-set-shadow-vq qmp Eugenio Pérez
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

So the guest can stop and start the net device. It loosely implements the RFC
https://lists.oasis-open.org/archives/virtio-comment/202012/msg00027.html

Stopping (i.e. "pausing") the device is required to migrate status and
vring addresses between device and SVQ. Once the device is stopped, the
driver can request avail_idx, so it can be assigned to SVQ.

This is a WIP commit: as with VIRTIO_F_QUEUE_STATE, it is introduced in
virtio_config.h before even being proposed for the kernel, with no
feature flag and with no checking in the device. It also needs a
modified vp_vdpa driver that supports setting and retrieving the status.

For virtio-net with the qemu device there is no need to restore the
avail state: since every tx and rx operation is done entirely under the
BQL as far as virtio is concerned, it is enough to restore
last_avail_idx from used_idx. Doing it this way tests the vq state part
of the rest of the series.
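
Status-bit arithmetic, for illustration only (values from virtio_config.h):

    uint8_t running = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER |
                      VIRTIO_CONFIG_S_FEATURES_OK | VIRTIO_CONFIG_S_DRIVER_OK;
    /* running == 0x0f; setting the new bit pauses the device */
    uint8_t paused = running | VIRTIO_CONFIG_S_DEVICE_STOPPED; /* 0x2f */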

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/standard-headers/linux/virtio_config.h | 2 ++
 hw/net/virtio-net.c                            | 6 ++++--
 hw/virtio/virtio-pci.c                         | 7 +++++--
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/standard-headers/linux/virtio_config.h b/include/standard-headers/linux/virtio_config.h
index 59fad3eb45..b3f6b1365d 100644
--- a/include/standard-headers/linux/virtio_config.h
+++ b/include/standard-headers/linux/virtio_config.h
@@ -40,6 +40,8 @@
 #define VIRTIO_CONFIG_S_DRIVER_OK	4
 /* Driver has finished configuring features */
 #define VIRTIO_CONFIG_S_FEATURES_OK	8
+/* Device is stopped */
+#define VIRTIO_CONFIG_S_DEVICE_STOPPED 32
 /* Device entered invalid state, driver must reset it */
 #define VIRTIO_CONFIG_S_NEEDS_RESET	0x40
 /* We've given up on this device. */
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f2014d5ea0..8b7b97e42d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -198,6 +198,7 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(n);
     return (status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+        (!(status & VIRTIO_CONFIG_S_DEVICE_STOPPED)) &&
         (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
 }
 
@@ -386,7 +387,7 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
             qemu_flush_queued_packets(ncs);
         }
 
-        if (!q->tx_waiting) {
+        if (!q->tx_waiting && !(status & VIRTIO_CONFIG_S_DEVICE_STOPPED)) {
             continue;
         }
 
@@ -1489,7 +1490,8 @@ static bool virtio_net_can_receive(NetClientState *nc)
     }
 
     if (!virtio_queue_ready(q->rx_vq) ||
-        !(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+        !(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) ||
+        vdev->status == VIRTIO_CONFIG_S_DEVICE_STOPPED) {
         return false;
     }
 
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index d7bb549033..741a2bd2fa 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -327,13 +327,15 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
         }
         break;
     case VIRTIO_PCI_STATUS:
-        if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
+        if (!(val & VIRTIO_CONFIG_S_DRIVER_OK) ||
+            val & VIRTIO_CONFIG_S_DEVICE_STOPPED) {
             virtio_pci_stop_ioeventfd(proxy);
         }
 
         virtio_set_status(vdev, val & 0xFF);
 
-        if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
+        if (val & VIRTIO_CONFIG_S_DRIVER_OK &&
+            !(val & VIRTIO_CONFIG_S_DEVICE_STOPPED)) {
             virtio_pci_start_ioeventfd(proxy);
         }
 
@@ -1335,6 +1337,7 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
                        proxy->vqs[vdev->queue_sel].used[0]);
             virtio_queue_set_last_avail_idx(vdev, vdev->queue_sel,
                         proxy->vqs[vdev->queue_sel].state);
+            virtio_queue_update_used_idx(vdev, vdev->queue_sel);
             proxy->vqs[vdev->queue_sel].enabled = 1;
         } else {
             virtio_error(vdev, "wrong value for queue_enable %"PRIx64, val);
-- 
2.27.0




* [RFC PATCH v5 05/26] vhost: Add x-vhost-set-shadow-vq qmp
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (3 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 04/26] virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-11-02  7:36     ` Juan Quintela
  2021-10-29 18:35 ` [RFC PATCH v5 06/26] vhost: Add VhostShadowVirtqueue Eugenio Pérez
                   ` (22 subsequent siblings)
  27 siblings, 1 reply; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

Command to set shadow virtqueue mode.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 qapi/net.json    | 22 ++++++++++++++++++++++
 net/vhost-vdpa.c |  6 ++++++
 2 files changed, 28 insertions(+)

diff --git a/qapi/net.json b/qapi/net.json
index 7fab2e7cd8..b191b6787b 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -79,6 +79,28 @@
 { 'command': 'netdev_del', 'data': {'id': 'str'},
   'allow-preconfig': true }
 
+##
+# @x-vhost-set-shadow-vq:
+#
+# Use vhost shadow virtqueue.
+#
+# @name: the device name of the VirtIO device
+#
+# @set: true to use the alternate shadow VQ notifications
+#
+# Returns: Always error, since SVQ is not implemented at the moment.
+#
+# Since: 6.2
+#
+# Example:
+#
+# -> { "execute": "x-vhost-set-shadow-vq",
+#     "arguments": { "name": "virtio-net", "set": false } }
+#
+##
+{ 'command': 'x-vhost-set-shadow-vq', 'data': {'name': 'str', 'set': 'bool'},
+  'if': 'CONFIG_VHOST_VDPA' }
+
 ##
 # @NetLegacyNicOptions:
 #
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 49ab322511..3b360da27d 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -18,6 +18,7 @@
 #include "qemu/error-report.h"
 #include "qemu/option.h"
 #include "qapi/error.h"
+#include "qapi/qapi-commands-net.h"
 #include <linux/vhost.h>
 #include <sys/ioctl.h>
 #include <err.h>
@@ -301,3 +302,8 @@ err:
 
     return -1;
 }
+
+void qmp_x_vhost_set_shadow_vq(const char *name, bool set, Error **errp)
+{
+    error_setg(errp, "Shadow virtqueue still not implemented");
+}
-- 
2.27.0




* [RFC PATCH v5 06/26] vhost: Add VhostShadowVirtqueue
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (4 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 05/26] vhost: Add x-vhost-set-shadow-vq qmp Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 07/26] vdpa: Save kick_fd in vhost-vdpa Eugenio Pérez
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

Vhost shadow virtqueue (SVQ) is an intermediate hop for virtqueue
notifications and buffers, allowing qemu to track them. While qemu is
forwarding the buffers and virtqueue changes, it is able to commit the
memory being dirtied, the same way regular qemu VirtIO devices do.

This commit only exposes basic SVQ allocation and freeing, so the
changes regarding different aspects of SVQ (notification forwarding,
buffer forwarding, starting/stopping) are more isolated and easier to
bisect.
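
A rough usage sketch from the backend's point of view; the real call
sites and wiring are added by later patches of this series:

    /* Sketch only: allocate one SVQ per device virtqueue */
    static int example_alloc_svqs(struct vhost_dev *dev, GPtrArray *svqs)
    {
        for (unsigned i = 0; i < dev->nvqs; ++i) {
            VhostShadowVirtqueue *svq = vhost_svq_new(dev, i);
            if (!svq) {
                return -1; /* vhost_svq_new already reported the error */
            }
            g_ptr_array_add(svqs, svq);
        }
        /* The queues are released later with vhost_svq_free() */
        return 0;
    }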

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h | 21 ++++++++++
 hw/virtio/vhost-shadow-virtqueue.c | 64 ++++++++++++++++++++++++++++++
 hw/virtio/meson.build              |  2 +-
 3 files changed, 86 insertions(+), 1 deletion(-)
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.c

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
new file mode 100644
index 0000000000..27ac6388fa
--- /dev/null
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -0,0 +1,21 @@
+/*
+ * vhost shadow virtqueue
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef VHOST_SHADOW_VIRTQUEUE_H
+#define VHOST_SHADOW_VIRTQUEUE_H
+
+#include "hw/virtio/vhost.h"
+
+typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
+
+VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx);
+
+void vhost_svq_free(VhostShadowVirtqueue *vq);
+
+#endif
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
new file mode 100644
index 0000000000..38887c3433
--- /dev/null
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -0,0 +1,64 @@
+/*
+ * vhost shadow virtqueue
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "hw/virtio/vhost-shadow-virtqueue.h"
+
+#include "qemu/error-report.h"
+#include "qemu/event_notifier.h"
+
+/* Shadow virtqueue to relay notifications */
+typedef struct VhostShadowVirtqueue {
+    /* Shadow kick notifier, sent to vhost */
+    EventNotifier hdev_kick;
+    /* Shadow call notifier, sent to vhost */
+    EventNotifier hdev_call;
+} VhostShadowVirtqueue;
+
+/*
+ * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
+ * methods and file descriptors.
+ */
+VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
+{
+    g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
+    int r;
+
+    r = event_notifier_init(&svq->hdev_kick, 0);
+    if (r != 0) {
+        error_report("Couldn't create kick event notifier: %s",
+                     strerror(errno));
+        goto err_init_hdev_kick;
+    }
+
+    r = event_notifier_init(&svq->hdev_call, 0);
+    if (r != 0) {
+        error_report("Couldn't create call event notifier: %s",
+                     strerror(errno));
+        goto err_init_hdev_call;
+    }
+
+    return g_steal_pointer(&svq);
+
+err_init_hdev_call:
+    event_notifier_cleanup(&svq->hdev_kick);
+
+err_init_hdev_kick:
+    return NULL;
+}
+
+/*
+ * Free the resources of the shadow virtqueue.
+ */
+void vhost_svq_free(VhostShadowVirtqueue *vq)
+{
+    event_notifier_cleanup(&vq->hdev_kick);
+    event_notifier_cleanup(&vq->hdev_call);
+    g_free(vq);
+}
diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index 521f7d64a8..2dc87613bc 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
 
 virtio_ss = ss.source_set()
 virtio_ss.add(files('virtio.c'))
-virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c'))
+virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
 virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
-- 
2.27.0




* [RFC PATCH v5 07/26] vdpa: Save kick_fd in vhost-vdpa
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (5 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 06/26] vhost: Add VhostShadowVirtqueue Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 08/26] vdpa: Add vhost_svq_get_dev_kick_notifier Eugenio Pérez
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

We need to know it to switch to Shadow VirtQueue and back to normal
operation.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/vhost-vdpa.h | 1 +
 hw/virtio/vhost-vdpa.c         | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 3ce79a646d..c79a21c3c8 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -28,6 +28,7 @@ typedef struct vhost_vdpa {
     MemoryListener listener;
     struct vhost_vdpa_iova_range iova_range;
     struct vhost_dev *dev;
+    int kick_fd[VIRTIO_QUEUE_MAX];
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
 
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 12661fd5b1..e6ee227385 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -702,7 +702,12 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
 static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
                                        struct vhost_vring_file *file)
 {
+    struct vhost_vdpa *v = dev->opaque;
+    int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
+
     trace_vhost_vdpa_set_vring_kick(dev, file->index, file->fd);
+
+    v->kick_fd[vdpa_idx] = file->fd;
     return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
 }
 
-- 
2.27.0




* [RFC PATCH v5 08/26] vdpa: Add vhost_svq_get_dev_kick_notifier
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (6 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 07/26] vdpa: Save kick_fd in vhost-vdpa Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 09/26] vdpa: Add vhost_svq_set_svq_kick_fd Eugenio Pérez
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

This is needed so vhost-vdpa knows what to send to the device as the
kick event fd.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  4 ++++
 hw/virtio/vhost-shadow-virtqueue.c | 10 +++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 27ac6388fa..50ebddbbb9 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -11,9 +11,13 @@
 #define VHOST_SHADOW_VIRTQUEUE_H
 
 #include "hw/virtio/vhost.h"
+#include "qemu/event_notifier.h"
 
 typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 
+const EventNotifier *vhost_svq_get_dev_kick_notifier(
+                                              const VhostShadowVirtqueue *svq);
+
 VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx);
 
 void vhost_svq_free(VhostShadowVirtqueue *vq);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 38887c3433..076418556d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -11,7 +11,6 @@
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 
 #include "qemu/error-report.h"
-#include "qemu/event_notifier.h"
 
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
@@ -21,6 +20,15 @@ typedef struct VhostShadowVirtqueue {
     EventNotifier hdev_call;
 } VhostShadowVirtqueue;
 
+/**
+ * The notifier that SVQ will use to notify the device.
+ */
+const EventNotifier *vhost_svq_get_dev_kick_notifier(
+                                               const VhostShadowVirtqueue *svq)
+{
+    return &svq->hdev_kick;
+}
+
 /*
  * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
  * methods and file descriptors.
-- 
2.27.0




* [RFC PATCH v5 09/26] vdpa: Add vhost_svq_set_svq_kick_fd
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (7 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 08/26] vdpa: Add vhost_svq_get_dev_kick_notifier Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 10/26] vhost: Add Shadow VirtQueue kick forwarding capabilities Eugenio Pérez
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

This function allows the vhost-vdpa backend to override kick_fd.

There are a few pieces still missing, like the guest's kick handler in
SVQ, and how to handle the first setting of the kick file descriptor,
which has its own complexities. These will be added in the next patches.
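
A hedged sketch of the expected use from vhost-vdpa's set_vring_kick
path (the real call site is added by later patches; "v", "vdpa_idx" and
"file" are assumed from the surrounding backend code):

    if (v->shadow_vqs_enabled) {
        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
        /* Let SVQ listen to the guest's kicks instead of the device */
        vhost_svq_set_svq_kick_fd(svq, file->fd);
        return 0;
    }
    /* SVQ disabled: forward the guest's kick fd to the device as usual */
    return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);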

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  1 +
 hw/virtio/vhost-shadow-virtqueue.c | 36 ++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 50ebddbbb9..a19eede089 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -15,6 +15,7 @@
 
 typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 
+void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 const EventNotifier *vhost_svq_get_dev_kick_notifier(
                                               const VhostShadowVirtqueue *svq);
 
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 076418556d..513d7f2782 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -11,6 +11,7 @@
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 
 #include "qemu/error-report.h"
+#include "qemu/main-loop.h"
 
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
@@ -18,6 +19,16 @@ typedef struct VhostShadowVirtqueue {
     EventNotifier hdev_kick;
     /* Shadow call notifier, sent to vhost */
     EventNotifier hdev_call;
+
+    /*
+     * Borrowed virtqueue's guest to host notifier.
+     * To borrow it in this event notifier allows to register on the event
+     * loop and access the associated shadow virtqueue easily. If we use the
+     * VirtQueue, we don't have an easy way to retrieve it.
+     *
+     * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
+     */
+    EventNotifier svq_kick;
 } VhostShadowVirtqueue;
 
 /**
@@ -29,6 +40,31 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
     return &svq->hdev_kick;
 }
 
+/**
+ * Set a new file descriptor for the guest to kick SVQ and notify for avail
+ *
+ * @svq          The svq
+ * @svq_kick_fd  The new svq kick fd
+ */
+void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
+{
+    EventNotifier tmp;
+
+    event_notifier_set_handler(&svq->svq_kick, NULL);
+    event_notifier_init_fd(&tmp, event_notifier_get_fd(&svq->svq_kick));
+
+    /*
+     * event_notifier_set_handler already checks for guest's notifications if
+     * they arrive to the new file descriptor in the switch, so there is no
+     * need to explicitly check for them.
+     */
+    event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
+
+    if (event_notifier_test_and_clear(&tmp)) {
+        event_notifier_set(&svq->hdev_kick);
+    }
+}
+
 /*
  * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
  * methods and file descriptors.
-- 
2.27.0




* [RFC PATCH v5 10/26] vhost: Add Shadow VirtQueue kick forwarding capabilities
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (8 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 09/26] vdpa: Add vhost_svq_set_svq_kick_fd Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 11/26] vhost: Handle host notifiers in SVQ Eugenio Pérez
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

At this point no buffer forwarding is performed in SVQ mode: qemu will
just forward the guest's kicks to the device.

Also, host notifiers must be disabled at SVQ start, and they will not
start if SVQ has been enabled while the device is stopped. This will be
addressed in the next patches.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  4 ++
 hw/virtio/vhost-shadow-virtqueue.c | 77 +++++++++++++++++++++++++++---
 2 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index a19eede089..30ab9643b9 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -18,6 +18,10 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 const EventNotifier *vhost_svq_get_dev_kick_notifier(
                                               const VhostShadowVirtqueue *svq);
+void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
+                     VhostShadowVirtqueue *svq, int svq_kick_fd);
+void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
+                    VhostShadowVirtqueue *svq);
 
 VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx);
 
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 513d7f2782..fda60d11db 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -40,18 +40,36 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
     return &svq->hdev_kick;
 }
 
+/* Forward guest notifications */
+static void vhost_handle_guest_kick(EventNotifier *n)
+{
+    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
+                                             svq_kick);
+
+    if (unlikely(!event_notifier_test_and_clear(n))) {
+        return;
+    }
+
+    event_notifier_set(&svq->hdev_kick);
+}
+
 /**
- * Set a new file descriptor for the guest to kick SVQ and notify for avail
+ * Convenience function to set guest to SVQ kick fd
  *
- * @svq          The svq
- * @svq_kick_fd  The new svq kick fd
+ * @svq         The shadow VirtQueue
+ * @svq_kick_fd The guest to SVQ kick fd
+ * @check_old   Check old file descriptor for pending notifications
  */
-void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
+static void vhost_svq_set_svq_kick_fd_internal(VhostShadowVirtqueue *svq,
+                                               int svq_kick_fd,
+                                               bool check_old)
 {
     EventNotifier tmp;
 
-    event_notifier_set_handler(&svq->svq_kick, NULL);
-    event_notifier_init_fd(&tmp, event_notifier_get_fd(&svq->svq_kick));
+    if (check_old) {
+        event_notifier_set_handler(&svq->svq_kick, NULL);
+        event_notifier_init_fd(&tmp, event_notifier_get_fd(&svq->svq_kick));
+    }
 
     /*
      * event_notifier_set_handler already checks for guest's notifications if
@@ -59,12 +77,57 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
      * need to explicitly check for them.
      */
     event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
+    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
 
-    if (event_notifier_test_and_clear(&tmp)) {
+    /*
+     * !check_old means that we are starting SVQ, taking the descriptor from
+     * vhost-vdpa device. This means that we can't trust old file descriptor
+     * pending notifications, since they could have been swallowed by kernel
+     * vhost or paused device. So let it enabled, and qemu event loop will call
+     * us to handle guest avail ring when SVQ is ready.
+     */
+    if (!check_old || event_notifier_test_and_clear(&tmp)) {
         event_notifier_set(&svq->hdev_kick);
     }
 }
 
+/**
+ * Set a new file descriptor for the guest to kick SVQ and notify for avail
+ *
+ * @svq          The svq
+ * @svq_kick_fd  The svq kick fd
+ *
+ * Note that SVQ will never close the old file descriptor.
+ */
+void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
+{
+    vhost_svq_set_svq_kick_fd_internal(svq, svq_kick_fd, true);
+}
+
+/*
+ * Start shadow virtqueue operation.
+ * @dev vhost device
+ * @hidx vhost virtqueue index
+ * @svq Shadow Virtqueue
+ */
+void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
+                     VhostShadowVirtqueue *svq, int svq_kick_fd)
+{
+    vhost_svq_set_svq_kick_fd_internal(svq, svq_kick_fd, false);
+}
+
+/*
+ * Stop shadow virtqueue operation.
+ * @dev vhost device
+ * @idx vhost queue index
+ * @svq Shadow Virtqueue
+ */
+void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
+                    VhostShadowVirtqueue *svq)
+{
+    event_notifier_set_handler(&svq->svq_kick, NULL);
+}
+
 /*
  * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
  * methods and file descriptors.
-- 
2.27.0




* [RFC PATCH v5 11/26] vhost: Handle host notifiers in SVQ
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (9 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 10/26] vhost: Add Shadow VirtQueue kick forwarding capabilities Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-11-02  7:54     ` Jason Wang
  2021-10-29 18:35 ` [RFC PATCH v5 12/26] vhost: Route guest->host notification through shadow virtqueue Eugenio Pérez
                   ` (16 subsequent siblings)
  27 siblings, 1 reply; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

If the device supports host notifiers, this saves one hop (through the
kernel) when delivering SVQ notifications to it.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  2 ++
 hw/virtio/vhost-shadow-virtqueue.c | 23 ++++++++++++++++++++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 30ab9643b9..eb0a54f954 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -18,6 +18,8 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 const EventNotifier *vhost_svq_get_dev_kick_notifier(
                                               const VhostShadowVirtqueue *svq);
+void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr);
+
 void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
                      VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index fda60d11db..e3dcc039b6 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -29,6 +29,12 @@ typedef struct VhostShadowVirtqueue {
      * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
      */
     EventNotifier svq_kick;
+
+    /* Device's host notifier memory region. NULL means no region */
+    void *host_notifier_mr;
+
+    /* Virtio queue shadowing */
+    VirtQueue *vq;
 } VhostShadowVirtqueue;
 
 /**
@@ -50,7 +56,20 @@ static void vhost_handle_guest_kick(EventNotifier *n)
         return;
     }
 
-    event_notifier_set(&svq->hdev_kick);
+    if (svq->host_notifier_mr) {
+        uint16_t *mr = svq->host_notifier_mr;
+        *mr = virtio_get_queue_index(svq->vq);
+    } else {
+        event_notifier_set(&svq->hdev_kick);
+    }
+}
+
+/*
+ * Set the device's memory region notifier. addr = NULL clear it.
+ */
+void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
+{
+    svq->host_notifier_mr = addr;
 }
 
 /**
@@ -134,6 +153,7 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
  */
 VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
 {
+    int vq_idx = dev->vq_index + idx;
     g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
     int r;
 
@@ -151,6 +171,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
         goto err_init_hdev_call;
     }
 
+    svq->vq = virtio_get_queue(dev->vdev, vq_idx);
     return g_steal_pointer(&svq);
 
 err_init_hdev_call:
-- 
2.27.0




* [RFC PATCH v5 12/26] vhost: Route guest->host notification through shadow virtqueue
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (10 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 11/26] vhost: Handle host notifiers in SVQ Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-11-02  5:36     ` Jason Wang
  2021-10-29 18:35 ` [RFC PATCH v5 13/26] Add vhost_svq_get_svq_call_notifier Eugenio Pérez
                   ` (15 subsequent siblings)
  27 siblings, 1 reply; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

At this point no buffer forwarding is performed in SVQ mode: qemu just
forwards the guest's kicks to the device.

Shadow virtqueue notification forwarding is disabled when vhost_dev
stops, so the code flow follows the usual cleanup.

Also, host notifiers must be disabled at SVQ start, and they will not
start if SVQ has been enabled while the device is stopped. This is
trivial to address, but it is left out for simplicity at this moment.

This is an intermediate step before introducing the full SVQ mode,
useful to test whether the device plays well with notification
forwarding.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 qapi/net.json                  |   5 +-
 include/hw/virtio/vhost-vdpa.h |   6 ++
 hw/virtio/vhost-vdpa.c         | 183 ++++++++++++++++++++++++++++++++-
 net/vhost-vdpa.c               |  24 ++++-
 4 files changed, 210 insertions(+), 8 deletions(-)

diff --git a/qapi/net.json b/qapi/net.json
index b191b6787b..fca2f6ebca 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -84,12 +84,13 @@
 #
 # Use vhost shadow virtqueue.
 #
+# SVQ can just forward notifications between the device and the guest at this
+# moment. This will expand in future changes.
+#
 # @name: the device name of the VirtIO device
 #
 # @set: true to use the alternate shadow VQ notifications
 #
-# Returns: Always error, since SVQ is not implemented at the moment.
-#
 # Since: 6.2
 #
 # Example:
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index c79a21c3c8..6d60092c96 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -12,6 +12,8 @@
 #ifndef HW_VIRTIO_VHOST_VDPA_H
 #define HW_VIRTIO_VHOST_VDPA_H
 
+#include <gmodule.h>
+
 #include "hw/virtio/virtio.h"
 #include "standard-headers/linux/vhost_types.h"
 
@@ -27,9 +29,13 @@ typedef struct vhost_vdpa {
     bool iotlb_batch_begin_sent;
     MemoryListener listener;
     struct vhost_vdpa_iova_range iova_range;
+    bool shadow_vqs_enabled;
+    GPtrArray *shadow_vqs;
     struct vhost_dev *dev;
     int kick_fd[VIRTIO_QUEUE_MAX];
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
 
+void vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable, Error **errp);
+
 #endif
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index e6ee227385..c388705e73 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -17,12 +17,14 @@
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-backend.h"
 #include "hw/virtio/virtio-net.h"
+#include "hw/virtio/vhost-shadow-virtqueue.h"
 #include "hw/virtio/vhost-vdpa.h"
 #include "exec/address-spaces.h"
 #include "qemu/main-loop.h"
 #include "cpu.h"
 #include "trace.h"
 #include "qemu-common.h"
+#include "qapi/error.h"
 
 /*
  * Return one past the end of the end of section. Be careful with uint64_t
@@ -326,6 +328,16 @@ static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
     return v->index != 0;
 }
 
+/**
+ * Adaptor function to free shadow virtqueue through gpointer
+ *
+ * @svq   The Shadow Virtqueue
+ */
+static void vhost_psvq_free(gpointer svq)
+{
+    vhost_svq_free(svq);
+}
+
 static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
 {
     struct vhost_vdpa *v;
@@ -337,6 +349,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
     dev->opaque =  opaque ;
     v->listener = vhost_vdpa_memory_listener;
     v->msg_type = VHOST_IOTLB_MSG_V2;
+    v->shadow_vqs = g_ptr_array_new_full(dev->nvqs, vhost_psvq_free);
 
     vhost_vdpa_get_iova_range(v);
 
@@ -361,7 +374,13 @@ static void vhost_vdpa_host_notifier_uninit(struct vhost_dev *dev,
     n = &v->notifier[queue_index];
 
     if (n->addr) {
-        virtio_queue_set_host_notifier_mr(vdev, queue_index, &n->mr, false);
+        if (v->shadow_vqs_enabled) {
+            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
+                                                          queue_index);
+            vhost_svq_set_host_mr_notifier(svq, NULL);
+        } else {
+            virtio_queue_set_host_notifier_mr(vdev, queue_index, &n->mr, false);
+        }
         object_unparent(OBJECT(&n->mr));
         munmap(n->addr, page_size);
         n->addr = NULL;
@@ -403,7 +422,12 @@ static int vhost_vdpa_host_notifier_init(struct vhost_dev *dev, int queue_index)
                                       page_size, addr);
     g_free(name);
 
-    if (virtio_queue_set_host_notifier_mr(vdev, queue_index, &n->mr, true)) {
+    if (v->shadow_vqs_enabled) {
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
+                                                      queue_index);
+        vhost_svq_set_host_mr_notifier(svq, addr);
+    } else if (virtio_queue_set_host_notifier_mr(vdev, queue_index, &n->mr,
+                                                 true)) {
         munmap(addr, page_size);
         goto err;
     }
@@ -432,6 +456,17 @@ err:
     return;
 }
 
+static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    size_t idx;
+
+    for (idx = 0; idx < v->shadow_vqs->len; ++idx) {
+        vhost_svq_stop(dev, idx, g_ptr_array_index(v->shadow_vqs, idx));
+    }
+    g_ptr_array_free(v->shadow_vqs, true);
+}
+
 static int vhost_vdpa_cleanup(struct vhost_dev *dev)
 {
     struct vhost_vdpa *v;
@@ -440,6 +475,7 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
     trace_vhost_vdpa_cleanup(dev, v);
     vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
     memory_listener_unregister(&v->listener);
+    vhost_vdpa_svq_cleanup(dev);
 
     dev->opaque = NULL;
     return 0;
@@ -699,16 +735,27 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
     return ret;
 }
 
+static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
+                                         struct vhost_vring_file *file)
+{
+    trace_vhost_vdpa_set_vring_kick(dev, file->index, file->fd);
+    return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
+}
+
 static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
                                        struct vhost_vring_file *file)
 {
     struct vhost_vdpa *v = dev->opaque;
     int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
 
-    trace_vhost_vdpa_set_vring_kick(dev, file->index, file->fd);
-
     v->kick_fd[vdpa_idx] = file->fd;
-    return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
+    if (v->shadow_vqs_enabled) {
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
+        vhost_svq_set_svq_kick_fd(svq, file->fd);
+        return 0;
+    } else {
+        return vhost_vdpa_set_vring_dev_kick(dev, file);
+    }
 }
 
 static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
@@ -755,6 +802,132 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
     return true;
 }
 
+/*
+ * Start or stop a shadow virtqueue in a vdpa device
+ *
+ * @dev Vhost device
+ * @idx Vhost device model queue index
+ * @svq_mode Shadow virtqueue mode
+ * @errp Error if any
+ *
+ * The function does not restore the previous values in the vhost-vdpa device,
+ * so in case of failure the caller needs to call it again with the negated
+ * svq_mode to set the device properties back.
+ */
+static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
+                                    bool svq_mode, Error **errp)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, idx);
+    VhostVDPAHostNotifier *n = &v->notifier[idx];
+    unsigned vq_index = idx + dev->vq_index;
+    struct vhost_vring_file vhost_kick_file = {
+        .index = vq_index,
+    };
+    int r;
+
+    if (svq_mode) {
+        const EventNotifier *vhost_kick = vhost_svq_get_dev_kick_notifier(svq);
+
+        if (n->addr) {
+            r = virtio_queue_set_host_notifier_mr(dev->vdev, idx, &n->mr,
+                                                  false);
+
+            /*
+             * vhost_vdpa_host_notifier_init already validated as a proper
+             * host notifier memory region
+             */
+            assert(r == 0);
+            vhost_svq_set_host_mr_notifier(svq, n->addr);
+        }
+        vhost_svq_start(dev, idx, svq, v->kick_fd[idx]);
+
+        vhost_kick_file.fd = event_notifier_get_fd(vhost_kick);
+    } else {
+        vhost_svq_stop(dev, idx, svq);
+
+        if (n->addr) {
+            r = virtio_queue_set_host_notifier_mr(dev->vdev, idx, &n->mr,
+                                                  true);
+            /*
+             * vhost_vdpa_host_notifier_init already validated as a proper
+             * host notifier memory region
+             */
+            assert(r == 0);
+        }
+        vhost_kick_file.fd = v->kick_fd[idx];
+    }
+
+    r = vhost_vdpa_set_vring_dev_kick(dev, &vhost_kick_file);
+    if (unlikely(r)) {
+        error_setg_errno(errp, -r, "vhost_vdpa_set_vring_kick failed");
+        return false;
+    }
+
+    return true;
+}
+
+/**
+ * Enable or disable shadow virtqueue in a vhost vdpa device.
+ *
+ * This function is idempotent: calling it many times with the same value of
+ * enable will simply return success.
+ *
+ * @v       Vhost vdpa device
+ * @enable  True to set SVQ mode
+ * @errp    Error pointer
+ */
+void vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable, Error **errp)
+{
+    struct vhost_dev *hdev = v->dev;
+    unsigned n;
+    ERRP_GUARD();
+
+    if (enable == v->shadow_vqs_enabled) {
+        return;
+    }
+
+    if (enable) {
+        /* Allocate resources */
+        assert(v->shadow_vqs->len == 0);
+        for (n = 0; n < hdev->nvqs; ++n) {
+            VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n);
+            bool ok;
+
+            if (unlikely(!svq)) {
+                error_setg(errp, "Cannot create svq");
+                enable = false;
+                goto err_svq_new;
+            }
+            g_ptr_array_add(v->shadow_vqs, svq);
+
+            ok = vhost_vdpa_svq_start_vq(hdev, n, true, errp);
+            if (unlikely(!ok)) {
+                /* Free still not started svqs, and go with disable path */
+                g_ptr_array_set_size(v->shadow_vqs, n);
+                enable = false;
+                break;
+            }
+        }
+    }
+
+    v->shadow_vqs_enabled = enable;
+
+    if (!enable) {
+        /* Disable all queues or clean up failed start */
+        for (n = 0; n < v->shadow_vqs->len; ++n) {
+            vhost_vdpa_svq_start_vq(hdev, n, false, *errp ? NULL : errp);
+        }
+
+    }
+
+err_svq_new:
+    if (!enable) {
+        /* Resources cleanup */
+        g_ptr_array_set_size(v->shadow_vqs, 0);
+    }
+}
+
 const VhostOps vdpa_ops = {
         .backend_type = VHOST_BACKEND_TYPE_VDPA,
         .vhost_backend_init = vhost_vdpa_init,
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 3b360da27d..325971d8da 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -305,5 +305,27 @@ err:
 
 void qmp_x_vhost_set_shadow_vq(const char *name, bool set, Error **errp)
 {
-    error_setg(errp, "Shadow virtqueue still not implemented");
+    NetClientState *ncs;
+    int queues;
+    ERRP_GUARD();
+
+    queues = qemu_find_net_clients_except(name, &ncs,
+                                          NET_CLIENT_DRIVER_NIC, 1);
+
+    if (!queues) {
+        error_setg(errp, "Device not found");
+    } else if (ncs->info->type != NET_CLIENT_DRIVER_VHOST_VDPA) {
+        error_setg(errp, "Device type is not vdpa");
+    } else if (queues > 1) {
+        error_setg(errp, "Device has control virtqueue");
+    } else {
+        VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, ncs);
+        struct vhost_vdpa *v = &s->vhost_vdpa;
+
+        vhost_vdpa_enable_svq(v, set, errp);
+    }
+
+    if (*errp) {
+        error_prepend(errp, "Can't set shadow vq on %s: ", name);
+    }
 }
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC PATCH v5 13/26] Add vhost_svq_get_svq_call_notifier
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (11 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 12/26] vhost: Route guest->host notification through shadow virtqueue Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 14/26] Add vhost_svq_set_guest_call_notifier Eugenio Pérez
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

This allows the vhost-vdpa device to retrieve the device -> SVQ call
eventfd.
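
A hedged usage sketch (not part of this patch; the hypothetical helper
below mirrors the wiring that later patches of the series do when they
program VHOST_SET_VRING_CALL):

    /*
     * Sketch only: route the device's call eventfd to SVQ using the new
     * accessor. vhost_vdpa_set_vring_dev_call() is added by a later patch.
     */
    static int svq_route_device_call(struct vhost_dev *dev,
                                     VhostShadowVirtqueue *svq,
                                     unsigned vq_index)
    {
        const EventNotifier *call = vhost_svq_get_svq_call_notifier(svq);
        struct vhost_vring_file file = {
            .index = vq_index,
            .fd = event_notifier_get_fd(call),
        };

        /* From now on the device notifies SVQ, not the guest directly */
        return vhost_vdpa_set_vring_dev_call(dev, &file);
    }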

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  2 ++
 hw/virtio/vhost-shadow-virtqueue.c | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index eb0a54f954..9e089edb17 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -18,6 +18,8 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 const EventNotifier *vhost_svq_get_dev_kick_notifier(
                                               const VhostShadowVirtqueue *svq);
+const EventNotifier *vhost_svq_get_svq_call_notifier(
+                                              const VhostShadowVirtqueue *svq);
 void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr);
 
 void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index e3dcc039b6..7acac1be87 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -72,6 +72,18 @@ void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
     svq->host_notifier_mr = addr;
 }
 
+/*
+ * Obtain the SVQ call notifier, where vhost device notifies SVQ that there
+ * exists pending used buffers.
+ *
+ * @svq Shadow Virtqueue
+ */
+const EventNotifier *vhost_svq_get_svq_call_notifier(
+                                               const VhostShadowVirtqueue *svq)
+{
+    return &svq->hdev_call;
+}
+
 /**
  * Convenience function to set guest to SVQ kick fd
  *
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC PATCH v5 14/26] Add vhost_svq_set_guest_call_notifier
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (12 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 13/26] Add vhost_svq_get_svq_call_notifier Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 15/26] vdpa: Save call_fd in vhost-vdpa Eugenio Pérez
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

This allows the vhost-vdpa device to set the SVQ -> guest call notifier
in SVQ.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  1 +
 hw/virtio/vhost-shadow-virtqueue.c | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 9e089edb17..607ec6e5eb 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -16,6 +16,7 @@
 typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
+void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
 const EventNotifier *vhost_svq_get_dev_kick_notifier(
                                               const VhostShadowVirtqueue *svq);
 const EventNotifier *vhost_svq_get_svq_call_notifier(
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 7acac1be87..6535eefccd 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -30,6 +30,9 @@ typedef struct VhostShadowVirtqueue {
      */
     EventNotifier svq_kick;
 
+    /* Guest's call notifier, where SVQ calls guest. */
+    EventNotifier svq_call;
+
     /* Device's host notifier memory region. NULL means no region */
     void *host_notifier_mr;
 
@@ -84,6 +87,19 @@ const EventNotifier *vhost_svq_get_svq_call_notifier(
     return &svq->hdev_call;
 }
 
+/**
+ * Set the call notifier for the SVQ to call the guest
+ *
+ * @svq Shadow virtqueue
+ * @call_fd call notifier
+ *
+ * Called on BQL context.
+ */
+void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd)
+{
+    event_notifier_init_fd(&svq->svq_call, call_fd);
+}
+
 /**
  * Convenience function to set guest to SVQ kick fd
  *
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC PATCH v5 15/26] vdpa: Save call_fd in vhost-vdpa
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (13 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 14/26] Add vhost_svq_set_guest_call_notifier Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 16/26] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call Eugenio Pérez
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

We need to know it to switch to Shadow VirtQueue.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/vhost-vdpa.h | 2 ++
 hw/virtio/vhost-vdpa.c         | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 6d60092c96..2f57b17208 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -33,6 +33,8 @@ typedef struct vhost_vdpa {
     GPtrArray *shadow_vqs;
     struct vhost_dev *dev;
     int kick_fd[VIRTIO_QUEUE_MAX];
+    /* File descriptor the device uses to call VM/SVQ */
+    int call_fd[VIRTIO_QUEUE_MAX];
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
 
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index c388705e73..64f71bd51b 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -761,7 +761,12 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
 static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
                                        struct vhost_vring_file *file)
 {
+    struct vhost_vdpa *v = dev->opaque;
+    int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
+
     trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
+
+    v->call_fd[vdpa_idx] = file->fd;
     return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
 }
 
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC PATCH v5 16/26] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (14 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 15/26] vdpa: Save call_fd in vhost-vdpa Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 17/26] vhost: Route host->guest notification through shadow virtqueue Eugenio Pérez
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 64f71bd51b..89d77f3452 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -758,16 +758,27 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
     }
 }
 
+static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
+                                         struct vhost_vring_file *file)
+{
+    trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
+    return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
+}
+
 static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
                                        struct vhost_vring_file *file)
 {
     struct vhost_vdpa *v = dev->opaque;
     int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
 
-    trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
-
     v->call_fd[vdpa_idx] = file->fd;
-    return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
+    if (v->shadow_vqs_enabled) {
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
+        vhost_svq_set_guest_call_notifier(svq, file->fd);
+        return 0;
+    } else {
+        return vhost_vdpa_set_vring_dev_call(dev, file);
+    }
 }
 
 static int vhost_vdpa_get_features(struct vhost_dev *dev,
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC PATCH v5 17/26] vhost: Route host->guest notification through shadow virtqueue
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (15 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 16/26] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 18/26] virtio: Add vhost_shadow_vq_get_vring_addr Eugenio Pérez
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

This makes qemu aware of the buffers the device has used, allowing it to
write their contents to guest memory if needed.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 15 +++++++++++++++
 hw/virtio/vhost-vdpa.c             | 13 +++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 6535eefccd..77916d2fed 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -75,6 +75,19 @@ void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
     svq->host_notifier_mr = addr;
 }
 
+/* Forward vhost notifications */
+static void vhost_svq_handle_call(EventNotifier *n)
+{
+    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
+                                             hdev_call);
+
+    if (unlikely(!event_notifier_test_and_clear(n))) {
+        return;
+    }
+
+    event_notifier_set(&svq->svq_call);
+}
+
 /*
  * Obtain the SVQ call notifier, where vhost device notifies SVQ that there
  * exists pending used buffers.
@@ -200,6 +213,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
     }
 
     svq->vq = virtio_get_queue(dev->vdev, vq_idx);
+    event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     return g_steal_pointer(&svq);
 
 err_init_hdev_call:
@@ -215,6 +229,7 @@ err_init_hdev_kick:
 void vhost_svq_free(VhostShadowVirtqueue *vq)
 {
     event_notifier_cleanup(&vq->hdev_kick);
+    event_notifier_set_handler(&vq->hdev_call, NULL);
     event_notifier_cleanup(&vq->hdev_call);
     g_free(vq);
 }
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 89d77f3452..c2580693b3 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -840,10 +840,14 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
     struct vhost_vring_file vhost_kick_file = {
         .index = vq_index,
     };
+    struct vhost_vring_file vhost_call_file = {
+        .index = idx + dev->vq_index,
+    };
     int r;
 
     if (svq_mode) {
         const EventNotifier *vhost_kick = vhost_svq_get_dev_kick_notifier(svq);
+        const EventNotifier *vhost_call = vhost_svq_get_svq_call_notifier(svq);
 
         if (n->addr) {
             r = virtio_queue_set_host_notifier_mr(dev->vdev, idx, &n->mr,
@@ -856,9 +860,12 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
             assert(r == 0);
             vhost_svq_set_host_mr_notifier(svq, n->addr);
         }
+
+        vhost_svq_set_guest_call_notifier(svq, v->call_fd[idx]);
         vhost_svq_start(dev, idx, svq, v->kick_fd[idx]);
 
         vhost_kick_file.fd = event_notifier_get_fd(vhost_kick);
+        vhost_call_file.fd = event_notifier_get_fd(vhost_call);
     } else {
         vhost_svq_stop(dev, idx, svq);
 
@@ -872,6 +879,7 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
             assert(r == 0);
         }
         vhost_kick_file.fd = v->kick_fd[idx];
+        vhost_call_file.fd = v->call_fd[idx];
     }
 
     r = vhost_vdpa_set_vring_dev_kick(dev, &vhost_kick_file);
@@ -879,6 +887,11 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
         error_setg_errno(errp, -r, "vhost_vdpa_set_vring_kick failed");
         return false;
     }
+    r = vhost_vdpa_set_vring_dev_call(dev, &vhost_call_file);
+    if (unlikely(r)) {
+        error_setg_errno(errp, -r, "vhost_vdpa_set_vring_call failed");
+        return false;
+    }
 
     return true;
 }
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC PATCH v5 18/26] virtio: Add vhost_shadow_vq_get_vring_addr
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (16 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 17/26] vhost: Route host->guest notification through shadow virtqueue Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 19/26] vdpa: ack VIRTIO_F_QUEUE_STATE if device supports it Eugenio Pérez
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

It reports the shadow virtqueue addresses from qemu's virtual address
space.

Since these differ from the guest's vaddr, but the device can access
them, SVQ takes special care about their alignment and the absence of
garbage data. It assumes that the IOMMU will work in host_page_size
ranges for that.
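
As a worked sketch (illustrative numbers only, assuming a 256-entry
split queue and a 4 KiB qemu_real_host_page_size; exact struct sizes are
approximate):

    desc:   256 * 16 bytes             = 4096 bytes
    avail:  ~4 + 2 * 256 bytes         = ~516 bytes
    driver area = ROUND_UP(4096 + 516, 4096) = 8192 bytes

    used:   ~4 + 8 * 256 bytes         = ~2052 bytes
    device area = ROUND_UP(2052, 4096)       = 4096 bytes

Both areas are allocated with qemu_memalign() and zeroed, so the device
never sees unaligned or uninitialized ring memory.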

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  4 +++
 hw/virtio/vhost-shadow-virtqueue.c | 51 ++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 607ec6e5eb..ed647d9648 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -22,6 +22,10 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
 const EventNotifier *vhost_svq_get_svq_call_notifier(
                                               const VhostShadowVirtqueue *svq);
 void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr);
+void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
+                              struct vhost_vring_addr *addr);
+size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
+size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
 
 void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
                      VhostShadowVirtqueue *svq, int svq_kick_fd);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 77916d2fed..4a37ed62a8 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -9,12 +9,16 @@
 
 #include "qemu/osdep.h"
 #include "hw/virtio/vhost-shadow-virtqueue.h"
+#include "standard-headers/linux/vhost_types.h"
 
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
+    /* Shadow vring */
+    struct vring vring;
+
     /* Shadow kick notifier, sent to vhost */
     EventNotifier hdev_kick;
     /* Shadow call notifier, sent to vhost */
@@ -38,6 +42,9 @@ typedef struct VhostShadowVirtqueue {
 
     /* Virtio queue shadowing */
     VirtQueue *vq;
+
+    /* Virtio device */
+    VirtIODevice *vdev;
 } VhostShadowVirtqueue;
 
 /**
@@ -113,6 +120,35 @@ void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd)
     event_notifier_init_fd(&svq->svq_call, call_fd);
 }
 
+/*
+ * Get the shadow vq vring address.
+ * @svq Shadow virtqueue
+ * @addr Destination to store address
+ */
+void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
+                              struct vhost_vring_addr *addr)
+{
+    addr->desc_user_addr = (uint64_t)svq->vring.desc;
+    addr->avail_user_addr = (uint64_t)svq->vring.avail;
+    addr->used_user_addr = (uint64_t)svq->vring.used;
+}
+
+size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq)
+{
+    uint16_t vq_idx = virtio_get_queue_index(svq->vq);
+    size_t desc_size = virtio_queue_get_desc_size(svq->vdev, vq_idx);
+    size_t avail_size = virtio_queue_get_avail_size(svq->vdev, vq_idx);
+
+    return ROUND_UP(desc_size + avail_size, qemu_real_host_page_size);
+}
+
+size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq)
+{
+    uint16_t vq_idx = virtio_get_queue_index(svq->vq);
+    size_t used_size = virtio_queue_get_used_size(svq->vdev, vq_idx);
+    return ROUND_UP(used_size, qemu_real_host_page_size);
+}
+
 /**
  * Convenience function to set guest to SVQ kick fd
  *
@@ -195,6 +231,10 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
 VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
 {
     int vq_idx = dev->vq_index + idx;
+    unsigned num = virtio_queue_get_num(dev->vdev, vq_idx);
+    size_t desc_size = virtio_queue_get_desc_size(dev->vdev, vq_idx);
+    size_t driver_size;
+    size_t device_size;
     g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
     int r;
 
@@ -213,6 +253,15 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
     }
 
     svq->vq = virtio_get_queue(dev->vdev, vq_idx);
+    svq->vdev = dev->vdev;
+    driver_size = vhost_svq_driver_area_size(svq);
+    device_size = vhost_svq_device_area_size(svq);
+    svq->vring.num = num;
+    svq->vring.desc = qemu_memalign(qemu_real_host_page_size, driver_size);
+    svq->vring.avail = (void *)((char *)svq->vring.desc + desc_size);
+    memset(svq->vring.desc, 0, driver_size);
+    svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
+    memset(svq->vring.used, 0, device_size);
     event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     return g_steal_pointer(&svq);
 
@@ -231,5 +280,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
     event_notifier_cleanup(&vq->hdev_kick);
     event_notifier_set_handler(&vq->hdev_call, NULL);
     event_notifier_cleanup(&vq->hdev_call);
+    qemu_vfree(vq->vring.desc);
+    qemu_vfree(vq->vring.used);
     g_free(vq);
 }
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC PATCH v5 19/26] vdpa: ack VIRTIO_F_QUEUE_STATE if device supports it
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (17 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 18/26] virtio: Add vhost_shadow_vq_get_vring_addr Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 20/26] vhost: Add vhost_svq_valid_device_features to shadow vq Eugenio Pérez
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

This is needed to enable or disable SVQ.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index c2580693b3..fc8396ba8a 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -521,6 +521,9 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
     if (vhost_vdpa_one_time_request(dev)) {
         return 0;
     }
+    if (dev->features & BIT_ULL(VIRTIO_F_QUEUE_STATE)) {
+        features |= BIT_ULL(VIRTIO_F_QUEUE_STATE);
+    }
 
     trace_vhost_vdpa_set_features(dev, features);
     ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC PATCH v5 20/26] vhost: Add vhost_svq_valid_device_features to shadow vq
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (18 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 19/26] vdpa: ack VIRTIO_F_QUEUE_STATE if device supports it Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 21/26] vhost: Add vhost_svq_valid_guest_features " Eugenio Pérez
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

This allows testing whether the guest has acknowledged an invalid
transport feature for SVQ. This will include packed vq layout, invalid
descriptors or event idx at the moment we start forwarding buffers.

We don't check for device features here since they will be re-negotiated
again. This allows SVQ both to use more advanced features of the device
when they are available even if the guest is not capable of running
them, and to make SVQ compatible with future transport features.
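
A hedged sketch of how a caller could consume it (hypothetical helper;
at this point of the series the function body is still a stub that
returns true):

    /*
     * Sketch only: gate SVQ on the device features. The function may
     * adjust *dev_features to the subset that SVQ can offer to the guest.
     */
    static bool svq_check_device_features(struct vhost_dev *hdev, Error **errp)
    {
        uint64_t dev_features;
        int r = hdev->vhost_ops->vhost_get_features(hdev, &dev_features);

        if (r != 0) {
            error_setg_errno(errp, -r, "Cannot get vdpa device features");
            return false;
        }
        if (!vhost_svq_valid_device_features(&dev_features)) {
            error_setg(errp, "Device features are not valid for SVQ");
            return false;
        }
        return true;
    }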

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h | 2 ++
 hw/virtio/vhost-shadow-virtqueue.c | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index ed647d9648..946b2c6295 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -15,6 +15,8 @@
 
 typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 
+bool vhost_svq_valid_device_features(uint64_t *features);
+
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
 const EventNotifier *vhost_svq_get_dev_kick_notifier(
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 4a37ed62a8..6e0508a231 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -56,6 +56,12 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
     return &svq->hdev_kick;
 }
 
+/* If the device is using some of these, SVQ cannot communicate */
+bool vhost_svq_valid_device_features(uint64_t *dev_features)
+{
+    return true;
+}
+
 /* Forward guest notifications */
 static void vhost_handle_guest_kick(EventNotifier *n)
 {
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC PATCH v5 21/26] vhost: Add vhost_svq_valid_guest_features to shadow vq
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (19 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 20/26] vhost: Add vhost_svq_valid_device_features to shadow vq Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-11-02  5:25     ` Jason Wang
  2021-10-29 18:35 ` [RFC PATCH v5 22/26] vhost: Shadow virtqueue buffers forwarding Eugenio Pérez
                   ` (6 subsequent siblings)
  27 siblings, 1 reply; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

This allows testing whether the guest has acknowledged an invalid
transport feature for SVQ. This will include packed vq layout or
event_idx, where the VirtIO device needs help from SVQ.

It is not needed at this moment, but since SVQ will not re-negotiate
features again with the guest, a failure to acknowledge them is fatal
for SVQ.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h | 1 +
 hw/virtio/vhost-shadow-virtqueue.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 946b2c6295..ac55588009 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -16,6 +16,7 @@
 typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 
 bool vhost_svq_valid_device_features(uint64_t *features);
+bool vhost_svq_valid_guest_features(uint64_t *features);
 
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 6e0508a231..cb9ffcb015 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -62,6 +62,12 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
     return true;
 }
 
+/* If the guest is using some of these, SVQ cannot communicate */
+bool vhost_svq_valid_guest_features(uint64_t *guest_features)
+{
+    return true;
+}
+
 /* Forward guest notifications */
 static void vhost_handle_guest_kick(EventNotifier *n)
 {
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC PATCH v5 22/26] vhost: Shadow virtqueue buffers forwarding
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (20 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 21/26] vhost: Add vhost_svq_valid_guest_features " Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-11-02  7:59     ` Jason Wang
  2021-10-29 18:35 ` [RFC PATCH v5 23/26] util: Add iova_tree_alloc Eugenio Pérez
                   ` (5 subsequent siblings)
  27 siblings, 1 reply; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

Initial version of shadow virtqueue that actually forwards buffers. There
is no iommu support at the moment, and that will be addressed in future
patches of this series. Since all vhost-vdpa devices use a forced IOMMU,
this means that SVQ is not usable at this point of the series on any
device.

For simplicity it only supports modern devices, which expect the vring
in little endian, with a split ring and no event idx or indirect
descriptors. Support for them will not be added in this series.

It reuses the VirtQueue code for the device part. The driver part is
based on Linux's virtio_ring driver, but with stripped functionality
and optimizations so it's easier to review. Later commits add simpler
ones.

However, forwarding buffers has some particular pieces: one of the most
unexpected ones is that a guest's buffer can expand through more than
one descriptor in SVQ. While this is handled gracefully by qemu's
emulated virtio devices, it may cause an unexpected SVQ queue full. This
patch also solves it by checking for this condition at both guest kicks
and device calls. The code may be more elegant in the future if the SVQ
code runs in its own iocontext.

Note that vhost_vdpa_get_vq_state trusts the device to write its status
to used_idx at pause(), finishing all in-flight descriptors. This may
not be enough for complex devices, but other developments like usage of
inflight_fd on top of this solution may extend the usage in the future.

In particular, SVQ trusts it to recover the guest's virtqueue at start,
and to mark as used the latest descriptors used by the device in the
meantime.
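
One detail worth keeping in mind when reviewing the index handling below:
avail_idx_shadow, shadow_used_idx and last_used_idx are free-running
16-bit counters, so the number of in-flight descriptors is their plain
difference even across wrap-around. A standalone sketch (not part of the
patch) of that arithmetic:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint16_t num = 256;              /* ring size */
        uint16_t avail_idx_shadow = 5;   /* has wrapped past 65535 */
        uint16_t shadow_used_idx = 65530;

        /* 5 - 65530 wraps to 11: 11 descriptors are still in flight */
        uint16_t in_flight = avail_idx_shadow - shadow_used_idx;
        assert(in_flight == 11);

        /* Slots SVQ can still expose to the device before queue full */
        assert(num - in_flight == 245);
        return 0;
    }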

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 qapi/net.json                      |   5 +-
 hw/virtio/vhost-shadow-virtqueue.c | 400 +++++++++++++++++++++++++++--
 hw/virtio/vhost-vdpa.c             | 144 ++++++++++-
 3 files changed, 521 insertions(+), 28 deletions(-)

diff --git a/qapi/net.json b/qapi/net.json
index fca2f6ebca..1c6d3b2179 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -84,12 +84,9 @@
 #
 # Use vhost shadow virtqueue.
 #
-# SVQ can just forward notifications between the device and the guest at this
-# moment. This will expand in future changes.
-#
 # @name: the device name of the VirtIO device
 #
-# @set: true to use the alternate shadow VQ notifications
+# @set: true to use the alternate shadow VQ
 #
 # Since: 6.2
 #
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index cb9ffcb015..ad1b2342be 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -9,6 +9,9 @@
 
 #include "qemu/osdep.h"
 #include "hw/virtio/vhost-shadow-virtqueue.h"
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/virtio-access.h"
+
 #include "standard-headers/linux/vhost_types.h"
 
 #include "qemu/error-report.h"
@@ -45,6 +48,27 @@ typedef struct VhostShadowVirtqueue {
 
     /* Virtio device */
     VirtIODevice *vdev;
+
+    /* Map for returning guest's descriptors */
+    VirtQueueElement **ring_id_maps;
+
+    /* Next VirtQueue element that guest made available */
+    VirtQueueElement *next_guest_avail_elem;
+
+    /* Next head to expose to device */
+    uint16_t avail_idx_shadow;
+
+    /* Next free descriptor */
+    uint16_t free_head;
+
+    /* Last seen used idx */
+    uint16_t shadow_used_idx;
+
+    /* Next head to consume from device */
+    uint16_t last_used_idx;
+
+    /* Cache for the exposed notification flag */
+    bool notification;
 } VhostShadowVirtqueue;
 
 /**
@@ -56,25 +80,174 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
     return &svq->hdev_kick;
 }
 
-/* If the device is using some of these, SVQ cannot communicate */
+/**
+ * VirtIO transport device feature acknowledge
+ *
+ * @dev_features  The device features. On success, the acknowledged features.
+ *
+ * Returns true if SVQ can go with a subset of these, false otherwise.
+ */
 bool vhost_svq_valid_device_features(uint64_t *dev_features)
 {
-    return true;
+    uint64_t b;
+    bool r = true;
+
+    for (b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END; ++b) {
+        switch (b) {
+        case VIRTIO_F_NOTIFY_ON_EMPTY:
+        case VIRTIO_F_ANY_LAYOUT:
+            continue;
+
+        case VIRTIO_F_ACCESS_PLATFORM:
+            /* SVQ does not know how to translate addresses */
+            if (*dev_features & BIT_ULL(b)) {
+                clear_bit(b, dev_features);
+                r = false;
+            }
+            break;
+
+        case VIRTIO_F_VERSION_1:
+            /* SVQ trust that guest vring is little endian */
+            if (!(*dev_features & BIT_ULL(b))) {
+                set_bit(b, dev_features);
+                r = false;
+            }
+            continue;
+
+        default:
+            if (*dev_features & BIT_ULL(b)) {
+                clear_bit(b, dev_features);
+            }
+        }
+    }
+
+    return r;
 }
 
-/* If the guest is using some of these, SVQ cannot communicate */
+/**
+ * Check the guest's acknowledged features.
+ *
+ * @guest_features  The guest's acknowledged features
+ *
+ * Returns true if SVQ can handle them, false otherwise.
+ */
 bool vhost_svq_valid_guest_features(uint64_t *guest_features)
 {
-    return true;
+    static const uint64_t transport = MAKE_64BIT_MASK(VIRTIO_TRANSPORT_F_START,
+                            VIRTIO_TRANSPORT_F_END - VIRTIO_TRANSPORT_F_START);
+
+    /* These transport features are handled by VirtQueue */
+    static const uint64_t valid = (BIT_ULL(VIRTIO_RING_F_INDIRECT_DESC) |
+                                   BIT_ULL(VIRTIO_F_VERSION_1));
+
+    /* We are only interested in transport-related feature bits */
+    uint64_t guest_transport_features = (*guest_features) & transport;
+
+    *guest_features &= (valid | ~transport);
+    return !(guest_transport_features & (transport ^ valid));
 }
 
-/* Forward guest notifications */
-static void vhost_handle_guest_kick(EventNotifier *n)
+/**
+ * Number of descriptors that SVQ can make available from the guest.
+ *
+ * @svq   The svq
+ */
+static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
 {
-    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
-                                             svq_kick);
+    return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
+}
 
-    if (unlikely(!event_notifier_test_and_clear(n))) {
+static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
+{
+    uint16_t notification_flag;
+
+    if (svq->notification == enable) {
+        return;
+    }
+
+    notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
+
+    svq->notification = enable;
+    if (enable) {
+        svq->vring.avail->flags &= ~notification_flag;
+    } else {
+        svq->vring.avail->flags |= notification_flag;
+    }
+}
+
+static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
+                                    const struct iovec *iovec,
+                                    size_t num, bool more_descs, bool write)
+{
+    uint16_t i = svq->free_head, last = svq->free_head;
+    unsigned n;
+    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
+    vring_desc_t *descs = svq->vring.desc;
+
+    if (num == 0) {
+        return;
+    }
+
+    for (n = 0; n < num; n++) {
+        if (more_descs || (n + 1 < num)) {
+            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
+        } else {
+            descs[i].flags = flags;
+        }
+        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
+        descs[i].len = cpu_to_le32(iovec[n].iov_len);
+
+        last = i;
+        i = cpu_to_le16(descs[i].next);
+    }
+
+    svq->free_head = le16_to_cpu(descs[last].next);
+}
+
+static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
+                                    VirtQueueElement *elem)
+{
+    int head;
+    unsigned avail_idx;
+    vring_avail_t *avail = svq->vring.avail;
+
+    head = svq->free_head;
+
+    /* We need some descriptors here */
+    assert(elem->out_num || elem->in_num);
+
+    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
+                            elem->in_num > 0, false);
+    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
+
+    /*
+     * Put entry in available array (but don't update avail->idx until they
+     * do sync).
+     */
+    avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
+    avail->ring[avail_idx] = cpu_to_le16(head);
+    svq->avail_idx_shadow++;
+
+    /* Update avail index after the descriptor is written */
+    smp_wmb();
+    avail->idx = cpu_to_le16(svq->avail_idx_shadow);
+
+    return head;
+
+}
+
+static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
+{
+    unsigned qemu_head = vhost_svq_add_split(svq, elem);
+
+    svq->ring_id_maps[qemu_head] = elem;
+}
+
+static void vhost_svq_kick(VhostShadowVirtqueue *svq)
+{
+    /* We need to expose available array entries before checking used flags */
+    smp_mb();
+    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
         return;
     }
 
@@ -86,25 +259,188 @@ static void vhost_handle_guest_kick(EventNotifier *n)
     }
 }
 
-/*
- * Set the device's memory region notifier. addr = NULL clear it.
+/**
+ * Forward available buffers.
+ *
+ * @svq Shadow VirtQueue
+ *
+ * Note that this function does not guarantee that all guest's available
+ * buffers are available to the device in SVQ avail ring. The guest may have
+ * exposed a GPA / GIOVA contiguous buffer, but it may not be contiguous in qemu
+ * vaddr.
+ *
+ * If that happens, guest's kick notifications will be disabled until device
+ * makes some buffers used.
  */
-void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
+static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
 {
-    svq->host_notifier_mr = addr;
+    /* Clear event notifier */
+    event_notifier_test_and_clear(&svq->svq_kick);
+
+    /* Make available as many buffers as possible */
+    do {
+        if (virtio_queue_get_notification(svq->vq)) {
+            virtio_queue_set_notification(svq->vq, false);
+        }
+
+        while (true) {
+            VirtQueueElement *elem;
+
+            if (svq->next_guest_avail_elem) {
+                elem = g_steal_pointer(&svq->next_guest_avail_elem);
+            } else {
+                elem = virtqueue_pop(svq->vq, sizeof(*elem));
+            }
+
+            if (!elem) {
+                break;
+            }
+
+            if (elem->out_num + elem->in_num >
+                vhost_svq_available_slots(svq)) {
+                /*
+                 * This condition is possible since a contiguous buffer in GPA
+                 * does not imply a contiguous buffer in qemu's VA
+                 * scatter-gather segments. If that happens, the buffer exposed
+                 * to the device needs to be a chain of descriptors at this
+                 * moment.
+                 *
+                 * SVQ cannot hold more available buffers if we are here:
+                 * queue the current guest descriptor and ignore further kicks
+                 * until some elements are used.
+                 */
+                svq->next_guest_avail_elem = elem;
+                return;
+            }
+
+            vhost_svq_add(svq, elem);
+            vhost_svq_kick(svq);
+        }
+
+        virtio_queue_set_notification(svq->vq, true);
+    } while (!virtio_queue_empty(svq->vq));
+}
+
+/**
+ * Handle guest's kick.
+ *
+ * @n guest kick event notifier, the one that guest set to notify svq.
+ */
+static void vhost_handle_guest_kick_notifier(EventNotifier *n)
+{
+    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
+                                             svq_kick);
+    vhost_handle_guest_kick(svq);
+}
+
+static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
+{
+    if (svq->last_used_idx != svq->shadow_used_idx) {
+        return true;
+    }
+
+    svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
+
+    return svq->last_used_idx != svq->shadow_used_idx;
+}
+
+static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
+{
+    vring_desc_t *descs = svq->vring.desc;
+    const vring_used_t *used = svq->vring.used;
+    vring_used_elem_t used_elem;
+    uint16_t last_used;
+
+    if (!vhost_svq_more_used(svq)) {
+        return NULL;
+    }
+
+    /* Only get used array entries after they have been exposed by dev */
+    smp_rmb();
+    last_used = svq->last_used_idx & (svq->vring.num - 1);
+    used_elem.id = le32_to_cpu(used->ring[last_used].id);
+    used_elem.len = le32_to_cpu(used->ring[last_used].len);
+
+    svq->last_used_idx++;
+    if (unlikely(used_elem.id >= svq->vring.num)) {
+        error_report("Device %s says index %u is used", svq->vdev->name,
+                     used_elem.id);
+        return NULL;
+    }
+
+    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
+        error_report(
+            "Device %s says index %u is used, but it was not available",
+            svq->vdev->name, used_elem.id);
+        return NULL;
+    }
+
+    descs[used_elem.id].next = svq->free_head;
+    svq->free_head = used_elem.id;
+
+    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
+    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
 }
 
-/* Forward vhost notifications */
+static void vhost_svq_flush(VhostShadowVirtqueue *svq,
+                            bool check_for_avail_queue)
+{
+    VirtQueue *vq = svq->vq;
+
+    /* Make as many buffers as possible used. */
+    do {
+        unsigned i = 0;
+
+        vhost_svq_set_notification(svq, false);
+        while (true) {
+            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
+            if (!elem) {
+                break;
+            }
+
+            if (unlikely(i >= svq->vring.num)) {
+                virtio_error(svq->vdev,
+                         "More than %u used buffers obtained in a %u size SVQ",
+                         i, svq->vring.num);
+                virtqueue_fill(vq, elem, elem->len, i);
+                virtqueue_flush(vq, i);
+                i = 0;
+            }
+            virtqueue_fill(vq, elem, elem->len, i++);
+        }
+
+        virtqueue_flush(vq, i);
+        event_notifier_set(&svq->svq_call);
+
+        if (check_for_avail_queue && svq->next_guest_avail_elem) {
+            /*
+             * Avail ring was full when vhost_svq_flush was called, so it's a
+             * good moment to make more descriptors available if possible
+             */
+            vhost_handle_guest_kick(svq);
+        }
+
+        vhost_svq_set_notification(svq, true);
+    } while (vhost_svq_more_used(svq));
+}
+
+/**
+ * Forward used buffers.
+ *
+ * @n hdev call event notifier, the one that device set to notify svq.
+ *
+ * Note that we are not making any buffers available in the loop, so there is
+ * no way that it runs more than virtqueue size times.
+ */
 static void vhost_svq_handle_call(EventNotifier *n)
 {
     VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
                                              hdev_call);
 
-    if (unlikely(!event_notifier_test_and_clear(n))) {
-        return;
-    }
+    /* Clear event notifier */
+    event_notifier_test_and_clear(n);
 
-    event_notifier_set(&svq->svq_call);
+    vhost_svq_flush(svq, true);
 }
 
 /*
@@ -132,6 +468,14 @@ void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd)
     event_notifier_init_fd(&svq->svq_call, call_fd);
 }
 
+/*
+ * Set the device's memory region notifier. addr = NULL clear it.
+ */
+void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
+{
+    svq->host_notifier_mr = addr;
+}
+
 /*
  * Get the shadow vq vring address.
  * @svq Shadow virtqueue
@@ -185,7 +529,8 @@ static void vhost_svq_set_svq_kick_fd_internal(VhostShadowVirtqueue *svq,
      * need to explicitely check for them.
      */
     event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
-    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
+    event_notifier_set_handler(&svq->svq_kick,
+                               vhost_handle_guest_kick_notifier);
 
     /*
      * !check_old means that we are starting SVQ, taking the descriptor from
@@ -233,7 +578,16 @@ void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
 void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
                     VhostShadowVirtqueue *svq)
 {
+    unsigned i;
     event_notifier_set_handler(&svq->svq_kick, NULL);
+    vhost_svq_flush(svq, false);
+
+    for (i = 0; i < svq->vring.num; ++i) {
+        g_autofree VirtQueueElement *elem = svq->ring_id_maps[i];
+        if (elem) {
+            virtqueue_detach_element(svq->vq, elem, elem->len);
+        }
+    }
 }
 
 /*
@@ -248,7 +602,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
     size_t driver_size;
     size_t device_size;
     g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
-    int r;
+    int r, i;
 
     r = event_notifier_init(&svq->hdev_kick, 0);
     if (r != 0) {
@@ -274,6 +628,11 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
     memset(svq->vring.desc, 0, driver_size);
     svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
     memset(svq->vring.used, 0, device_size);
+    for (i = 0; i < num - 1; i++) {
+        svq->vring.desc[i].next = cpu_to_le16(i + 1);
+    }
+
+    svq->ring_id_maps = g_new0(VirtQueueElement *, num);
     event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     return g_steal_pointer(&svq);
 
@@ -292,6 +651,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
     event_notifier_cleanup(&vq->hdev_kick);
     event_notifier_set_handler(&vq->hdev_call, NULL);
     event_notifier_cleanup(&vq->hdev_call);
+    g_free(vq->ring_id_maps);
     qemu_vfree(vq->vring.desc);
     qemu_vfree(vq->vring.used);
     g_free(vq);
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index fc8396ba8a..e1c55e43e7 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -19,6 +19,7 @@
 #include "hw/virtio/virtio-net.h"
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 #include "hw/virtio/vhost-vdpa.h"
+#include "hw/virtio/vhost-shadow-virtqueue.h"
 #include "exec/address-spaces.h"
 #include "qemu/main-loop.h"
 #include "cpu.h"
@@ -821,6 +822,19 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
     return true;
 }
 
+static int vhost_vdpa_vring_pause(struct vhost_dev *dev)
+{
+    int r;
+    uint8_t status;
+
+    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DEVICE_STOPPED);
+    do {
+        r = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
+    } while (r == 0 && !(status & VIRTIO_CONFIG_S_DEVICE_STOPPED));
+
+    return 0;
+}
+
 /*
  * Start or stop a shadow virtqueue in a vdpa device
  *
@@ -844,7 +858,14 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
         .index = vq_index,
     };
     struct vhost_vring_file vhost_call_file = {
-        .index = idx + dev->vq_index,
+        .index = vq_index,
+    };
+    struct vhost_vring_addr addr = {
+        .index = vq_index,
+    };
+    struct vhost_vring_state num = {
+        .index = vq_index,
+        .num = virtio_queue_get_num(dev->vdev, vq_index),
     };
     int r;
 
@@ -852,6 +873,7 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
         const EventNotifier *vhost_kick = vhost_svq_get_dev_kick_notifier(svq);
         const EventNotifier *vhost_call = vhost_svq_get_svq_call_notifier(svq);
 
+        vhost_svq_get_vring_addr(svq, &addr);
         if (n->addr) {
             r = virtio_queue_set_host_notifier_mr(dev->vdev, idx, &n->mr,
                                                   false);
@@ -870,8 +892,20 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
         vhost_kick_file.fd = event_notifier_get_fd(vhost_kick);
         vhost_call_file.fd = event_notifier_get_fd(vhost_call);
     } else {
+        struct vhost_vring_state state = {
+            .index = vq_index,
+        };
+
         vhost_svq_stop(dev, idx, svq);
 
+        state.num = virtio_queue_get_last_avail_idx(dev->vdev, idx);
+        r = vhost_vdpa_set_vring_base(dev, &state);
+        if (unlikely(r)) {
+            error_setg_errno(errp, -r, "vhost_set_vring_base failed");
+            return false;
+        }
+
+        vhost_vdpa_vq_get_addr(dev, &addr, &dev->vqs[idx]);
         if (n->addr) {
             r = virtio_queue_set_host_notifier_mr(dev->vdev, idx, &n->mr,
                                                   true);
@@ -885,6 +919,17 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
         vhost_call_file.fd = v->call_fd[idx];
     }
 
+    r = vhost_vdpa_set_vring_addr(dev, &addr);
+    if (unlikely(r)) {
+        error_setg_errno(errp, -r, "vhost_set_vring_addr failed");
+        return false;
+    }
+    r = vhost_vdpa_set_vring_num(dev, &num);
+    if (unlikely(r)) {
+        error_setg_errno(errp, -r, "vhost_set_vring_num failed");
+        return false;
+    }
+
     r = vhost_vdpa_set_vring_dev_kick(dev, &vhost_kick_file);
     if (unlikely(r)) {
         error_setg_errno(errp, -r, "vhost_vdpa_set_vring_kick failed");
@@ -899,6 +944,50 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
     return true;
 }
 
+static void vhost_vdpa_get_vq_state(struct vhost_dev *dev, unsigned idx)
+{
+    struct VirtIODevice *vdev = dev->vdev;
+
+    virtio_queue_restore_last_avail_idx(vdev, idx);
+    virtio_queue_invalidate_signalled_used(vdev, idx);
+    virtio_queue_update_used_idx(vdev, idx);
+}
+
+/**
+ * Validate device and guest features against SVQ capabilities
+ *
+ * @hdev          The vhost device
+ * @svq_features  The subset of device features that svq supports
+ * @errp          The error
+ *
+ * Returns true if both device and guest acked features are valid for SVQ.
+ */
+static bool vhost_vdpa_valid_features(struct vhost_dev *hdev,
+                                      uint64_t *svq_features,
+                                      Error **errp)
+{
+    uint64_t acked_features = hdev->acked_features;
+    bool ok;
+
+    ok = vhost_svq_valid_device_features(svq_features);
+    if (unlikely(!ok)) {
+        error_setg(errp,
+            "Unexpected device feature flags, offered: %"PRIx64", ok: %"PRIx64,
+            hdev->features, *svq_features);
+        return false;
+    }
+
+    ok = vhost_svq_valid_guest_features(&acked_features);
+    if (unlikely(!ok)) {
+        error_setg(errp,
+            "Invalid guest acked feature flag, acked:%"PRIx64", ok: %"PRIx64,
+            hdev->acked_features, acked_features);
+        return false;
+    }
+
+    return true;
+}
+
 /**
  * Enable or disable shadow virtqueue in a vhost vdpa device.
  *
@@ -913,6 +1002,9 @@ void vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable, Error **errp)
 {
     struct vhost_dev *hdev = v->dev;
     unsigned n;
+    int r;
+    uint64_t svq_features = hdev->features | BIT_ULL(VIRTIO_F_IOMMU_PLATFORM) |
+                            BIT_ULL(VIRTIO_F_QUEUE_STATE);
     ERRP_GUARD();
 
     if (enable == v->shadow_vqs_enabled) {
@@ -920,20 +1012,43 @@ void vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable, Error **errp)
     }
 
     if (enable) {
+        bool ok = vhost_vdpa_valid_features(hdev, &svq_features, errp);
+        if (unlikely(!ok)) {
+            return;
+        }
+
         /* Allocate resources */
         assert(v->shadow_vqs->len == 0);
         for (n = 0; n < hdev->nvqs; ++n) {
             VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n);
-            bool ok;
-
             if (unlikely(!svq)) {
                 error_setg(errp, "Cannot create svq");
                 enable = false;
                 goto err_svq_new;
             }
             g_ptr_array_add(v->shadow_vqs, svq);
+        }
+    }
+
+    r = vhost_vdpa_vring_pause(hdev);
+    if (unlikely(r)) {
+        error_setg_errno(errp, -r, "Cannot pause device");
+        enable = !enable;
+        goto err_pause;
+    }
+
+    for (n = 0; n < v->shadow_vqs->len; ++n) {
+        vhost_vdpa_get_vq_state(hdev, hdev->vq_index + n);
+    }
+
+    /* Reset device so it can be configured */
+    vhost_vdpa_dev_start(hdev, false);
+
+    if (enable) {
+        int r;
 
-            ok = vhost_vdpa_svq_start_vq(hdev, n, true, errp);
+        for (n = 0; n < v->shadow_vqs->len; ++n) {
+            bool ok = vhost_vdpa_svq_start_vq(hdev, n, true, errp);
             if (unlikely(!ok)) {
                 /* Free still not started svqs, and go with disable path */
                 g_ptr_array_set_size(v->shadow_vqs, n);
@@ -941,18 +1056,39 @@ void vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable, Error **errp)
                 break;
             }
         }
+
+        /* Need to ack features to set state in vp_vdpa devices */
+        r = vhost_vdpa_set_features(hdev, svq_features);
+        if (unlikely(r && !(*errp))) {
+            error_setg_errno(errp, -r, "Fail to set guest features");
+
+            /* Go through disable SVQ path */
+            enable = false;
+        }
     }
 
     v->shadow_vqs_enabled = enable;
 
     if (!enable) {
+        r = vhost_vdpa_set_features(hdev, hdev->acked_features |
+                                          BIT_ULL(VIRTIO_F_QUEUE_STATE) |
+                                          BIT_ULL(VIRTIO_F_IOMMU_PLATFORM));
+        if (unlikely(r && (!(*errp)))) {
+            error_setg_errno(errp, -r, "Fail to set guest features");
+        }
+
         /* Disable all queues or clean up failed start */
         for (n = 0; n < v->shadow_vqs->len; ++n) {
             vhost_vdpa_svq_start_vq(hdev, n, false, *errp ? NULL : errp);
         }
+    }
 
+    r = vhost_vdpa_dev_start(hdev, true);
+    if (unlikely(r && !(*errp))) {
+        error_setg_errno(errp, -r, "Fail to start device");
     }
 
+err_pause:
 err_svq_new:
     if (!enable) {
         /* Resources cleanup */
-- 
2.27.0




* [RFC PATCH v5 23/26] util: Add iova_tree_alloc
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (21 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 22/26] vhost: Shadow virtqueue buffers forwarding Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-11-02  6:35     ` Jason Wang
                     ` (2 more replies)
  2021-10-29 18:35 ` [RFC PATCH v5 24/26] vhost: Add VhostIOVATree Eugenio Pérez
                   ` (4 subsequent siblings)
  27 siblings, 3 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

This iova tree function looks for a hole between the regions already
allocated in the tree and returns a brand new translation for a given
translated address.

Its main use is to let devices access the qemu address space, remapping
the guest's address space into a new iova space to which qemu can add
chunks of addresses.
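
A minimal usage sketch (not part of this patch; "tree", "qemu_vaddr",
"chunk_size", "iova_first", "iova_last" and the helper name are made-up
illustrations) of how a caller could ask iova_tree_alloc() for a free
IOVA range covering a chunk of qemu memory:

    /* Illustrative only; relies on qemu/iova-tree.h and exec/memory.h */
    static int alloc_iova_for_vaddr(IOVATree *tree, void *qemu_vaddr,
                                    size_t chunk_size, hwaddr iova_first,
                                    hwaddr iova_last, hwaddr *iova)
    {
        DMAMap map = {
            .translated_addr = (hwaddr)(uintptr_t)qemu_vaddr,
            .size = chunk_size - 1,   /* inclusive size, as used in this series */
            .perm = IOMMU_ACCESS_FLAG(true, true),
        };
        int r = iova_tree_alloc(tree, &map, iova_first, iova_last);

        if (r == IOVA_OK) {
            *iova = map.iova;         /* address assigned by the allocator */
        }
        return r;
    }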

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/qemu/iova-tree.h |  17 +++++
 util/iova-tree.c         | 139 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 156 insertions(+)

diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
index 8249edd764..33f9b2e13f 100644
--- a/include/qemu/iova-tree.h
+++ b/include/qemu/iova-tree.h
@@ -29,6 +29,7 @@
 #define  IOVA_OK           (0)
 #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
 #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
+#define  IOVA_ERR_NOMEM    (-3) /* Cannot allocate */
 
 typedef struct IOVATree IOVATree;
 typedef struct DMAMap {
@@ -119,6 +120,22 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
  */
 void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
 
+/**
+ * iova_tree_alloc:
+ *
+ * @tree: the iova tree to allocate from
+ * @map: the new map (as translated addr & size) to allocate in iova region
+ * @iova_begin: the minimum address of the allocation
+ * @iova_end: the last addressable iova allowed in the allocation
+ *
+ * Allocates a new region of a given size, between iova_begin and iova_end.
+ *
+ * Return: Same as iova_tree_insert, plus IOVA_ERR_NOMEM if no free hole big
+ * enough is available. Caller can get the assigned iova in map->iova.
+ */
+int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
+                    hwaddr iova_end);
+
 /**
  * iova_tree_destroy:
  *
diff --git a/util/iova-tree.c b/util/iova-tree.c
index 23ea35b7a4..27c921c4e2 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -16,6 +16,36 @@ struct IOVATree {
     GTree *tree;
 };
 
+/* Args to pass to iova_tree_alloc foreach function. */
+struct IOVATreeAllocArgs {
+    /* Size of the desired allocation */
+    size_t new_size;
+
+    /* The minimum address allowed in the allocation */
+    hwaddr iova_begin;
+
+    /* The last addressable iova allowed in the allocation */
+    hwaddr iova_last;
+
+    /* Previous-to-last iterated map; can be NULL for the first node */
+    const DMAMap *hole_left;
+
+    /* Last iterated map */
+    const DMAMap *hole_right;
+};
+
+/**
+ * Iterate args to the next hole
+ *
+ * @args  The alloc arguments
+ * @next  The next mapping in the tree. Can be NULL to signal the last one
+ */
+static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
+                                         const DMAMap *next) {
+    args->hole_left = args->hole_right;
+    args->hole_right = next;
+}
+
 static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
 {
     const DMAMap *m1 = a, *m2 = b;
@@ -107,6 +137,115 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
     return IOVA_OK;
 }
 
+/**
+ * Try to accommodate a map of size args->new_size in the hole between
+ * max(end(hole_left), iova_begin) and min(start(hole_right), iova_last).
+ *
+ * @args Arguments to allocation
+ */
+static bool iova_tree_alloc_map_in_hole(const struct IOVATreeAllocArgs *args)
+{
+    const DMAMap *left = args->hole_left, *right = args->hole_right;
+    uint64_t hole_start, hole_last;
+
+    if (right && right->iova + right->size < args->iova_begin) {
+        return false;
+    }
+
+    if (left && left->iova > args->iova_last) {
+        return false;
+    }
+
+    hole_start = MAX(left ? left->iova + left->size + 1 : 0, args->iova_begin);
+    hole_last = MIN(right ? right->iova : HWADDR_MAX, args->iova_last);
+
+    if (hole_last - hole_start > args->new_size) {
+        /* We found a valid hole. */
+        return true;
+    }
+
+    /* Keep iterating */
+    return false;
+}
+
+/**
+ * For each dma node in the tree, check if there is a hole between its
+ * previous node (or the minimum iova address allowed) and the node itself.
+ *
+ * @key   Node being iterated
+ * @value Node being iterated (same as @key)
+ * @pargs Struct to communicate with the outside world
+ *
+ * Return: false to keep iterating, true to stop.
+ */
+static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
+                                         gpointer pargs)
+{
+    struct IOVATreeAllocArgs *args = pargs;
+    DMAMap *node = value;
+
+    assert(key == value);
+
+    iova_tree_alloc_args_iterate(args, node);
+    if (args->hole_left && args->hole_left->iova > args->iova_last) {
+        return true;
+    }
+
+    if (iova_tree_alloc_map_in_hole(args)) {
+        return true;
+    }
+
+    return false;
+}
+
+int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
+                    hwaddr iova_last)
+{
+    struct IOVATreeAllocArgs args = {
+        .new_size = map->size,
+        .iova_begin = iova_begin,
+        .iova_last = iova_last,
+    };
+
+    if (iova_begin == 0) {
+        /* Some devices do not like addr 0 */
+        iova_begin += qemu_real_host_page_size;
+    }
+
+    assert(iova_begin < iova_last);
+
+    /*
+     * Find a valid hole for the mapping
+     *
+     * Assuming low iova_begin, so no need to do a binary search to
+     * locate the first node.
+     *
+     * TODO: We can improve the search speed if we save the beginning and the
+     * end of holes, so we don't iterate over the previous saved ones.
+     *
+     * TODO: Replace all this with g_tree_node_first/next/last when available
+     * (from glib since 2.68). To do it with g_tree_foreach complicates the
+     * code a lot.
+     *
+     */
+    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
+    if (!iova_tree_alloc_map_in_hole(&args)) {
+        /*
+         * 2nd try: The last iteration left args->hole_right as the last
+         * DMAMap, but the (hole_right, iova_last) hole needs checking too
+         */
+        iova_tree_alloc_args_iterate(&args, NULL);
+        if (!iova_tree_alloc_map_in_hole(&args)) {
+            return IOVA_ERR_NOMEM;
+        }
+    }
+
+    map->iova = MAX(iova_begin,
+                    args.hole_left ?
+                    args.hole_left->iova + args.hole_left->size + 1 : 0);
+    return iova_tree_insert(tree, map);
+}
+
 void iova_tree_destroy(IOVATree *tree)
 {
     g_tree_destroy(tree->tree);
-- 
2.27.0




* [RFC PATCH v5 24/26] vhost: Add VhostIOVATree
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (22 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 23/26] util: Add iova_tree_alloc Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 25/26] vhost: Use a tree to store memory mappings Eugenio Pérez
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

This tree is able to look up a translated address from an IOVA address.

At first glance it is similar to util/iova-tree. However, SVQ working on
devices with limited IOVA space needs more capabilities, like allocating
IOVA chunks or performing reverse translations (qemu addresses to iova).

The allocation capability, as in "assign a free IOVA address to this
chunk of memory in qemu's address space", allows the shadow virtqueue to
create a new address space that is not restricted by the guest's
addressable one, so we can allocate the shadow vq vrings outside of its
reach, and outside of qemu's as well. At the moment the allocation only
grows; deleting allocated ranges is not supported.

It duplicates the tree so it can search efficiently in both directions,
and it will signal an overlap if either the iova or the translated
address is already present in its respective tree.
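
A rough usage sketch (illustrative only; "qemu_buf", "buf_size",
"iova_first" and "iova_last" are made-up names) of how SVQ code could
allocate an IOVA for a qemu buffer and later reverse-translate a qemu
address back to its IOVA, using the API added below:

    /* Sketch, relying on the vhost-iova-tree.h interface of this patch */
    VhostIOVATree *t = vhost_iova_tree_new(iova_first, iova_last);
    DMAMap map = {
        .translated_addr = (hwaddr)(uintptr_t)qemu_buf,
        .size = buf_size - 1,                    /* inclusive size */
        .perm = IOMMU_ACCESS_FLAG(true, true),
    };

    if (vhost_iova_tree_map_alloc(t, &map) == IOVA_OK) {
        /* map.iova is now the device-visible address of qemu_buf */
    }

    /* Reverse translation: qemu address -> stored mapping (and its iova) */
    DMAMap needle = { .translated_addr = (hwaddr)(uintptr_t)qemu_buf };
    const DMAMap *found = vhost_iova_tree_find_iova(t, &needle);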

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-iova-tree.h |  27 +++++++
 hw/virtio/vhost-iova-tree.c | 157 ++++++++++++++++++++++++++++++++++++
 hw/virtio/meson.build       |   2 +-
 3 files changed, 185 insertions(+), 1 deletion(-)
 create mode 100644 hw/virtio/vhost-iova-tree.h
 create mode 100644 hw/virtio/vhost-iova-tree.c

diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
new file mode 100644
index 0000000000..56652e7d2b
--- /dev/null
+++ b/hw/virtio/vhost-iova-tree.h
@@ -0,0 +1,27 @@
+/*
+ * vhost software live migration ring
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_VIRTIO_VHOST_IOVA_TREE_H
+#define HW_VIRTIO_VHOST_IOVA_TREE_H
+
+#include "qemu/iova-tree.h"
+#include "exec/memory.h"
+
+typedef struct VhostIOVATree VhostIOVATree;
+
+VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
+void vhost_iova_tree_unref(VhostIOVATree *iova_rm);
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_unref);
+
+const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_rm,
+                                        const DMAMap *map);
+int vhost_iova_tree_map_alloc(VhostIOVATree *iova_rm, DMAMap *map);
+void vhost_iova_tree_remove(VhostIOVATree *iova_rm, const DMAMap *map);
+
+#endif
diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
new file mode 100644
index 0000000000..021779cfd5
--- /dev/null
+++ b/hw/virtio/vhost-iova-tree.c
@@ -0,0 +1,157 @@
+/*
+ * vhost software live migration ring
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/iova-tree.h"
+#include "vhost-iova-tree.h"
+
+#define iova_min_addr qemu_real_host_page_size
+
+/**
+ * VhostIOVATree, able to:
+ * - Translate iova address
+ * - Reverse translate iova address (from translated to iova)
+ * - Allocate IOVA regions for translated range (potentially slow operation)
+ *
+ * Note that it cannot remove nodes.
+ */
+struct VhostIOVATree {
+    /* First addressable iova address in the device */
+    uint64_t iova_first;
+
+    /* Last addressable iova address in the device */
+    uint64_t iova_last;
+
+    /* IOVA address to qemu memory maps. */
+    IOVATree *iova_taddr_map;
+
+    /* QEMU virtual memory address to iova maps */
+    GTree *taddr_iova_map;
+};
+
+static gint vhost_iova_tree_cmp_taddr(gconstpointer a, gconstpointer b,
+                                      gpointer data)
+{
+    const DMAMap *m1 = a, *m2 = b;
+
+    if (m1->translated_addr > m2->translated_addr + m2->size) {
+        return 1;
+    }
+
+    if (m1->translated_addr + m1->size < m2->translated_addr) {
+        return -1;
+    }
+
+    /* Overlapped */
+    return 0;
+}
+
+/**
+ * Create a new IOVA tree
+ *
+ * Returns the new IOVA tree
+ */
+VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
+{
+    VhostIOVATree *tree = g_new(VhostIOVATree, 1);
+
+    /* Some devices do not like 0 addresses */
+    tree->iova_first = MAX(iova_first, iova_min_addr);
+    tree->iova_last = iova_last;
+
+    tree->iova_taddr_map = iova_tree_new();
+    tree->taddr_iova_map = g_tree_new_full(vhost_iova_tree_cmp_taddr, NULL,
+                                           NULL, g_free);
+    return tree;
+}
+
+/**
+ * Destroy an IOVA tree
+ *
+ * @tree  The iova tree
+ */
+void vhost_iova_tree_unref(VhostIOVATree *tree)
+{
+    iova_tree_destroy(tree->iova_taddr_map);
+    g_tree_unref(tree->taddr_iova_map);
+    g_free(tree);
+}
+
+/**
+ * Find the IOVA mapping stored for a given memory address
+ *
+ * @tree     The iova tree
+ * @map      The map with the memory address
+ *
+ * Return the stored mapping, or NULL if not found.
+ */
+const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
+                                        const DMAMap *map)
+{
+    return g_tree_lookup(tree->taddr_iova_map, map);
+}
+
+/**
+ * Allocate a new mapping
+ *
+ * @tree  The iova tree
+ * @map   The iova map
+ *
+ * Returns:
+ * - IOVA_OK if the map fits in the container
+ * - IOVA_ERR_INVALID if the map does not make sense (like size overflow)
+ * - IOVA_ERR_OVERLAP if the tree already contains that map
+ * - IOVA_ERR_NOMEM if tree cannot allocate more space.
+ *
+ * The assigned iova is returned in map->iova if the return value is IOVA_OK.
+ */
+int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map)
+{
+    DMAMap *new;
+    int r;
+
+    if (map->translated_addr + map->size < map->translated_addr ||
+        map->perm == IOMMU_NONE) {
+        return IOVA_ERR_INVALID;
+    }
+
+    /* Check for collisions in translated addresses */
+    if (vhost_iova_tree_find_iova(tree, map)) {
+        return IOVA_ERR_OVERLAP;
+    }
+
+    /* Allocate a node in IOVA address */
+    r = iova_tree_alloc(tree->iova_taddr_map, map, tree->iova_first,
+                        tree->iova_last);
+    if (r != IOVA_OK) {
+        return r;
+    }
+
+    /* Allocate node in qemu -> iova translations */
+    new = g_malloc(sizeof(*new));
+    memcpy(new, map, sizeof(*new));
+    g_tree_insert(tree->taddr_iova_map, new, new);
+    return IOVA_OK;
+}
+
+/**
+ * Remove existing mappings from iova tree
+ *
+ * @iova_rm  The vhost iova tree
+ * @map      The map to remove
+ */
+void vhost_iova_tree_remove(VhostIOVATree *iova_rm, const DMAMap *map)
+{
+    const DMAMap *overlap;
+
+    iova_tree_remove(iova_rm->iova_taddr_map, map);
+    while ((overlap = vhost_iova_tree_find_iova(iova_rm, map))) {
+        g_tree_remove(iova_rm->taddr_iova_map, overlap);
+    }
+}
diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index 2dc87613bc..6047670804 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
 
 virtio_ss = ss.source_set()
 virtio_ss.add(files('virtio.c'))
-virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
+virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c', 'vhost-iova-tree.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
 virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
-- 
2.27.0




* [RFC PATCH v5 25/26] vhost: Use a tree to store memory mappings
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (23 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 24/26] vhost: Add VhostIOVATree Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-10-29 18:35 ` [RFC PATCH v5 26/26] vdpa: Add custom IOTLB translations to SVQ Eugenio Pérez
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

Track memory translations of devices with IOMMU (all vhost-vdpa devices
at the moment). It does not work yet if the device has restrictions in
its iova range.

Updates to the tree are protected by the BQL; each one always runs from
the main event loop context.
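
A condensed view of what the hunks below do (a sketch only; error
handling and the surrounding context are dropped, and "v" / "map" here
stand for the struct vhost_vdpa and a DMAMap of the region being
mapped): the tree is created when SVQ is enabled, spanning the device's
usable IOVA range, and each shadow vring area gets an IOVA assigned
before it is DMA-mapped through the device.

    /* At SVQ enable time (sketch) */
    v->iova_map = vhost_iova_tree_new(v->iova_range.first,
                                      v->iova_range.last);

    /* For each shadow vring area, as in vhost_vdpa_svq_map() below */
    vhost_iova_tree_map_alloc(v->iova_map, &map);        /* fills map.iova */
    vhost_vdpa_dma_map(v, map.iova, map.size,
                       (void *)map.translated_addr, false);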

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/vhost-vdpa.h |  3 ++
 hw/virtio/vhost-vdpa.c         | 50 +++++++++++++++++++++++++++++++++-
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 2f57b17208..365b102c14 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -14,6 +14,7 @@
 
 #include <gmodule.h>
 
+#include "hw/virtio/vhost-iova-tree.h"
 #include "hw/virtio/virtio.h"
 #include "standard-headers/linux/vhost_types.h"
 
@@ -30,6 +31,8 @@ typedef struct vhost_vdpa {
     MemoryListener listener;
     struct vhost_vdpa_iova_range iova_range;
     bool shadow_vqs_enabled;
+    /* IOVA mapping used by Shadow Virtqueue */
+    VhostIOVATree *iova_map;
     GPtrArray *shadow_vqs;
     struct vhost_dev *dev;
     int kick_fd[VIRTIO_QUEUE_MAX];
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index e1c55e43e7..a827ecbe4f 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -466,6 +466,7 @@ static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
         vhost_svq_stop(dev, idx, g_ptr_array_index(v->shadow_vqs, idx));
     }
     g_ptr_array_free(v->shadow_vqs, true);
+    g_clear_pointer(&v->iova_map, vhost_iova_tree_unref);
 }
 
 static int vhost_vdpa_cleanup(struct vhost_dev *dev)
@@ -822,6 +823,22 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
     return true;
 }
 
+/**
+ * Maps QEMU vaddr memory to device in a suitable way for shadow virtqueue:
+ * - It always references qemu memory addresses, not the guest's memory.
+ * - TODO: It is always in range of the device.
+ *
+ * The assigned iova (device address) is returned in map->iova.
+ */
+static int vhost_vdpa_svq_map(struct vhost_vdpa *v, DMAMap *map)
+{
+    int r = vhost_iova_tree_map_alloc(v->iova_map, map);
+    assert(r == IOVA_OK);
+
+    return vhost_vdpa_dma_map(v, map->iova, map->size,
+                              (void *)map->translated_addr, false);
+}
+
 static int vhost_vdpa_vring_pause(struct vhost_dev *dev)
 {
     int r;
@@ -872,8 +889,36 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
     if (svq_mode) {
         const EventNotifier *vhost_kick = vhost_svq_get_dev_kick_notifier(svq);
         const EventNotifier *vhost_call = vhost_svq_get_svq_call_notifier(svq);
+        DMAMap device_region, driver_region;
 
         vhost_svq_get_vring_addr(svq, &addr);
+        driver_region = (DMAMap) {
+            .translated_addr = (hwaddr)addr.desc_user_addr,
+
+            /*
+             * DMAMap.size includes the last byte of the range, while
+             * sizeof marks one past it. Subtract one byte to make them match.
+             */
+            .size = vhost_svq_driver_area_size(svq) - 1,
+            .perm = VHOST_ACCESS_RO,
+        };
+        device_region = (DMAMap) {
+            .translated_addr = (hwaddr)addr.used_user_addr,
+            .size = vhost_svq_device_area_size(svq) - 1,
+            .perm = VHOST_ACCESS_RW,
+        };
+
+        r = vhost_vdpa_svq_map(v, &driver_region);
+        assert(r == 0);
+        r = vhost_vdpa_svq_map(v, &device_region);
+        assert(r == 0);
+
+        /* Expose IOVA addresses to vDPA device */
+        addr.avail_user_addr = driver_region.iova + addr.avail_user_addr
+                               - addr.desc_user_addr;
+        addr.desc_user_addr = driver_region.iova;
+        addr.used_user_addr = device_region.iova;
+
         if (n->addr) {
             r = virtio_queue_set_host_notifier_mr(dev->vdev, idx, &n->mr,
                                                   false);
@@ -885,7 +930,6 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
             assert(r == 0);
             vhost_svq_set_host_mr_notifier(svq, n->addr);
         }
-
         vhost_svq_set_guest_call_notifier(svq, v->call_fd[idx]);
         vhost_svq_start(dev, idx, svq, v->kick_fd[idx]);
 
@@ -1001,6 +1045,7 @@ static bool vhost_vdpa_valid_features(struct vhost_dev *hdev,
 void vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable, Error **errp)
 {
     struct vhost_dev *hdev = v->dev;
+    hwaddr iova_first = v->iova_range.first, iova_last = v->iova_range.last;
     unsigned n;
     int r;
     uint64_t svq_features = hdev->features | BIT_ULL(VIRTIO_F_IOMMU_PLATFORM) |
@@ -1017,6 +1062,8 @@ void vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable, Error **errp)
             return;
         }
 
+        v->iova_map = vhost_iova_tree_new(iova_first, iova_last);
+
         /* Allocate resources */
         assert(v->shadow_vqs->len == 0);
         for (n = 0; n < hdev->nvqs; ++n) {
@@ -1093,6 +1140,7 @@ err_svq_new:
     if (!enable) {
         /* Resources cleanup */
         g_ptr_array_set_size(v->shadow_vqs, 0);
+        g_clear_pointer(&v->iova_map, vhost_iova_tree_unref);
     }
 }
 
-- 
2.27.0




* [RFC PATCH v5 26/26] vdpa: Add custom IOTLB translations to SVQ
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (24 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 25/26] vhost: Use a tree to store memory mappings Eugenio Pérez
@ 2021-10-29 18:35 ` Eugenio Pérez
  2021-11-01  9:06 ` [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Perez Martin
  2021-11-02  4:25   ` Jason Wang
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Pérez @ 2021-10-29 18:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, Stefan Hajnoczi, Peter Xu,
	Markus Armbruster, Harpreet Singh Anand, Xiao W Wang, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Eduardo Habkost

Use the translations added in VhostIOVATree in SVQ.

Now every element also needs to remember the iova / GPA addresses, so
the VirtQueue can consume the elements properly. This adds a little
overhead per VQ element, since more memory has to be allocated to stash
them.

As a possible optimization, this allocation could be avoided if the
descriptor is a single one rather than a chain, but this is left undone.
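
The per-element translation added below boils down to this (a sketch
taken from vhost_svq_translate_addr() in the patch, with error handling
dropped; "iovec", "addrs" and "i" are locals of that loop): for each
iovec entry of a guest element, look up the mapping that covers its
qemu address and rebase it into IOVA space before writing the
descriptor.

    DMAMap needle = {
        .translated_addr = (hwaddr)(uintptr_t)iovec[i].iov_base,
        .size = iovec[i].iov_len,
    };
    const DMAMap *map = vhost_iova_tree_find_iova(svq->iova_map, &needle);
    size_t off = needle.translated_addr - map->translated_addr;

    /* Address the device will see for this scatter-gather entry */
    addrs[i] = (void *)(map->iova + off);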

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |   4 +-
 hw/virtio/vhost-shadow-virtqueue.c | 156 ++++++++++++++++++++++-------
 hw/virtio/vhost-vdpa.c             |  35 ++++++-
 3 files changed, 159 insertions(+), 36 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index ac55588009..903b9f7a14 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -12,6 +12,7 @@
 
 #include "hw/virtio/vhost.h"
 #include "qemu/event_notifier.h"
+#include "hw/virtio/vhost-iova-tree.h"
 
 typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 
@@ -35,7 +36,8 @@ void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
 void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
                     VhostShadowVirtqueue *svq);
 
-VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx);
+VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx,
+                                    VhostIOVATree *iova_map);
 
 void vhost_svq_free(VhostShadowVirtqueue *vq);
 
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index ad1b2342be..7ab506f9e7 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -11,12 +11,19 @@
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/virtio-access.h"
+#include "hw/virtio/vhost-iova-tree.h"
 
 #include "standard-headers/linux/vhost_types.h"
 
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 
+typedef struct SVQElement {
+    VirtQueueElement elem;
+    void **vaddr_in_sg;
+    void **vaddr_out_sg;
+} SVQElement;
+
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
     /* Shadow vring */
@@ -49,11 +56,14 @@ typedef struct VhostShadowVirtqueue {
     /* Virtio device */
     VirtIODevice *vdev;
 
+    /* IOVA mapping if used */
+    VhostIOVATree *iova_map;
+
     /* Map for returning guest's descriptors */
-    VirtQueueElement **ring_id_maps;
+    SVQElement **ring_id_maps;
 
-    /* Next VirtQueue element that guest made available */
-    VirtQueueElement *next_guest_avail_elem;
+    /* Next SVQ element that guest made available */
+    SVQElement *next_guest_avail_elem;
 
     /* Next head to expose to device */
     uint16_t avail_idx_shadow;
@@ -80,6 +90,14 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
     return &svq->hdev_kick;
 }
 
+static void vhost_svq_elem_free(SVQElement *elem)
+{
+    g_free(elem->vaddr_in_sg);
+    g_free(elem->vaddr_out_sg);
+    g_free(elem);
+}
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(SVQElement, vhost_svq_elem_free)
+
 /**
  * VirtIO transport device feature acknowledge
  *
@@ -99,13 +117,7 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
             continue;
 
         case VIRTIO_F_ACCESS_PLATFORM:
-            /* SVQ does not know how to translate addresses */
-            if (*dev_features & BIT_ULL(b)) {
-                clear_bit(b, dev_features);
-                r = false;
-            }
-            break;
-
+            /* SVQ trusts the host's IOMMU to translate addresses */
         case VIRTIO_F_VERSION_1:
             /* SVQ trust that guest vring is little endian */
             if (!(*dev_features & BIT_ULL(b))) {
@@ -175,7 +187,49 @@ static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
     }
 }
 
+static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
+                                     void ***vaddr, const struct iovec *iovec,
+                                     size_t num)
+{
+    size_t i;
+
+    if (num == 0) {
+        return true;
+    }
+
+    g_autofree void **addrs = g_new(void *, num);
+    for (i = 0; i < num; ++i) {
+        DMAMap needle = {
+            .translated_addr = (hwaddr)iovec[i].iov_base,
+            .size = iovec[i].iov_len,
+        };
+        size_t off;
+
+        const DMAMap *map = vhost_iova_tree_find_iova(svq->iova_map, &needle);
+        /*
+         * Map cannot be NULL since iova map contains all guest space and
+         * qemu already has a physical address mapped
+         */
+        if (unlikely(!map)) {
+            error_report("Invalid address 0x%"HWADDR_PRIx" given by guest",
+                         needle.translated_addr);
+            return false;
+        }
+
+        /*
+         * Map->iova chunk size is ignored. What to do if descriptor
+         * (addr, size) does not fit is delegated to the device.
+         */
+        off = needle.translated_addr - map->translated_addr;
+        addrs[i] = (void *)(map->iova + off);
+    }
+
+    *vaddr = g_steal_pointer(&addrs);
+    return true;
+}
+
 static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
+                                    void * const *vaddr_sg,
                                     const struct iovec *iovec,
                                     size_t num, bool more_descs, bool write)
 {
@@ -194,7 +248,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
         } else {
             descs[i].flags = flags;
         }
-        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
+        descs[i].addr = cpu_to_le64((hwaddr)vaddr_sg[n]);
         descs[i].len = cpu_to_le32(iovec[n].iov_len);
 
         last = i;
@@ -204,43 +258,62 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
     svq->free_head = le16_to_cpu(descs[last].next);
 }
 
-static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
-                                    VirtQueueElement *elem)
+static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
+                                SVQElement *svq_elem,
+                                unsigned *head)
 {
-    int head;
+    VirtQueueElement *elem = &svq_elem->elem;
     unsigned avail_idx;
     vring_avail_t *avail = svq->vring.avail;
+    bool ok;
 
-    head = svq->free_head;
+    *head = svq->free_head;
 
     /* We need some descriptors here */
     assert(elem->out_num || elem->in_num);
 
-    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
-                            elem->in_num > 0, false);
-    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
+    ok = vhost_svq_translate_addr(svq, &svq_elem->vaddr_in_sg, elem->in_sg,
+                                  elem->in_num);
+    if (unlikely(!ok)) {
+        return false;
+    }
+    ok = vhost_svq_translate_addr(svq, &svq_elem->vaddr_out_sg, elem->out_sg,
+                                  elem->out_num);
+    if (unlikely(!ok)) {
+        return false;
+    }
+
+    vhost_vring_write_descs(svq, svq_elem->vaddr_out_sg, elem->out_sg,
+                            elem->out_num, elem->in_num > 0, false);
+    vhost_vring_write_descs(svq, svq_elem->vaddr_in_sg, elem->in_sg,
+                            elem->in_num, false, true);
 
     /*
      * Put entry in available array (but don't update avail->idx until they
      * do sync).
      */
     avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
-    avail->ring[avail_idx] = cpu_to_le16(head);
+    avail->ring[avail_idx] = cpu_to_le16(*head);
     svq->avail_idx_shadow++;
 
     /* Update avail index after the descriptor is wrote */
     smp_wmb();
     avail->idx = cpu_to_le16(svq->avail_idx_shadow);
 
-    return head;
+    return true;
 
 }
 
-static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
+static bool vhost_svq_add(VhostShadowVirtqueue *svq, SVQElement *elem)
 {
-    unsigned qemu_head = vhost_svq_add_split(svq, elem);
+    unsigned qemu_head;
+    bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
+    if (unlikely(!ok)) {
+        return false;
+    }
 
     svq->ring_id_maps[qemu_head] = elem;
+    return true;
 }
 
 static void vhost_svq_kick(VhostShadowVirtqueue *svq)
@@ -284,7 +357,8 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
         }
 
         while (true) {
-            VirtQueueElement *elem;
+            SVQElement *elem;
+            bool ok;
 
             if (svq->next_guest_avail_elem) {
                 elem = g_steal_pointer(&svq->next_guest_avail_elem);
@@ -296,7 +370,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
                 break;
             }
 
-            if (elem->out_num + elem->in_num >
+            if (elem->elem.out_num + elem->elem.in_num >
                 vhost_svq_available_slots(svq)) {
                 /*
                  * This condition is possible since a contiguous buffer in GPA
@@ -313,7 +387,11 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
                 return;
             }
 
-            vhost_svq_add(svq, elem);
+            ok = vhost_svq_add(svq, elem);
+            if (unlikely(!ok)) {
+                /* VQ is broken, just return and ignore any other kicks */
+                return;
+            }
             vhost_svq_kick(svq);
         }
 
@@ -344,7 +422,7 @@ static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
     return svq->last_used_idx != svq->shadow_used_idx;
 }
 
-static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
+static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
 {
     vring_desc_t *descs = svq->vring.desc;
     const vring_used_t *used = svq->vring.used;
@@ -378,7 +456,7 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
     descs[used_elem.id].next = svq->free_head;
     svq->free_head = used_elem.id;
 
-    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
+    svq->ring_id_maps[used_elem.id]->elem.len = used_elem.len;
     return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
 }
 
@@ -393,8 +471,9 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 
         vhost_svq_set_notification(svq, false);
         while (true) {
-            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
-            if (!elem) {
+            g_autofree SVQElement *svq_elem = vhost_svq_get_buf(svq);
+            VirtQueueElement *elem;
+            if (!svq_elem) {
                 break;
             }
 
@@ -406,6 +485,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
                 virtqueue_flush(vq, i);
                 i = 0;
             }
+            elem = &svq_elem->elem;
             virtqueue_fill(vq, elem, elem->len, i++);
         }
 
@@ -583,10 +663,15 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
     vhost_svq_flush(svq, false);
 
     for (i = 0; i < svq->vring.num; ++i) {
-        g_autofree VirtQueueElement *elem = svq->ring_id_maps[i];
-        if (elem) {
-            virtqueue_detach_element(svq->vq, elem, elem->len);
+        g_autofree SVQElement *svq_elem = svq->ring_id_maps[i];
+        VirtQueueElement *elem;
+
+        if (!svq_elem) {
+            continue;
         }
+
+        elem = &svq_elem->elem;
+        virtqueue_detach_element(svq->vq, elem, elem->len);
     }
 }
 
@@ -594,7 +679,8 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
  * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
  * methods and file descriptors.
  */
-VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
+VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx,
+                                    VhostIOVATree *iova_map)
 {
     int vq_idx = dev->vq_index + idx;
     unsigned num = virtio_queue_get_num(dev->vdev, vq_idx);
@@ -628,11 +714,13 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
     memset(svq->vring.desc, 0, driver_size);
     svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
     memset(svq->vring.used, 0, device_size);
+    svq->iova_map = iova_map;
+
     for (i = 0; i < num - 1; i++) {
         svq->vring.desc[i].next = cpu_to_le16(i + 1);
     }
 
-    svq->ring_id_maps = g_new0(VirtQueueElement *, num);
+    svq->ring_id_maps = g_new0(SVQElement *, num);
     event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     return g_steal_pointer(&svq);
 
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index a827ecbe4f..8466580ae7 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -210,6 +210,18 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
                                          vaddr, section->readonly);
 
     llsize = int128_sub(llend, int128_make64(iova));
+    if (v->shadow_vqs_enabled) {
+        DMAMap mem_region = {
+            .translated_addr = (hwaddr)vaddr,
+            .size = int128_get64(llsize) - 1,
+            .perm = IOMMU_ACCESS_FLAG(true, section->readonly),
+        };
+
+        int r = vhost_iova_tree_map_alloc(v->iova_map, &mem_region);
+        assert(r == IOVA_OK);
+
+        iova = mem_region.iova;
+    }
 
     vhost_vdpa_iotlb_batch_begin_once(v);
     ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
@@ -262,6 +274,20 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
 
     llsize = int128_sub(llend, int128_make64(iova));
 
+    if (v->shadow_vqs_enabled) {
+        const DMAMap *result;
+        const void *vaddr = memory_region_get_ram_ptr(section->mr) +
+            section->offset_within_region +
+            (iova - section->offset_within_address_space);
+        DMAMap mem_region = {
+            .translated_addr = (hwaddr)vaddr,
+            .size = int128_get64(llsize) - 1,
+        };
+
+        result = vhost_iova_tree_find_iova(v->iova_map, &mem_region);
+        iova = result->iova;
+        vhost_iova_tree_remove(v->iova_map, &mem_region);
+    }
     vhost_vdpa_iotlb_batch_begin_once(v);
     ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
     if (ret) {
@@ -1067,7 +1093,7 @@ void vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable, Error **errp)
         /* Allocate resources */
         assert(v->shadow_vqs->len == 0);
         for (n = 0; n < hdev->nvqs; ++n) {
-            VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n);
+            VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n, v->iova_map);
             if (unlikely(!svq)) {
                 error_setg(errp, "Cannot create svq");
                 enable = false;
@@ -1088,6 +1114,13 @@ void vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable, Error **errp)
         vhost_vdpa_get_vq_state(hdev, hdev->vq_index + n);
     }
 
+    memory_listener_unregister(&v->listener);
+    r = vhost_vdpa_dma_unmap(v, iova_first,
+                            (iova_last - iova_first) & TARGET_PAGE_MASK);
+    if (unlikely(r)) {
+        error_setg_errno(errp, -r, "Fail to invalidate IOTLB");
+    }
+
     /* Reset device so it can be configured */
     vhost_vdpa_dev_start(hdev, false);
 
-- 
2.27.0




* Re: [RFC PATCH v5 01/26] util: Make some iova_tree parameters const
  2021-10-29 18:35 ` [RFC PATCH v5 01/26] util: Make some iova_tree parameters const Eugenio Pérez
@ 2021-10-31 18:59     ` Juan Quintela
  0 siblings, 0 replies; 82+ messages in thread
From: Juan Quintela @ 2021-10-31 18:59 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Eric Blake, virtualization, Parav Pandit

Eugenio Pérez <eperezma@redhat.com> wrote:
> As qemu guidelines:
> Unless a pointer is used to modify the pointed-to storage, give it the
> "const" attribute.
>
> In the particular case of iova_tree_find it allows to enforce what is
> requested by its comment, since the compiler would shout in case of
> modifying or freeing the const-qualified returned pointer.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> Acked-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

This patch can go in already, whose tree should this go through?

Later, Juan.



* Re: [RFC PATCH v5 01/26] util: Make some iova_tree parameters const
  2021-10-31 18:59     ` Juan Quintela
  (?)
@ 2021-11-01  8:20     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 82+ messages in thread
From: Eugenio Perez Martin @ 2021-11-01  8:20 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin, Jason Wang,
	Richard Henderson, qemu-level, Peter Xu, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Parav Pandit

On Sun, Oct 31, 2021 at 7:59 PM Juan Quintela <quintela@redhat.com> wrote:
>
> Eugenio Pérez <eperezma@redhat.com> wrote:
> > As qemu guidelines:
> > Unless a pointer is used to modify the pointed-to storage, give it the
> > "const" attribute.
> >
> > In the particular case of iova_tree_find it allows to enforce what is
> > requested by its comment, since the compiler would shout in case of
> > modifying or freeing the const-qualified returned pointer.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > Acked-by: Peter Xu <peterx@redhat.com>
> > Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> This patch can go in already, whose tree should this go through?
>

Hi Juan,

Sorry for being unclear, this patch has already been queued in
Bonzini's tree [1]. I included it here because it is still not in the
master branch.

Thanks!

[1] https://lists.nongnu.org/archive/html/qemu-devel/2021-10/msg03407.html

> Later, Juan.
>




* Re: [RFC PATCH v5 00/26] vDPA shadow virtqueue
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
                   ` (25 preceding siblings ...)
  2021-10-29 18:35 ` [RFC PATCH v5 26/26] vdpa: Add custom IOTLB translations to SVQ Eugenio Pérez
@ 2021-11-01  9:06 ` Eugenio Perez Martin
  2021-11-02  4:25   ` Jason Wang
  27 siblings, 0 replies; 82+ messages in thread
From: Eugenio Perez Martin @ 2021-11-01  9:06 UTC (permalink / raw)
  To: qemu-level
  Cc: Laurent Vivier, Parav Pandit, Juan Quintela, Jason Wang,
	Michael S. Tsirkin, Richard Henderson, Markus Armbruster,
	Peter Xu, virtualization, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Eduardo Habkost,
	Eric Blake, Stefano Garzarella

On Fri, Oct 29, 2021 at 8:41 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> is intended as a new method of tracking the memory the devices touch
> during a migration process: Instead of relay on vhost device's dirty
> logging capability, SVQ intercepts the VQ dataplane forwarding the
> descriptors between VM and device. This way qemu is the effective
> writer of guests memory, like in qemu's virtio device operation.
>
> When SVQ is enabled qemu offers a new virtual address space to the
> device to read and write into, and it maps new vrings and the guest
> memory in it. SVQ also intercepts kicks and calls between the device
> and the guest. Used buffers relay would cause dirty memory being
> tracked, but at this RFC SVQ is not enabled on migration automatically.
>
> Thanks of being a buffers relay system, SVQ can be used also to
> communicate devices and drivers with different capabilities, like
> devices that only supports packed vring and not split and old guest
> with no driver packed support.
>
> It is based on the ideas of DPDK SW assisted LM, in the series of
> DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> not map the shadow vq in guest's VA, but in qemu's.
>
> For qemu to use shadow virtqueues the guest virtio driver must not use
> features like event_idx.
>
> SVQ needs to be enabled with QMP command:
>
> { "execute": "x-vhost-set-shadow-vq",
>       "arguments": { "name": "vhost-vdpa0", "enable": true } }
>
> This series includes some patches to delete in the final version that
> helps with its testing. The first two of the series have been sent
> sepparately but they haven't been included in qemu main branch.
>
> The two after them adds the feature to stop the device and be able to
> set and get its status. It's intended to be used with vp_vpda driver in
> a nested environment, so they are also external to this series. The
> vp_vdpa driver also need modifications to forward the new status bit,
> they will be proposed sepparately
>
> Patches 5-12 prepares the SVQ and QMP command to support guest to host
> notifications forwarding. If the SVQ is enabled with these ones
> applied and the device supports it, that part can be tested in
> isolation (for example, with networking), hopping through SVQ.
>
> Same thing is true with patches 13-17, but with device to guest
> notifications.
>
> Based on them, patches from 18 to 22 implement the actual buffer
> forwarding, using some features already introduced in previous.
> However, they will need a host device with no iommu, something that
> is not available at the moment.
>
> The last part of the series uses properly the host iommu, so the driver
> can access this new virtual address space created.
>
> Comments are welcome.
>
> TODO:
> * Event, indirect, packed, and others features of virtio.
> * To sepparate buffers forwarding in its own AIO context, so we can
>   throw more threads to that task and we don't need to stop the main
>   event loop.
> * Support multiqueue virtio-net vdpa.
> * Proper documentation.
>
> Changes from v4 RFC:
> * Support of allocating / freeing iova ranges in IOVA tree. Extending
>   already present iova-tree for that.
> * Proper validation of guest features. Now SVQ can negotiate a
>   different set of features with the device when enabled.
> * Support of host notifiers memory regions
> * Handling of SVQ full queue in case guest's descriptors span to
>   different memory regions (qemu's VA chunks).
> * Flush pending used buffers at end of SVQ operation.
> * QMP command now looks by NetClientState name. Other devices will need
>   to implement it's way to enable vdpa.
> * Rename QMP command to set, so it looks more like a way of working
> * Better use of qemu error system
> * Make a few assertions proper error-handling paths.
> * Add more documentation
> * Less coupling of virtio / vhost, that could cause friction on changes
> * Addressed many other small comments and small fixes.
>
> Changes from v3 RFC:
>   * Move everything to vhost-vdpa backend. A big change, this allowed
>     some cleanup but more code has been added in other places.
>   * More use of glib utilities, especially to manage memory.
> v3 link:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
>
> Changes from v2 RFC:
>   * Adding vhost-vdpa devices support
>   * Fixed some memory leaks pointed by different comments
> v2 link:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
>
> Changes from v1 RFC:
>   * Use QMP instead of migration to start SVQ mode.
>   * Only accepting IOMMU devices, closer behavior with target devices
>     (vDPA)
>   * Fix invalid masking/unmasking of vhost call fd.
>   * Use of proper methods for synchronization.
>   * No need to modify VirtIO device code, all of the changes are
>     contained in vhost code.
>   * Delete superfluous code.
>   * An intermediate RFC was sent with only the notifications forwarding
>     changes. It can be seen in
>     https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> v1 link:
> https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
>
> Eugenio Pérez (20):
>       virtio: Add VIRTIO_F_QUEUE_STATE
>       virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
>       virtio: Add virtio_queue_is_host_notifier_enabled
>       vhost: Make vhost_virtqueue_{start,stop} public
>       vhost: Add x-vhost-enable-shadow-vq qmp
>       vhost: Add VhostShadowVirtqueue
>       vdpa: Register vdpa devices in a list
>       vhost: Route guest->host notification through shadow virtqueue
>       Add vhost_svq_get_svq_call_notifier
>       Add vhost_svq_set_guest_call_notifier
>       vdpa: Save call_fd in vhost-vdpa
>       vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
>       vhost: Route host->guest notification through shadow virtqueue
>       virtio: Add vhost_shadow_vq_get_vring_addr
>       vdpa: Save host and guest features
>       vhost: Add vhost_svq_valid_device_features to shadow vq
>       vhost: Shadow virtqueue buffers forwarding
>       vhost: Add VhostIOVATree
>       vhost: Use a tree to store memory mappings
>       vdpa: Add custom IOTLB translations to SVQ
>
> Eugenio Pérez (26):
>   util: Make some iova_tree parameters const
>   vhost: Fix last queue index of devices with no cvq
>   virtio: Add VIRTIO_F_QUEUE_STATE
>   virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
>   vhost: Add x-vhost-set-shadow-vq qmp
>   vhost: Add VhostShadowVirtqueue
>   vdpa: Save kick_fd in vhost-vdpa
>   vdpa: Add vhost_svq_get_dev_kick_notifier
>   vdpa: Add vhost_svq_set_svq_kick_fd
>   vhost: Add Shadow VirtQueue kick forwarding capabilities
>   vhost: Handle host notifiers in SVQ
>   vhost: Route guest->host notification through shadow virtqueue
>   Add vhost_svq_get_svq_call_notifier
>   Add vhost_svq_set_guest_call_notifier
>   vdpa: Save call_fd in vhost-vdpa
>   vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
>   vhost: Route host->guest notification through shadow virtqueue
>   virtio: Add vhost_shadow_vq_get_vring_addr
>   vdpa: ack VIRTIO_F_QUEUE_STATE if device supports it
>   vhost: Add vhost_svq_valid_device_features to shadow vq
>   vhost: Add vhost_svq_valid_guest_features to shadow vq
>   vhost: Shadow virtqueue buffers forwarding
>   util: Add iova_tree_alloc
>   vhost: Add VhostIOVATree
>   vhost: Use a tree to store memory mappings
>   vdpa: Add custom IOTLB translations to SVQ
>
>  qapi/net.json                                 |  20 +
>  hw/virtio/vhost-iova-tree.h                   |  27 +
>  hw/virtio/vhost-shadow-virtqueue.h            |  44 ++
>  hw/virtio/virtio-pci.h                        |   1 +
>  include/hw/virtio/vhost-vdpa.h                |  12 +
>  include/hw/virtio/virtio.h                    |   4 +-
>  include/qemu/iova-tree.h                      |  25 +-
>  .../standard-headers/linux/virtio_config.h    |   5 +
>  include/standard-headers/linux/virtio_pci.h   |   2 +
>  hw/i386/intel_iommu.c                         |   2 +-
>  hw/net/vhost_net.c                            |   2 +-
>  hw/net/virtio-net.c                           |   6 +-
>  hw/virtio/vhost-iova-tree.c                   | 157 ++++
>  hw/virtio/vhost-shadow-virtqueue.c            | 746 ++++++++++++++++++
>  hw/virtio/vhost-vdpa.c                        | 437 +++++++++-
>  hw/virtio/virtio-pci.c                        |  16 +-
>  net/vhost-vdpa.c                              |  28 +
>  util/iova-tree.c                              | 151 +++-
>  hw/virtio/meson.build                         |   2 +-
>  19 files changed, 1664 insertions(+), 23 deletions(-)
>  create mode 100644 hw/virtio/vhost-iova-tree.h
>  create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
>  create mode 100644 hw/virtio/vhost-iova-tree.c
>  create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
>
> --
> 2.27.0
>
>
>

To make the review easier, this tree is also at [1].

Thanks!

[1] https://github.com/eugpermar/qemu/tree/vdpa_sw_live_migration.d/vdpa-v5



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 00/26] vDPA shadow virtqueue
  2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
@ 2021-11-02  4:25   ` Jason Wang
  2021-10-29 18:35 ` [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq Eugenio Pérez
                     ` (26 subsequent siblings)
  27 siblings, 0 replies; 82+ messages in thread
From: Jason Wang @ 2021-11-02  4:25 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin,
	Richard Henderson, Stefan Hajnoczi, Markus Armbruster,
	Harpreet Singh Anand, Xiao W Wang, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Eduardo Habkost


On 2021/10/30 2:34 AM, Eugenio Pérez wrote:
> This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> is intended as a new method of tracking the memory the devices touch
> during a migration process: Instead of relay on vhost device's dirty
> logging capability, SVQ intercepts the VQ dataplane forwarding the
> descriptors between VM and device. This way qemu is the effective
> writer of guests memory, like in qemu's virtio device operation.
>
> When SVQ is enabled qemu offers a new virtual address space to the
> device to read and write into, and it maps new vrings and the guest
> memory in it. SVQ also intercepts kicks and calls between the device
> and the guest. Used buffers relay would cause dirty memory being
> tracked, but at this RFC SVQ is not enabled on migration automatically.
>
> Thanks of being a buffers relay system, SVQ can be used also to
> communicate devices and drivers with different capabilities, like
> devices that only supports packed vring and not split and old guest
> with no driver packed support.
>
> It is based on the ideas of DPDK SW assisted LM, in the series of
> DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> not map the shadow vq in guest's VA, but in qemu's.
>
> For qemu to use shadow virtqueues the guest virtio driver must not use
> features like event_idx.
>
> SVQ needs to be enabled with QMP command:
>
> { "execute": "x-vhost-set-shadow-vq",
>        "arguments": { "name": "vhost-vdpa0", "enable": true } }
>
> This series includes some patches to delete in the final version that
> helps with its testing. The first two of the series have been sent
> sepparately but they haven't been included in qemu main branch.
>
> The two after them adds the feature to stop the device and be able to
> set and get its status. It's intended to be used with vp_vpda driver in
> a nested environment, so they are also external to this series. The
> vp_vdpa driver also need modifications to forward the new status bit,
> they will be proposed sepparately
>
> Patches 5-12 prepares the SVQ and QMP command to support guest to host
> notifications forwarding. If the SVQ is enabled with these ones
> applied and the device supports it, that part can be tested in
> isolation (for example, with networking), hopping through SVQ.
>
> Same thing is true with patches 13-17, but with device to guest
> notifications.
>
> Based on them, patches from 18 to 22 implement the actual buffer
> forwarding, using some features already introduced in previous.
> However, they will need a host device with no iommu, something that
> is not available at the moment.
>
> The last part of the series uses properly the host iommu, so the driver
> can access this new virtual address space created.
>
> Comments are welcome.


I think we need to do some benchmarks to see the performance impact.

Thanks


>
> TODO:
> * Event, indirect, packed, and others features of virtio.
> * To sepparate buffers forwarding in its own AIO context, so we can
>    throw more threads to that task and we don't need to stop the main
>    event loop.
> * Support multiqueue virtio-net vdpa.
> * Proper documentation.
>
> Changes from v4 RFC:
> * Support of allocating / freeing iova ranges in IOVA tree. Extending
>    already present iova-tree for that.
> * Proper validation of guest features. Now SVQ can negotiate a
>    different set of features with the device when enabled.
> * Support of host notifiers memory regions
> * Handling of SVQ full queue in case guest's descriptors span to
>    different memory regions (qemu's VA chunks).
> * Flush pending used buffers at end of SVQ operation.
> * QMP command now looks up the device by NetClientState name. Other
>    devices will need to implement their own way to enable vdpa.
> * Rename QMP command to "set", so it reads more like a mode of operation
> * Better use of the qemu error system
> * Turn a few assertions into proper error-handling paths.
> * Add more documentation
> * Less coupling of virtio / vhost, which could cause friction on changes
> * Addressed many other small comments and small fixes.
>
> Changes from v3 RFC:
>    * Move everything to vhost-vdpa backend. A big change, this allowed
>      some cleanup but more code has been added in other places.
>    * More use of glib utilities, especially to manage memory.
> v3 link:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
>
> Changes from v2 RFC:
>    * Adding vhost-vdpa devices support
>    * Fixed some memory leaks pointed out in review comments
> v2 link:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
>
> Changes from v1 RFC:
>    * Use QMP instead of migration to start SVQ mode.
>    * Only accepting IOMMU devices, closer behavior with target devices
>      (vDPA)
>    * Fix invalid masking/unmasking of vhost call fd.
>    * Use of proper methods for synchronization.
>    * No need to modify VirtIO device code, all of the changes are
>      contained in vhost code.
>    * Delete superfluous code.
>    * An intermediate RFC was sent with only the notifications forwarding
>      changes. It can be seen in
>      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> v1 link:
> https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
>
> Eugenio Pérez (20):
>        virtio: Add VIRTIO_F_QUEUE_STATE
>        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
>        virtio: Add virtio_queue_is_host_notifier_enabled
>        vhost: Make vhost_virtqueue_{start,stop} public
>        vhost: Add x-vhost-enable-shadow-vq qmp
>        vhost: Add VhostShadowVirtqueue
>        vdpa: Register vdpa devices in a list
>        vhost: Route guest->host notification through shadow virtqueue
>        Add vhost_svq_get_svq_call_notifier
>        Add vhost_svq_set_guest_call_notifier
>        vdpa: Save call_fd in vhost-vdpa
>        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
>        vhost: Route host->guest notification through shadow virtqueue
>        virtio: Add vhost_shadow_vq_get_vring_addr
>        vdpa: Save host and guest features
>        vhost: Add vhost_svq_valid_device_features to shadow vq
>        vhost: Shadow virtqueue buffers forwarding
>        vhost: Add VhostIOVATree
>        vhost: Use a tree to store memory mappings
>        vdpa: Add custom IOTLB translations to SVQ
>
> Eugenio Pérez (26):
>    util: Make some iova_tree parameters const
>    vhost: Fix last queue index of devices with no cvq
>    virtio: Add VIRTIO_F_QUEUE_STATE
>    virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
>    vhost: Add x-vhost-set-shadow-vq qmp
>    vhost: Add VhostShadowVirtqueue
>    vdpa: Save kick_fd in vhost-vdpa
>    vdpa: Add vhost_svq_get_dev_kick_notifier
>    vdpa: Add vhost_svq_set_svq_kick_fd
>    vhost: Add Shadow VirtQueue kick forwarding capabilities
>    vhost: Handle host notifiers in SVQ
>    vhost: Route guest->host notification through shadow virtqueue
>    Add vhost_svq_get_svq_call_notifier
>    Add vhost_svq_set_guest_call_notifier
>    vdpa: Save call_fd in vhost-vdpa
>    vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
>    vhost: Route host->guest notification through shadow virtqueue
>    virtio: Add vhost_shadow_vq_get_vring_addr
>    vdpa: ack VIRTIO_F_QUEUE_STATE if device supports it
>    vhost: Add vhost_svq_valid_device_features to shadow vq
>    vhost: Add vhost_svq_valid_guest_features to shadow vq
>    vhost: Shadow virtqueue buffers forwarding
>    util: Add iova_tree_alloc
>    vhost: Add VhostIOVATree
>    vhost: Use a tree to store memory mappings
>    vdpa: Add custom IOTLB translations to SVQ
>
>   qapi/net.json                                 |  20 +
>   hw/virtio/vhost-iova-tree.h                   |  27 +
>   hw/virtio/vhost-shadow-virtqueue.h            |  44 ++
>   hw/virtio/virtio-pci.h                        |   1 +
>   include/hw/virtio/vhost-vdpa.h                |  12 +
>   include/hw/virtio/virtio.h                    |   4 +-
>   include/qemu/iova-tree.h                      |  25 +-
>   .../standard-headers/linux/virtio_config.h    |   5 +
>   include/standard-headers/linux/virtio_pci.h   |   2 +
>   hw/i386/intel_iommu.c                         |   2 +-
>   hw/net/vhost_net.c                            |   2 +-
>   hw/net/virtio-net.c                           |   6 +-
>   hw/virtio/vhost-iova-tree.c                   | 157 ++++
>   hw/virtio/vhost-shadow-virtqueue.c            | 746 ++++++++++++++++++
>   hw/virtio/vhost-vdpa.c                        | 437 +++++++++-
>   hw/virtio/virtio-pci.c                        |  16 +-
>   net/vhost-vdpa.c                              |  28 +
>   util/iova-tree.c                              | 151 +++-
>   hw/virtio/meson.build                         |   2 +-
>   19 files changed, 1664 insertions(+), 23 deletions(-)
>   create mode 100644 hw/virtio/vhost-iova-tree.h
>   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
>   create mode 100644 hw/virtio/vhost-iova-tree.c
>   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
>


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 03/26] virtio: Add VIRTIO_F_QUEUE_STATE
  2021-10-29 18:35 ` [RFC PATCH v5 03/26] virtio: Add VIRTIO_F_QUEUE_STATE Eugenio Pérez
@ 2021-11-02  4:57     ` Jason Wang
  0 siblings, 0 replies; 82+ messages in thread
From: Jason Wang @ 2021-11-02  4:57 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Eric Blake, virtualization, Parav Pandit

On Sat, Oct 30, 2021 at 2:36 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> Implementation of RFC of device state capability:
> https://lists.oasis-open.org/archives/virtio-comment/202012/msg00005.html

Considering this will still take time to complete, we need to think of a
way to proceed without it.

Thanks



>
> With this capability, the vdpa device can reset its index so it can start
> consuming from the guest after disabling shadow virtqueue (SVQ), with a
> state other than 0.
>
> The use case is to test SVQ with virtio-pci vdpa (vp_vdpa) under nested
> virtualization: spawn an L0 qemu with a virtio-net device, use the
> vp_vdpa driver to handle it in the guest, and then spawn an L1 qemu using
> that vdpa device. When the L1 qemu asks the device to set a new state
> through the vdpa ioctl, vp_vdpa should set each queue state through the
> virtio VIRTIO_PCI_COMMON_Q_AVAIL_STATE field.
>
> Since this is only for testing vhost-vdpa, it's added here before
> proposing it to the kernel code. No effort is made to check that the
> device can actually change its state, its layout, or whether the device
> even supports changing state at all. These checks will be added in the
> future.
>
> Also, a modified version of vp_vdpa that allows to set these in PCI
> config is needed.
>
> TODO: Check for feature enabled and split in virtio pci config
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  hw/virtio/virtio-pci.h                         | 1 +
>  include/hw/virtio/virtio.h                     | 4 +++-
>  include/standard-headers/linux/virtio_config.h | 3 +++
>  include/standard-headers/linux/virtio_pci.h    | 2 ++
>  hw/virtio/virtio-pci.c                         | 9 +++++++++
>  5 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
> index 2446dcd9ae..019badbd7c 100644
> --- a/hw/virtio/virtio-pci.h
> +++ b/hw/virtio/virtio-pci.h
> @@ -120,6 +120,7 @@ typedef struct VirtIOPCIQueue {
>    uint32_t desc[2];
>    uint32_t avail[2];
>    uint32_t used[2];
> +  uint16_t state;
>  } VirtIOPCIQueue;
>
>  struct VirtIOPCIProxy {
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 8bab9cfb75..5fe575b8f0 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -289,7 +289,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
>      DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
>                        VIRTIO_F_IOMMU_PLATFORM, false), \
>      DEFINE_PROP_BIT64("packed", _state, _field, \
> -                      VIRTIO_F_RING_PACKED, false)
> +                      VIRTIO_F_RING_PACKED, false), \
> +    DEFINE_PROP_BIT64("save_restore_q_state", _state, _field, \
> +                      VIRTIO_F_QUEUE_STATE, true)
>
>  hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
>  bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
> diff --git a/include/standard-headers/linux/virtio_config.h b/include/standard-headers/linux/virtio_config.h
> index 22e3a85f67..59fad3eb45 100644
> --- a/include/standard-headers/linux/virtio_config.h
> +++ b/include/standard-headers/linux/virtio_config.h
> @@ -90,4 +90,7 @@
>   * Does the device support Single Root I/O Virtualization?
>   */
>  #define VIRTIO_F_SR_IOV                        37
> +
> +/* Device support save and restore virtqueue state */
> +#define VIRTIO_F_QUEUE_STATE            40
>  #endif /* _LINUX_VIRTIO_CONFIG_H */
> diff --git a/include/standard-headers/linux/virtio_pci.h b/include/standard-headers/linux/virtio_pci.h
> index db7a8e2fcb..c8d9802a87 100644
> --- a/include/standard-headers/linux/virtio_pci.h
> +++ b/include/standard-headers/linux/virtio_pci.h
> @@ -164,6 +164,7 @@ struct virtio_pci_common_cfg {
>         uint32_t queue_avail_hi;                /* read-write */
>         uint32_t queue_used_lo;         /* read-write */
>         uint32_t queue_used_hi;         /* read-write */
> +       uint16_t queue_avail_state;     /* read-write */
>  };
>
>  /* Fields in VIRTIO_PCI_CAP_PCI_CFG: */
> @@ -202,6 +203,7 @@ struct virtio_pci_cfg_cap {
>  #define VIRTIO_PCI_COMMON_Q_AVAILHI    44
>  #define VIRTIO_PCI_COMMON_Q_USEDLO     48
>  #define VIRTIO_PCI_COMMON_Q_USEDHI     52
> +#define VIRTIO_PCI_COMMON_Q_AVAIL_STATE        56
>
>  #endif /* VIRTIO_PCI_NO_MODERN */
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 750aa47ec1..d7bb549033 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1244,6 +1244,9 @@ static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
>      case VIRTIO_PCI_COMMON_Q_USEDHI:
>          val = proxy->vqs[vdev->queue_sel].used[1];
>          break;
> +    case VIRTIO_PCI_COMMON_Q_AVAIL_STATE:
> +        val = virtio_queue_get_last_avail_idx(vdev, vdev->queue_sel);
> +        break;
>      default:
>          val = 0;
>      }
> @@ -1330,6 +1333,8 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
>                         proxy->vqs[vdev->queue_sel].avail[0],
>                         ((uint64_t)proxy->vqs[vdev->queue_sel].used[1]) << 32 |
>                         proxy->vqs[vdev->queue_sel].used[0]);
> +            virtio_queue_set_last_avail_idx(vdev, vdev->queue_sel,
> +                        proxy->vqs[vdev->queue_sel].state);
>              proxy->vqs[vdev->queue_sel].enabled = 1;
>          } else {
>              virtio_error(vdev, "wrong value for queue_enable %"PRIx64, val);
> @@ -1353,6 +1358,9 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
>      case VIRTIO_PCI_COMMON_Q_USEDHI:
>          proxy->vqs[vdev->queue_sel].used[1] = val;
>          break;
> +    case VIRTIO_PCI_COMMON_Q_AVAIL_STATE:
> +        proxy->vqs[vdev->queue_sel].state = val;
> +        break;
>      default:
>          break;
>      }
> @@ -1951,6 +1959,7 @@ static void virtio_pci_reset(DeviceState *qdev)
>          proxy->vqs[i].desc[0] = proxy->vqs[i].desc[1] = 0;
>          proxy->vqs[i].avail[0] = proxy->vqs[i].avail[1] = 0;
>          proxy->vqs[i].used[0] = proxy->vqs[i].used[1] = 0;
> +        proxy->vqs[i].state = 0;
>      }
>
>      if (pci_is_express(dev)) {
> --
> 2.27.0
>


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 21/26] vhost: Add vhost_svq_valid_guest_features to shadow vq
  2021-10-29 18:35 ` [RFC PATCH v5 21/26] vhost: Add vhost_svq_valid_guest_features " Eugenio Pérez
@ 2021-11-02  5:25     ` Jason Wang
  0 siblings, 0 replies; 82+ messages in thread
From: Jason Wang @ 2021-11-02  5:25 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Eric Blake, virtualization, Parav Pandit

On Sat, Oct 30, 2021 at 2:44 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> This allows testing whether the guest has acknowledged an invalid
> transport feature for SVQ. This will include packed vq layout or
> event_idx, where the VirtIO device needs help from SVQ.
>
> It is not needed at this moment, but since SVQ will not re-negotiate
> features again with the guest, a failure to acknowledge them is fatal
> for SVQ.
>

It's not clear to me why we need this. Maybe you can give me an
example. E.g. isn't it sufficient to filter out devices with
event_idx?

Thanks
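
A minimal sketch of that alternative, masking the offered features so the
guest can never acknowledge what SVQ cannot relay. The helper name and the
exact set of masked bits are assumptions, not part of the series; BIT_ULL()
and the feature bit definitions come from the standard QEMU headers:

/*
 * Hypothetical helper: drop transport features that SVQ cannot forward,
 * so there is nothing left to validate after the guest acks its features.
 */
static uint64_t vhost_svq_filter_features(uint64_t dev_features)
{
    return dev_features & ~(BIT_ULL(VIRTIO_RING_F_EVENT_IDX) |
                            BIT_ULL(VIRTIO_F_RING_PACKED));
}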

> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  hw/virtio/vhost-shadow-virtqueue.h | 1 +
>  hw/virtio/vhost-shadow-virtqueue.c | 6 ++++++
>  2 files changed, 7 insertions(+)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 946b2c6295..ac55588009 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -16,6 +16,7 @@
>  typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>
>  bool vhost_svq_valid_device_features(uint64_t *features);
> +bool vhost_svq_valid_guest_features(uint64_t *features);
>
>  void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>  void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 6e0508a231..cb9ffcb015 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -62,6 +62,12 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
>      return true;
>  }
>
> +/* If the guest is using some of these, SVQ cannot communicate */
> +bool vhost_svq_valid_guest_features(uint64_t *guest_features)
> +{
> +    return true;
> +}
> +
>  /* Forward guest notifications */
>  static void vhost_handle_guest_kick(EventNotifier *n)
>  {
> --
> 2.27.0
>


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 12/26] vhost: Route guest->host notification through shadow virtqueue
  2021-10-29 18:35 ` [RFC PATCH v5 12/26] vhost: Route guest->host notification through shadow virtqueue Eugenio Pérez
@ 2021-11-02  5:36     ` Jason Wang
  0 siblings, 0 replies; 82+ messages in thread
From: Jason Wang @ 2021-11-02  5:36 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin,
	Richard Henderson, Stefan Hajnoczi, Markus Armbruster,
	Harpreet Singh Anand, Xiao W Wang, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Eduardo Habkost


On 2021/10/30 2:35 AM, Eugenio Pérez wrote:
> +/**
> + * Enable or disable shadow virtqueue in a vhost vdpa device.
> + *
> + * This function is idempotent, to call it many times with the same value for
> + * enable_svq will simply return success.
> + *
> + * @v       Vhost vdpa device
> + * @enable  True to set SVQ mode
> + * @errp    Error pointer
> + */
> +void vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable, Error **errp)
> +{


What happens if vhost_vdpa is not started when we try to enable SVQ?
Another note is that the vhost device could be stopped and started
after SVQ is enabled/disabled. We need to deal with those cases.

Thanks
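
A minimal sketch of one way the entry point could guard against that case.
The function signature is the one quoted above; the v->dev pointer and the
started flag are assumptions about the series' data structures, not code
taken from the patch:

void vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable, Error **errp)
{
    /* Assumption: v->dev points at the owning struct vhost_dev */
    if (!v->dev || !v->dev->started) {
        error_setg(errp, "Cannot change SVQ mode: vhost device not started");
        return;
    }

    /* ... switch kick/call fds and vring addresses here ... */
}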


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
  2021-10-29 18:35 ` [RFC PATCH v5 23/26] util: Add iova_tree_alloc Eugenio Pérez
@ 2021-11-02  6:35     ` Jason Wang
  2021-11-23  6:56     ` Peter Xu
  2022-01-27  8:57     ` Peter Xu
  2 siblings, 0 replies; 82+ messages in thread
From: Jason Wang @ 2021-11-02  6:35 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin,
	Richard Henderson, Stefan Hajnoczi, Markus Armbruster,
	Harpreet Singh Anand, Xiao W Wang, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Eduardo Habkost


On 2021/10/30 2:35 AM, Eugenio Pérez wrote:
> This iova tree function looks for a hole in the allocated regions and
> returns a totally new translation for a given translated address.
>
> Its usage is mainly to allow devices to access qemu's address space,
> remapping the guest's address space into a new iova space to which qemu
> can add chunks of addresses.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   include/qemu/iova-tree.h |  17 +++++
>   util/iova-tree.c         | 139 +++++++++++++++++++++++++++++++++++++++
>   2 files changed, 156 insertions(+)
>
> diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
> index 8249edd764..33f9b2e13f 100644
> --- a/include/qemu/iova-tree.h
> +++ b/include/qemu/iova-tree.h
> @@ -29,6 +29,7 @@
>   #define  IOVA_OK           (0)
>   #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
>   #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
> +#define  IOVA_ERR_NOMEM    (-3) /* Cannot allocate */


I think we need a better name than "NOMEM", since it actually
means there is no sufficient hole for the range?


>   
>   typedef struct IOVATree IOVATree;
>   typedef struct DMAMap {
> @@ -119,6 +120,22 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
>    */
>   void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
>   
> +/**
> + * iova_tree_alloc:
> + *
> + * @tree: the iova tree to allocate from
> + * @map: the new map (as translated addr & size) to allocate in iova region
> + * @iova_begin: the minimum address of the allocation
> + * @iova_end: the maximum addressable direction of the allocation
> + *
> + * Allocates a new region of a given size, between iova_min and iova_max.
> + *
> + * Return: Same as iova_tree_insert, but cannot overlap and can be out of
> + * free contiguous range. Caller can get the assigned iova in map->iova.
> + */
> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_end);
> +


"iova_tree_alloc_map" seems better.


>   /**
>    * iova_tree_destroy:
>    *
> diff --git a/util/iova-tree.c b/util/iova-tree.c
> index 23ea35b7a4..27c921c4e2 100644
> --- a/util/iova-tree.c
> +++ b/util/iova-tree.c
> @@ -16,6 +16,36 @@ struct IOVATree {
>       GTree *tree;
>   };
>   
> +/* Args to pass to iova_tree_alloc foreach function. */
> +struct IOVATreeAllocArgs {
> +    /* Size of the desired allocation */
> +    size_t new_size;
> +
> +    /* The minimum address allowed in the allocation */
> +    hwaddr iova_begin;
> +
> +    /* The last addressable allowed in the allocation */
> +    hwaddr iova_last;
> +
> +    /* Previously-to-last iterated map, can be NULL in the first node */
> +    const DMAMap *hole_left;
> +
> +    /* Last iterated map */
> +    const DMAMap *hole_right;


Is there any reason not to move those into the IOVATree structure? It
could simplify a lot of things.
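
A rough sketch of that folding-in, with field names invented here for
illustration only:

struct IOVATree {
    GTree *tree;

    /*
     * Hypothetical: keep the allocation cursor in the tree itself, so the
     * traverse callback only needs the tree pointer instead of a separate
     * IOVATreeAllocArgs structure.
     */
    const DMAMap *alloc_hole_left;
    const DMAMap *alloc_hole_right;
};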


> +};
> +
> +/**
> + * Iterate args to tne next hole
> + *
> + * @args  The alloc arguments
> + * @next  The next mapping in the tree. Can be NULL to signal the last one
> + */
> +static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
> +                                         const DMAMap *next) {
> +    args->hole_left = args->hole_right;
> +    args->hole_right = next;
> +}
> +
>   static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
>   {
>       const DMAMap *m1 = a, *m2 = b;
> @@ -107,6 +137,115 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
>       return IOVA_OK;
>   }
>   
> +/**
> + * Try to accomodate a map of size ret->size in a hole between
> + * max(end(hole_left), iova_start).
> + *
> + * @args Arguments to allocation
> + */
> +static bool iova_tree_alloc_map_in_hole(const struct IOVATreeAllocArgs *args)
> +{
> +    const DMAMap *left = args->hole_left, *right = args->hole_right;
> +    uint64_t hole_start, hole_last;
> +
> +    if (right && right->iova + right->size < args->iova_begin) {
> +        return false;
> +    }
> +
> +    if (left && left->iova > args->iova_last) {
> +        return false;
> +    }
> +
> +    hole_start = MAX(left ? left->iova + left->size + 1 : 0, args->iova_begin);
> +    hole_last = MIN(right ? right->iova : HWADDR_MAX, args->iova_last);
> +
> +    if (hole_last - hole_start > args->new_size) {
> +        /* We found a valid hole. */
> +        return true;
> +    }
> +
> +    /* Keep iterating */
> +    return false;
> +}
> +
> +/**
> + * Foreach dma node in the tree, compare if there is a hole wit its previous
> + * node (or minimum iova address allowed) and the node.
> + *
> + * @key   Node iterating
> + * @value Node iterating
> + * @pargs Struct to communicate with the outside world
> + *
> + * Return: false to keep iterating, true if needs break.
> + */
> +static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
> +                                         gpointer pargs)
> +{
> +    struct IOVATreeAllocArgs *args = pargs;
> +    DMAMap *node = value;
> +
> +    assert(key == value);
> +
> +    iova_tree_alloc_args_iterate(args, node);
> +    if (args->hole_left && args->hole_left->iova > args->iova_last) {
> +        return true;
> +    }
> +
> +    if (iova_tree_alloc_map_in_hole(args)) {
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_last)
> +{
> +    struct IOVATreeAllocArgs args = {
> +        .new_size = map->size,
> +        .iova_begin = iova_begin,
> +        .iova_last = iova_last,
> +    };
> +
> +    if (iova_begin == 0) {
> +        /* Some devices does not like addr 0 */
> +        iova_begin += qemu_real_host_page_size;
> +    }
> +
> +    assert(iova_begin < iova_last);
> +
> +    /*
> +     * Find a valid hole for the mapping
> +     *
> +     * Assuming low iova_begin, so no need to do a binary search to
> +     * locate the first node.
> +     *
> +     * TODO: We can improve the search speed if we save the beginning and the
> +     * end of holes, so we don't iterate over the previous saved ones.
> +     *
> +     * TODO: Replace all this with g_tree_node_first/next/last when available
> +     * (from glib since 2.68). To do it with g_tree_foreach complicates the
> +     * code a lot.


To tell the truth, the code in iova_tree_alloc_traverse() is hard to
review. I think it would be easier to use first/next/last. What we
really need is to calculate the hole between two ranges, with hand-made
first and last nodes.

Thanks
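
A rough sketch of the loop that the glib >= 2.68 node API allows, using
g_tree_node_first()/g_tree_node_next(). It keeps the series' convention
that a hole fits only when it is strictly larger than the requested size;
the function name is invented here, and iova_tree_insert() is the existing
helper:

int iova_tree_alloc_glib268(IOVATree *tree, DMAMap *map,
                            hwaddr iova_begin, hwaddr iova_last)
{
    hwaddr hole_start = iova_begin;
    GTreeNode *node;

    for (node = g_tree_node_first(tree->tree); node;
         node = g_tree_node_next(node)) {
        const DMAMap *m = g_tree_node_value(node);
        hwaddr hole_last = MIN(m->iova, iova_last);

        if (hole_last > hole_start && hole_last - hole_start > map->size) {
            goto found;                 /* the hole before this map fits */
        }
        if (m->iova + m->size >= iova_last) {
            return IOVA_ERR_NOMEM;      /* no usable space up to iova_last */
        }
        hole_start = MAX(hole_start, m->iova + m->size + 1);
    }

    /* Tail hole, after the last mapping (or the whole range if empty) */
    if (iova_last - hole_start <= map->size) {
        return IOVA_ERR_NOMEM;
    }
found:
    map->iova = hole_start;
    return iova_tree_insert(tree, map);
}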


> +     *
> +     */
> +    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
> +    if (!iova_tree_alloc_map_in_hole(&args)) {
> +        /*
> +         * 2nd try: Last iteration left args->right as the last DMAMap. But
> +         * (right, end) hole needs to be checked too
> +         */
> +        iova_tree_alloc_args_iterate(&args, NULL);
> +        if (!iova_tree_alloc_map_in_hole(&args)) {
> +            return IOVA_ERR_NOMEM;
> +        }
> +    }
> +
> +    map->iova = MAX(iova_begin,
> +                    args.hole_left ?
> +                    args.hole_left->iova + args.hole_left->size + 1 : 0);
> +    return iova_tree_insert(tree, map);
> +}
> +
>   void iova_tree_destroy(IOVATree *tree)
>   {
>       g_tree_destroy(tree->tree);


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
@ 2021-11-02  6:35     ` Jason Wang
  0 siblings, 0 replies; 82+ messages in thread
From: Jason Wang @ 2021-11-02  6:35 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Juan Quintela,
	Richard Henderson, Stefan Hajnoczi, Peter Xu, Markus Armbruster,
	Harpreet Singh Anand, Xiao W Wang, Eli Cohen, Paolo Bonzini,
	Stefano Garzarella, Eric Blake, virtualization, Eduardo Habkost


在 2021/10/30 上午2:35, Eugenio Pérez 写道:
> This iova tree function allows it to look for a hole in allocated
> regions and return a totally new translation for a given translated
> address.
>
> It's usage is mainly to allow devices to access qemu address space,
> remapping guest's one into a new iova space where qemu can add chunks of
> addresses.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   include/qemu/iova-tree.h |  17 +++++
>   util/iova-tree.c         | 139 +++++++++++++++++++++++++++++++++++++++
>   2 files changed, 156 insertions(+)
>
> diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
> index 8249edd764..33f9b2e13f 100644
> --- a/include/qemu/iova-tree.h
> +++ b/include/qemu/iova-tree.h
> @@ -29,6 +29,7 @@
>   #define  IOVA_OK           (0)
>   #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
>   #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
> +#define  IOVA_ERR_NOMEM    (-3) /* Cannot allocate */


I think we need a better name other than "NOMEM", since it's actually 
means there's no sufficient hole for the range?


>   
>   typedef struct IOVATree IOVATree;
>   typedef struct DMAMap {
> @@ -119,6 +120,22 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
>    */
>   void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
>   
> +/**
> + * iova_tree_alloc:
> + *
> + * @tree: the iova tree to allocate from
> + * @map: the new map (as translated addr & size) to allocate in iova region
> + * @iova_begin: the minimum address of the allocation
> + * @iova_end: the maximum addressable direction of the allocation
> + *
> + * Allocates a new region of a given size, between iova_min and iova_max.
> + *
> + * Return: Same as iova_tree_insert, but cannot overlap and can be out of
> + * free contiguous range. Caller can get the assigned iova in map->iova.
> + */
> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_end);
> +


"iova_tree_alloc_map" seems better.


>   /**
>    * iova_tree_destroy:
>    *
> diff --git a/util/iova-tree.c b/util/iova-tree.c
> index 23ea35b7a4..27c921c4e2 100644
> --- a/util/iova-tree.c
> +++ b/util/iova-tree.c
> @@ -16,6 +16,36 @@ struct IOVATree {
>       GTree *tree;
>   };
>   
> +/* Args to pass to iova_tree_alloc foreach function. */
> +struct IOVATreeAllocArgs {
> +    /* Size of the desired allocation */
> +    size_t new_size;
> +
> +    /* The minimum address allowed in the allocation */
> +    hwaddr iova_begin;
> +
> +    /* The last addressable allowed in the allocation */
> +    hwaddr iova_last;
> +
> +    /* Previously-to-last iterated map, can be NULL in the first node */
> +    const DMAMap *hole_left;
> +
> +    /* Last iterated map */
> +    const DMAMap *hole_right;


Any reason we can move those to IOVATree structure, it can simplify a 
lot of things.


> +};
> +
> +/**
> + * Iterate args to tne next hole
> + *
> + * @args  The alloc arguments
> + * @next  The next mapping in the tree. Can be NULL to signal the last one
> + */
> +static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
> +                                         const DMAMap *next) {
> +    args->hole_left = args->hole_right;
> +    args->hole_right = next;
> +}
> +
>   static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
>   {
>       const DMAMap *m1 = a, *m2 = b;
> @@ -107,6 +137,115 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
>       return IOVA_OK;
>   }
>   
> +/**
> + * Try to accomodate a map of size ret->size in a hole between
> + * max(end(hole_left), iova_start).
> + *
> + * @args Arguments to allocation
> + */
> +static bool iova_tree_alloc_map_in_hole(const struct IOVATreeAllocArgs *args)
> +{
> +    const DMAMap *left = args->hole_left, *right = args->hole_right;
> +    uint64_t hole_start, hole_last;
> +
> +    if (right && right->iova + right->size < args->iova_begin) {
> +        return false;
> +    }
> +
> +    if (left && left->iova > args->iova_last) {
> +        return false;
> +    }
> +
> +    hole_start = MAX(left ? left->iova + left->size + 1 : 0, args->iova_begin);
> +    hole_last = MIN(right ? right->iova : HWADDR_MAX, args->iova_last);
> +
> +    if (hole_last - hole_start > args->new_size) {
> +        /* We found a valid hole. */
> +        return true;
> +    }
> +
> +    /* Keep iterating */
> +    return false;
> +}
> +
> +/**
> + * Foreach dma node in the tree, compare if there is a hole with its previous
> + * node (or minimum iova address allowed) and the node.
> + *
> + * @key   Node iterating
> + * @value Node iterating
> + * @pargs Struct to communicate with the outside world
> + *
> + * Return: false to keep iterating, true if needs break.
> + */
> +static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
> +                                         gpointer pargs)
> +{
> +    struct IOVATreeAllocArgs *args = pargs;
> +    DMAMap *node = value;
> +
> +    assert(key == value);
> +
> +    iova_tree_alloc_args_iterate(args, node);
> +    if (args->hole_left && args->hole_left->iova > args->iova_last) {
> +        return true;
> +    }
> +
> +    if (iova_tree_alloc_map_in_hole(args)) {
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_last)
> +{
> +    struct IOVATreeAllocArgs args = {
> +        .new_size = map->size,
> +        .iova_begin = iova_begin,
> +        .iova_last = iova_last,
> +    };
> +
> +    if (iova_begin == 0) {
> +        /* Some devices do not like addr 0 */
> +        iova_begin += qemu_real_host_page_size;
> +    }
> +
> +    assert(iova_begin < iova_last);
> +
> +    /*
> +     * Find a valid hole for the mapping
> +     *
> +     * Assuming low iova_begin, so no need to do a binary search to
> +     * locate the first node.
> +     *
> +     * TODO: We can improve the search speed if we save the beginning and the
> +     * end of holes, so we don't iterate over the previous saved ones.
> +     *
> +     * TODO: Replace all this with g_tree_node_first/next/last when available
> +     * (from glib since 2.68). To do it with g_tree_foreach complicates the
> +     * code a lot.


To tell the truth, the code in iova_tree_alloc_traverse() is hard to
review. I think it would be easier to use first/next/last. What we
really need is to calculate the hole between two consecutive ranges,
with hand-made first/last handling.
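
Something like this, once we can rely on glib >= 2.68 (untested sketch,
just to show the shape; it skips the iova_begin == 0 adjustment):

    int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
                        hwaddr iova_last)
    {
        hwaddr hole_start = iova_begin;
        GTreeNode *node;

        for (node = g_tree_node_first(tree->tree); node;
             node = g_tree_node_next(node)) {
            const DMAMap *m = g_tree_node_value(node);

            if (m->iova > iova_last) {
                break;
            }
            if (m->iova > hole_start && m->iova - hole_start > map->size) {
                /* Hole [hole_start, m->iova) is big enough */
                goto found;
            }
            hole_start = MAX(hole_start, m->iova + m->size + 1);
        }

        /* Check the hole between the last mapping and iova_last */
        if (hole_start > iova_last || iova_last - hole_start < map->size) {
            return IOVA_ERR_NOMEM;
        }

    found:
        map->iova = hole_start;
        return iova_tree_insert(tree, map);
    }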

Thanks


> +     *
> +     */
> +    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
> +    if (!iova_tree_alloc_map_in_hole(&args)) {
> +        /*
> +         * 2nd try: Last iteration left args->right as the last DMAMap. But
> +         * (right, end) hole needs to be checked too
> +         */
> +        iova_tree_alloc_args_iterate(&args, NULL);
> +        if (!iova_tree_alloc_map_in_hole(&args)) {
> +            return IOVA_ERR_NOMEM;
> +        }
> +    }
> +
> +    map->iova = MAX(iova_begin,
> +                    args.hole_left ?
> +                    args.hole_left->iova + args.hole_left->size + 1 : 0);
> +    return iova_tree_insert(tree, map);
> +}
> +
>   void iova_tree_destroy(IOVATree *tree)
>   {
>       g_tree_destroy(tree->tree);



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq
  2021-10-29 18:35 ` [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq Eugenio Pérez
@ 2021-11-02  7:25     ` Juan Quintela
  2021-11-02  7:40     ` Juan Quintela
  1 sibling, 0 replies; 82+ messages in thread
From: Juan Quintela @ 2021-11-02  7:25 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Eric Blake, virtualization, Parav Pandit

Eugenio Pérez <eperezma@redhat.com> wrote:
> The -1 assumes that all devices with no cvq have an spare vq allocated
> for them, but with no offer of VIRTIO_NET_F_CTRL_VQ. This may not be the
> case, and the device may have a pair number of queues.
                                  ^^^^
even

I know, I know, I am Spanish myself O:-)

> To fix this, just resort to the lower even number of queues.

I don't understand what you're trying to achieve here.

> Fixes: 049eb15b5fc9 ("vhost: record the last virtqueue index for the
> virtio device")
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  hw/net/vhost_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 0d888f29a6..edf56a597f 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -330,7 +330,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>      NetClientState *peer;
>  
>      if (!cvq) {
> -        last_index -= 1;
> +        last_index &= ~1ULL;
>      }

As far as I can see, that is a nop. last_index is defined as an int.

$ cat kk.c
#include <stdio.h>

int main(void)
{
	int i = 7;
	i &= -1ULL;
	printf("%d\n", i);
	i = 8;
	i &= -1ULL;
	printf("%d\n", i);
	i = 0;
	i &= -1ULL;
	printf("%d\n", i);
	i = -2;
	i &= -1ULL;
	printf("%d\n", i);
	return 0;
}
$ ./kk
7
8
0
-2


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq
@ 2021-11-02  7:25     ` Juan Quintela
  0 siblings, 0 replies; 82+ messages in thread
From: Juan Quintela @ 2021-11-02  7:25 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin, Jason Wang,
	Richard Henderson, qemu-devel, Peter Xu, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Parav Pandit

Eugenio Pérez <eperezma@redhat.com> wrote:
> The -1 assumes that all devices with no cvq have an spare vq allocated
> for them, but with no offer of VIRTIO_NET_F_CTRL_VQ. This may not be the
> case, and the device may have a pair number of queues.
                                  ^^^^
even

I know, I know, I am Spanish myself O:-)

> To fix this, just resort to the lower even number of queues.

I don't understand what you're trying to achieve here.

> Fixes: 049eb15b5fc9 ("vhost: record the last virtqueue index for the
> virtio device")
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  hw/net/vhost_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 0d888f29a6..edf56a597f 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -330,7 +330,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>      NetClientState *peer;
>  
>      if (!cvq) {
> -        last_index -= 1;
> +        last_index &= ~1ULL;
>      }

As far as I can see, that is a nop. last_index is defined as an int.

$ cat kk.c
#include <stdio.h>

int main(void)
{
	int i = 7;
	i &= -1ULL;
	printf("%d\n", i);
	i = 8;
	i &= -1ULL;
	printf("%d\n", i);
	i = 0;
	i &= -1ULL;
	printf("%d\n", i);
	i = -2;
	i &= -1ULL;
	printf("%d\n", i);
	return 0;
}
$ ./kk
7
8
0
-2



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq
  2021-11-02  7:25     ` Juan Quintela
@ 2021-11-02  7:32       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 82+ messages in thread
From: Michael S. Tsirkin @ 2021-11-02  7:32 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Laurent Vivier, Eduardo Habkost, Richard Henderson, qemu-devel,
	Markus Armbruster, Eugenio Pérez, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

On Tue, Nov 02, 2021 at 08:25:27AM +0100, Juan Quintela wrote:
> Eugenio Pérez <eperezma@redhat.com> wrote:
> > The -1 assumes that all devices with no cvq have an spare vq allocated
> > for them, but with no offer of VIRTIO_NET_F_CTRL_VQ. This may not be the
> > case, and the device may have a pair number of queues.
>                                   ^^^^
> even
> 
> I know, I know, I am Spanish myself O:-)

Nobody expects the Spanish ;)

> > To fix this, just resort to the lower even number of queues.
> 
> I don't understand what you try to achieve here.
> 
> > Fixes: 049eb15b5fc9 ("vhost: record the last virtqueue index for the
> > virtio device")
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  hw/net/vhost_net.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index 0d888f29a6..edf56a597f 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -330,7 +330,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >      NetClientState *peer;
> >  
> >      if (!cvq) {
> > -        last_index -= 1;
> > +        last_index &= ~1ULL;
> >      }
> 
> As far as I can see, that is a nop. last_index is defined as an int.
> 
> $ cat kk.c
> #include <stdio.h>
> 
> int main(void)
> {
> 	int i = 7;
> 	i &= -1ULL;

Stefano's patch has ~1ULL , not -1ULL here.
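
With ~1ULL the same quick check does what the patch intends (rounds down
to the lower even value):

	int i = 7;
	i &= ~1ULL;	/* i == 6 */
	i = 8;
	i &= ~1ULL;	/* i == 8 */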

> 	printf("%d\n", i);
> 	i = 8;
> 	i &= -1ULL;
> 	printf("%d\n", i);
> 	i = 0;
> 	i &= -1ULL;
> 	printf("%d\n", i);
> 	i = -2;
> 	i &= -1ULL;
> 	printf("%d\n", i);
> 	return 0;
> }
> $ ./kk
> 7
> 8
> 0
> -2


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq
@ 2021-11-02  7:32       ` Michael S. Tsirkin
  0 siblings, 0 replies; 82+ messages in thread
From: Michael S. Tsirkin @ 2021-11-02  7:32 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Laurent Vivier, Eduardo Habkost, Jason Wang, Richard Henderson,
	qemu-devel, Peter Xu, Markus Armbruster, Eugenio Pérez,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Parav Pandit

On Tue, Nov 02, 2021 at 08:25:27AM +0100, Juan Quintela wrote:
> Eugenio Pérez <eperezma@redhat.com> wrote:
> > The -1 assumes that all devices with no cvq have an spare vq allocated
> > for them, but with no offer of VIRTIO_NET_F_CTRL_VQ. This may not be the
> > case, and the device may have a pair number of queues.
>                                   ^^^^
> even
> 
> I know, I know, I am Spanish myself O:-)

Nobody expects the Spanish ;)

> > To fix this, just resort to the lower even number of queues.
> 
> I don't understand what you try to achieve here.
> 
> > Fixes: 049eb15b5fc9 ("vhost: record the last virtqueue index for the
> > virtio device")
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  hw/net/vhost_net.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index 0d888f29a6..edf56a597f 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -330,7 +330,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >      NetClientState *peer;
> >  
> >      if (!cvq) {
> > -        last_index -= 1;
> > +        last_index &= ~1ULL;
> >      }
> 
> As far as I can see, that is a nop. last_index is defined as an int.
> 
> $ cat kk.c
> #include <stdio.h>
> 
> int main(void)
> {
> 	int i = 7;
> 	i &= -1ULL;

Stefano's patch has ~1ULL , not -1ULL here.

> 	printf("%d\n", i);
> 	i = 8;
> 	i &= -1ULL;
> 	printf("%d\n", i);
> 	i = 0;
> 	i &= -1ULL;
> 	printf("%d\n", i);
> 	i = -2;
> 	i &= -1ULL;
> 	printf("%d\n", i);
> 	return 0;
> }
> $ ./kk
> 7
> 8
> 0
> -2



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 12/26] vhost: Route guest->host notification through shadow virtqueue
  2021-11-02  5:36     ` Jason Wang
  (?)
@ 2021-11-02  7:35     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 82+ messages in thread
From: Eugenio Perez Martin @ 2021-11-02  7:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Peter Xu,
	Markus Armbruster, Stefan Hajnoczi, Xiao W Wang,
	Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Stefano Garzarella, Eric Blake, virtualization, Parav Pandit

On Tue, Nov 2, 2021 at 6:36 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2021/10/30 上午2:35, Eugenio Pérez 写道:
> > +/**
> > + * Enable or disable shadow virtqueue in a vhost vdpa device.
> > + *
> > + * This function is idempotent, to call it many times with the same value for
> > + * enable_svq will simply return success.
> > + *
> > + * @v       Vhost vdpa device
> > + * @enable  True to set SVQ mode
> > + * @errp    Error pointer
> > + */
> > +void vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable, Error **errp)
> > +{
>
>
> What happens if vhost_vdpa is not started when we try to enable svq?
> Another note is that the vhost device could be stopped and started
> after svq is enabled/disabled. We need to deal with that.
>

Right, I didn't append it to the TODO list but it is in development.

Thanks!

> Thanks
>



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 05/26] vhost: Add x-vhost-set-shadow-vq qmp
  2021-10-29 18:35 ` [RFC PATCH v5 05/26] vhost: Add x-vhost-set-shadow-vq qmp Eugenio Pérez
@ 2021-11-02  7:36     ` Juan Quintela
  0 siblings, 0 replies; 82+ messages in thread
From: Juan Quintela @ 2021-11-02  7:36 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Eric Blake, virtualization, Parav Pandit

Eugenio Pérez <eperezma@redhat.com> wrote:
> Command to set shadow virtqueue mode.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

You need to take care of:

 Markus Armbruster      ] [PATCH v2 0/9] Configurable policy for handling unstable interfaces

When this hits the tree, you need to drop the x- prefix and mark it as unstable.
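
If I read that series right, the schema would end up looking something
like this (sketch only, untested):

    { 'command': 'x-vhost-set-shadow-vq',
      'data': { 'name': 'str', 'set': 'bool' },
      'features': [ 'unstable' ] }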

Later, Juan.


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 05/26] vhost: Add x-vhost-set-shadow-vq qmp
@ 2021-11-02  7:36     ` Juan Quintela
  0 siblings, 0 replies; 82+ messages in thread
From: Juan Quintela @ 2021-11-02  7:36 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin, Jason Wang,
	Richard Henderson, qemu-devel, Peter Xu, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Parav Pandit

Eugenio Pérez <eperezma@redhat.com> wrote:
> Command to set shadow virtqueue mode.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

You need to take care of:

 Markus Armbruster      ] [PATCH v2 0/9] Configurable policy for handling unstable interfaces

When this hits the tree, you need to drop the x- prefix and mark it as unstable.

Later, Juan.



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq
  2021-11-02  7:32       ` Michael S. Tsirkin
@ 2021-11-02  7:39         ` Juan Quintela
  -1 siblings, 0 replies; 82+ messages in thread
From: Juan Quintela @ 2021-11-02  7:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Laurent Vivier, Eduardo Habkost, Richard Henderson, qemu-devel,
	Markus Armbruster, Eugenio Pérez, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Nov 02, 2021 at 08:25:27AM +0100, Juan Quintela wrote:
>> Eugenio Pérez <eperezma@redhat.com> wrote:
>> > The -1 assumes that all devices with no cvq have an spare vq allocated
>> > for them, but with no offer of VIRTIO_NET_F_CTRL_VQ. This may not be the
>> > case, and the device may have a pair number of queues.
>>                                   ^^^^
>> even
>> 
>> I know, I know, I am Spanish myself O:-)
>
> Nobody expects the Spanish ;)

O:-)

>> int main(void)
>> {
>> 	int i = 7;
>> 	i &= -1ULL;
>
> Stefano's patch has ~1ULL , not -1ULL here.
>

Stupid eyes.

Thanks, Juan.


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq
@ 2021-11-02  7:39         ` Juan Quintela
  0 siblings, 0 replies; 82+ messages in thread
From: Juan Quintela @ 2021-11-02  7:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Laurent Vivier, Eduardo Habkost, Jason Wang, Richard Henderson,
	qemu-devel, Peter Xu, Markus Armbruster, Eugenio Pérez,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Parav Pandit

"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Nov 02, 2021 at 08:25:27AM +0100, Juan Quintela wrote:
>> Eugenio Pérez <eperezma@redhat.com> wrote:
>> > The -1 assumes that all devices with no cvq have an spare vq allocated
>> > for them, but with no offer of VIRTIO_NET_F_CTRL_VQ. This may not be the
>> > case, and the device may have a pair number of queues.
>>                                   ^^^^
>> even
>> 
>> I know, I know, I am Spanish myself O:-)
>
> Nobody expects the Spanish ;)

O:-)

>> int main(void)
>> {
>> 	int i = 7;
>> 	i &= -1ULL;
>
> Stefano's patch has ~1ULL , not -1ULL here.
>

Stupid eyes.

Thanks, Juan.



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq
  2021-10-29 18:35 ` [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq Eugenio Pérez
@ 2021-11-02  7:40     ` Juan Quintela
  2021-11-02  7:40     ` Juan Quintela
  1 sibling, 0 replies; 82+ messages in thread
From: Juan Quintela @ 2021-11-02  7:40 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Eric Blake, virtualization, Parav Pandit

Eugenio Pérez <eperezma@redhat.com> wrote:
> The -1 assumes that all devices with no cvq have an spare vq allocated
> for them, but with no offer of VIRTIO_NET_F_CTRL_VQ. This may not be the
> case, and the device may have a pair number of queues.
>
> To fix this, just resort to the lower even number of queues.
>
> Fixes: 049eb15b5fc9 ("vhost: record the last virtqueue index for the
> virtio device")
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq
@ 2021-11-02  7:40     ` Juan Quintela
  0 siblings, 0 replies; 82+ messages in thread
From: Juan Quintela @ 2021-11-02  7:40 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin, Jason Wang,
	Richard Henderson, qemu-devel, Peter Xu, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Parav Pandit

Eugenio Pérez <eperezma@redhat.com> wrote:
> The -1 assumes that all devices with no cvq have an spare vq allocated
> for them, but with no offer of VIRTIO_NET_F_CTRL_VQ. This may not be the
> case, and the device may have a pair number of queues.
>
> To fix this, just resort to the lower even number of queues.
>
> Fixes: 049eb15b5fc9 ("vhost: record the last virtqueue index for the
> virtio device")
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 11/26] vhost: Handle host notifiers in SVQ
  2021-10-29 18:35 ` [RFC PATCH v5 11/26] vhost: Handle host notifiers in SVQ Eugenio Pérez
@ 2021-11-02  7:54     ` Jason Wang
  0 siblings, 0 replies; 82+ messages in thread
From: Jason Wang @ 2021-11-02  7:54 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin,
	Richard Henderson, Stefan Hajnoczi, Markus Armbruster,
	Harpreet Singh Anand, Xiao W Wang, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Eduardo Habkost


在 2021/10/30 上午2:35, Eugenio Pérez 写道:
> If device supports host notifiers, this makes one jump less (kernel) to
> deliver SVQ notifications to it.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
>   hw/virtio/vhost-shadow-virtqueue.c | 23 ++++++++++++++++++++++-
>   2 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 30ab9643b9..eb0a54f954 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -18,6 +18,8 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>   const EventNotifier *vhost_svq_get_dev_kick_notifier(
>                                                 const VhostShadowVirtqueue *svq);
> +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr);
> +
>   void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
>                        VhostShadowVirtqueue *svq, int svq_kick_fd);
>   void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index fda60d11db..e3dcc039b6 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -29,6 +29,12 @@ typedef struct VhostShadowVirtqueue {
>        * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
>        */
>       EventNotifier svq_kick;
> +
> +    /* Device's host notifier memory region. NULL means no region */
> +    void *host_notifier_mr;
> +
> +    /* Virtio queue shadowing */
> +    VirtQueue *vq;
>   } VhostShadowVirtqueue;
>   
>   /**
> @@ -50,7 +56,20 @@ static void vhost_handle_guest_kick(EventNotifier *n)
>           return;
>       }
>   
> -    event_notifier_set(&svq->hdev_kick);
> +    if (svq->host_notifier_mr) {
> +        uint16_t *mr = svq->host_notifier_mr;
> +        *mr = virtio_get_queue_index(svq->vq);


Do we need barriers around the possible MMIO here?

To avoid that complicated stuff, I'd rather simply go with the eventfd path.

Note that MMIO and eventfd are not mutually exclusive.
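
To make the question concrete, I would expect something along these lines
around the write (not saying this is the right barrier, just the shape of it):

    if (svq->host_notifier_mr) {
        uint16_t *mr = svq->host_notifier_mr;

        /* assumption: order prior avail ring updates before the doorbell write */
        smp_wmb();
        *mr = virtio_get_queue_index(svq->vq);
    } else {
        event_notifier_set(&svq->hdev_kick);
    }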

Thanks


> +    } else {
> +        event_notifier_set(&svq->hdev_kick);
> +    }
> +}
> +
> +/*
> + * Set the device's memory region notifier. addr = NULL clear it.
> + */
> +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
> +{
> +    svq->host_notifier_mr = addr;
>   }
>   
>   /**
> @@ -134,6 +153,7 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
>    */
>   VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>   {
> +    int vq_idx = dev->vq_index + idx;
>       g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
>       int r;
>   
> @@ -151,6 +171,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>           goto err_init_hdev_call;
>       }
>   
> +    svq->vq = virtio_get_queue(dev->vdev, vq_idx);
>       return g_steal_pointer(&svq);
>   
>   err_init_hdev_call:


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 11/26] vhost: Handle host notifiers in SVQ
@ 2021-11-02  7:54     ` Jason Wang
  0 siblings, 0 replies; 82+ messages in thread
From: Jason Wang @ 2021-11-02  7:54 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Juan Quintela,
	Richard Henderson, Stefan Hajnoczi, Peter Xu, Markus Armbruster,
	Harpreet Singh Anand, Xiao W Wang, Eli Cohen, Paolo Bonzini,
	Stefano Garzarella, Eric Blake, virtualization, Eduardo Habkost


在 2021/10/30 上午2:35, Eugenio Pérez 写道:
> If device supports host notifiers, this makes one jump less (kernel) to
> deliver SVQ notifications to it.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
>   hw/virtio/vhost-shadow-virtqueue.c | 23 ++++++++++++++++++++++-
>   2 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 30ab9643b9..eb0a54f954 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -18,6 +18,8 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>   const EventNotifier *vhost_svq_get_dev_kick_notifier(
>                                                 const VhostShadowVirtqueue *svq);
> +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr);
> +
>   void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
>                        VhostShadowVirtqueue *svq, int svq_kick_fd);
>   void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index fda60d11db..e3dcc039b6 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -29,6 +29,12 @@ typedef struct VhostShadowVirtqueue {
>        * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
>        */
>       EventNotifier svq_kick;
> +
> +    /* Device's host notifier memory region. NULL means no region */
> +    void *host_notifier_mr;
> +
> +    /* Virtio queue shadowing */
> +    VirtQueue *vq;
>   } VhostShadowVirtqueue;
>   
>   /**
> @@ -50,7 +56,20 @@ static void vhost_handle_guest_kick(EventNotifier *n)
>           return;
>       }
>   
> -    event_notifier_set(&svq->hdev_kick);
> +    if (svq->host_notifier_mr) {
> +        uint16_t *mr = svq->host_notifier_mr;
> +        *mr = virtio_get_queue_index(svq->vq);


Do we need barriers around the possible MMIO here?

To avoid that complicated stuff, I'd rather simply go with the eventfd path.

Note that MMIO and eventfd are not mutually exclusive.

Thanks


> +    } else {
> +        event_notifier_set(&svq->hdev_kick);
> +    }
> +}
> +
> +/*
> + * Set the device's memory region notifier. addr = NULL clear it.
> + */
> +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
> +{
> +    svq->host_notifier_mr = addr;
>   }
>   
>   /**
> @@ -134,6 +153,7 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
>    */
>   VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>   {
> +    int vq_idx = dev->vq_index + idx;
>       g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
>       int r;
>   
> @@ -151,6 +171,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>           goto err_init_hdev_call;
>       }
>   
> +    svq->vq = virtio_get_queue(dev->vdev, vq_idx);
>       return g_steal_pointer(&svq);
>   
>   err_init_hdev_call:



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 22/26] vhost: Shadow virtqueue buffers forwarding
  2021-10-29 18:35 ` [RFC PATCH v5 22/26] vhost: Shadow virtqueue buffers forwarding Eugenio Pérez
@ 2021-11-02  7:59     ` Jason Wang
  0 siblings, 0 replies; 82+ messages in thread
From: Jason Wang @ 2021-11-02  7:59 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin,
	Richard Henderson, Stefan Hajnoczi, Markus Armbruster,
	Harpreet Singh Anand, Xiao W Wang, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Eduardo Habkost


在 2021/10/30 上午2:35, Eugenio Pérez 写道:
> Initial version of shadow virtqueue that actually forward buffers. There
> are no iommu support at the moment, and that will be addressed in future
> patches of this series. Since all vhost-vdpa devices uses forced IOMMU,
> this means that SVQ is not usable at this point of the series on any
> device.
>
> For simplicity it only supports modern devices, that expects vring
> in little endian, with split ring and no event idx or indirect
> descriptors. Support for them will not be added in this series.
>
> It reuses the VirtQueue code for the device part. The driver part is
> based on Linux's virtio_ring driver, but with stripped functionality
> and optimizations so it's easier to review. Later commits add simpler
> ones.
>
> However to forwarding buffers have some particular pieces: One of the
> most unexpected ones is that a guest's buffer can expand through more
> than one descriptor in SVQ. While this is handled gracefully by qemu's
> emulated virtio devices, it may cause unexpected SVQ queue full. This
> patch also solves it checking for this condition at both guest's kicks
> and device's calls. The code may be more elegant in the future if SVQ
> code runs in its own iocontext.
>
> Note that vhost_vdpa_get_vq_state trust the device to write its status
> to used_idx at pause(), finishing all in-flight descriptors. This may
> not be enough for complex devices, but other development like usage of
> inflight_fd on top of this solution may extend the usage in the future.
>
> In particular, SVQ trust it to recover guest's virtqueue at start, and
> to mark as used the latest descriptors used by the device in the
> meantime.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   qapi/net.json                      |   5 +-
>   hw/virtio/vhost-shadow-virtqueue.c | 400 +++++++++++++++++++++++++++--
>   hw/virtio/vhost-vdpa.c             | 144 ++++++++++-
>   3 files changed, 521 insertions(+), 28 deletions(-)
>
> diff --git a/qapi/net.json b/qapi/net.json
> index fca2f6ebca..1c6d3b2179 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -84,12 +84,9 @@
>   #
>   # Use vhost shadow virtqueue.
>   #
> -# SVQ can just forward notifications between the device and the guest at this
> -# moment. This will expand in future changes.
> -#
>   # @name: the device name of the VirtIO device
>   #
> -# @set: true to use the alternate shadow VQ notifications
> +# @set: true to use the alternate shadow VQ
>   #
>   # Since: 6.2
>   #
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index cb9ffcb015..ad1b2342be 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -9,6 +9,9 @@
>   
>   #include "qemu/osdep.h"
>   #include "hw/virtio/vhost-shadow-virtqueue.h"
> +#include "hw/virtio/vhost.h"
> +#include "hw/virtio/virtio-access.h"
> +
>   #include "standard-headers/linux/vhost_types.h"
>   
>   #include "qemu/error-report.h"
> @@ -45,6 +48,27 @@ typedef struct VhostShadowVirtqueue {
>   
>       /* Virtio device */
>       VirtIODevice *vdev;
> +
> +    /* Map for returning guest's descriptors */
> +    VirtQueueElement **ring_id_maps;
> +
> +    /* Next VirtQueue element that guest made available */
> +    VirtQueueElement *next_guest_avail_elem;
> +
> +    /* Next head to expose to device */
> +    uint16_t avail_idx_shadow;
> +
> +    /* Next free descriptor */
> +    uint16_t free_head;
> +
> +    /* Last seen used idx */
> +    uint16_t shadow_used_idx;
> +
> +    /* Next head to consume from device */
> +    uint16_t last_used_idx;
> +
> +    /* Cache for the exposed notification flag */
> +    bool notification;
>   } VhostShadowVirtqueue;
>   
>   /**
> @@ -56,25 +80,174 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
>       return &svq->hdev_kick;
>   }
>   
> -/* If the device is using some of these, SVQ cannot communicate */
> +/**
> + * VirtIO transport device feature acknowledge
> + *
> + * @dev_features  The device features. If success, the acknowledged features.
> + *
> + * Returns true if SVQ can go with a subset of these, false otherwise.
> + */
>   bool vhost_svq_valid_device_features(uint64_t *dev_features)
>   {
> -    return true;
> +    uint64_t b;
> +    bool r = true;
> +
> +    for (b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END; ++b) {
> +        switch (b) {
> +        case VIRTIO_F_NOTIFY_ON_EMPTY:
> +        case VIRTIO_F_ANY_LAYOUT:
> +            continue;
> +
> +        case VIRTIO_F_ACCESS_PLATFORM:
> +            /* SVQ does not know how to translate addresses */
> +            if (*dev_features & BIT_ULL(b)) {
> +                clear_bit(b, dev_features);
> +                r = false;
> +            }
> +            break;
> +
> +        case VIRTIO_F_VERSION_1:
> +            /* SVQ trust that guest vring is little endian */
> +            if (!(*dev_features & BIT_ULL(b))) {
> +                set_bit(b, dev_features);
> +                r = false;
> +            }
> +            continue;
> +
> +        default:
> +            if (*dev_features & BIT_ULL(b)) {
> +                clear_bit(b, dev_features);
> +            }
> +        }
> +    }
> +
> +    return r;
>   }
>   
> -/* If the guest is using some of these, SVQ cannot communicate */
> +/**
> + * Check of guest's acknowledge features.
> + *
> + * @guest_features  The guest's acknowledged features
> + *
> + * Returns true if SVQ can handle them, false otherwise.
> + */
>   bool vhost_svq_valid_guest_features(uint64_t *guest_features)
>   {
> -    return true;
> +    static const uint64_t transport = MAKE_64BIT_MASK(VIRTIO_TRANSPORT_F_START,
> +                            VIRTIO_TRANSPORT_F_END - VIRTIO_TRANSPORT_F_START);
> +
> +    /* These transport features are handled by VirtQueue */
> +    static const uint64_t valid = (BIT_ULL(VIRTIO_RING_F_INDIRECT_DESC) |
> +                                   BIT_ULL(VIRTIO_F_VERSION_1));
> +
> +    /* We are only interested in transport-related feature bits */
> +    uint64_t guest_transport_features = (*guest_features) & transport;
> +
> +    *guest_features &= (valid | ~transport);
> +    return !(guest_transport_features & (transport ^ valid));
>   }
>   
> -/* Forward guest notifications */
> -static void vhost_handle_guest_kick(EventNotifier *n)
> +/**
> + * Number of descriptors that SVQ can make available from the guest.
> + *
> + * @svq   The svq
> + */
> +static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
>   {
> -    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> -                                             svq_kick);
> +    return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
> +}
>   
> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> +static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> +{
> +    uint16_t notification_flag;
> +
> +    if (svq->notification == enable) {
> +        return;
> +    }
> +
> +    notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
> +
> +    svq->notification = enable;
> +    if (enable) {
> +        svq->vring.avail->flags &= ~notification_flag;
> +    } else {
> +        svq->vring.avail->flags |= notification_flag;
> +    }
> +}
> +
> +static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> +                                    const struct iovec *iovec,
> +                                    size_t num, bool more_descs, bool write)
> +{
> +    uint16_t i = svq->free_head, last = svq->free_head;
> +    unsigned n;
> +    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
> +    vring_desc_t *descs = svq->vring.desc;
> +
> +    if (num == 0) {
> +        return;
> +    }
> +
> +    for (n = 0; n < num; n++) {
> +        if (more_descs || (n + 1 < num)) {
> +            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> +        } else {
> +            descs[i].flags = flags;
> +        }
> +        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
> +        descs[i].len = cpu_to_le32(iovec[n].iov_len);
> +
> +        last = i;
> +        i = cpu_to_le16(descs[i].next);
> +    }
> +
> +    svq->free_head = le16_to_cpu(descs[last].next);
> +}
> +
> +static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> +                                    VirtQueueElement *elem)
> +{
> +    int head;
> +    unsigned avail_idx;
> +    vring_avail_t *avail = svq->vring.avail;
> +
> +    head = svq->free_head;
> +
> +    /* We need some descriptors here */
> +    assert(elem->out_num || elem->in_num);
> +
> +    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> +                            elem->in_num > 0, false);
> +    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> +
> +    /*
> +     * Put entry in available array (but don't update avail->idx until they
> +     * do sync).
> +     */
> +    avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
> +    avail->ring[avail_idx] = cpu_to_le16(head);
> +    svq->avail_idx_shadow++;
> +
> +    /* Update avail index after the descriptor is wrote */
> +    smp_wmb();


A question: since we may be talking to real hardware, is smp_wmb()
sufficient in this case, or do we need to honor VIRTIO_F_ORDER_PLATFORM?
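
Roughly what I have in mind (sketch only; "order_platform" is a made-up
field, not in this patch):

    /*
     * assumption: use a stronger fence when the device negotiated
     * VIRTIO_F_ORDER_PLATFORM, i.e. it is not just another CPU
     * observing the ring
     */
    if (svq->order_platform) {
        smp_mb();
    } else {
        smp_wmb();
    }
    avail->idx = cpu_to_le16(svq->avail_idx_shadow);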


> +    avail->idx = cpu_to_le16(svq->avail_idx_shadow);
> +
> +    return head;
> +
> +}
> +
> +static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> +{
> +    unsigned qemu_head = vhost_svq_add_split(svq, elem);
> +
> +    svq->ring_id_maps[qemu_head] = elem;
> +}
> +
> +static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> +{
> +    /* We need to expose available array entries before checking used flags */
> +    smp_mb();
> +    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
>           return;
>       }
>   
> @@ -86,25 +259,188 @@ static void vhost_handle_guest_kick(EventNotifier *n)
>       }
>   }
>   
> -/*
> - * Set the device's memory region notifier. addr = NULL clear it.
> +/**
> + * Forward available buffers.
> + *
> + * @svq Shadow VirtQueue
> + *
> + * Note that this function does not guarantee that all guest's available
> + * buffers are available to the device in SVQ avail ring. The guest may have
> + * exposed a GPA / GIOVA contiguous buffer, but it may not be contiguous in qemu
> + * vaddr.
> + *
> + * If that happens, guest's kick notifications will be disabled until device
> + * makes some buffers used.
>    */
> -void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
> +static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
>   {
> -    svq->host_notifier_mr = addr;
> +    /* Clear event notifier */
> +    event_notifier_test_and_clear(&svq->svq_kick);
> +
> +    /* Make available as many buffers as possible */
> +    do {
> +        if (virtio_queue_get_notification(svq->vq)) {
> +            virtio_queue_set_notification(svq->vq, false);
> +        }
> +
> +        while (true) {
> +            VirtQueueElement *elem;
> +
> +            if (svq->next_guest_avail_elem) {
> +                elem = g_steal_pointer(&svq->next_guest_avail_elem);
> +            } else {
> +                elem = virtqueue_pop(svq->vq, sizeof(*elem));
> +            }
> +
> +            if (!elem) {
> +                break;
> +            }
> +
> +            if (elem->out_num + elem->in_num >
> +                vhost_svq_available_slots(svq)) {
> +                /*
> +                 * This condition is possible since a contiguous buffer in GPA
> +                 * does not imply a contiguous buffer in qemu's VA
> +                 * scatter-gather segments. If that happen, the buffer exposed
> +                 * to the device needs to be a chain of descriptors at this
> +                 * moment.
> +                 *
> +                 * SVQ cannot hold more available buffers if we are here:
> +                 * queue the current guest descriptor and ignore further kicks
> +                 * until some elements are used.
> +                 */


I wonder what the advantage of tracking the pending elem like this is. It
looks to me like we could simply rewind last_avail_idx in this case?
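
Something like this is what I have in mind, i.e. pushing the element back
instead of stashing it (untested sketch):

    if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
        /*
         * assumption: virtqueue_unpop() rewinds last_avail_idx so the
         * same element is popped again on the next kick
         */
        virtqueue_unpop(svq->vq, elem, 0);
        g_free(elem);
        return;
    }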


> +                svq->next_guest_avail_elem = elem;
> +                return;
> +            }
> +
> +            vhost_svq_add(svq, elem);
> +            vhost_svq_kick(svq);
> +        }
> +
> +        virtio_queue_set_notification(svq->vq, true);
> +    } while (!virtio_queue_empty(svq->vq));
> +}
> +
> +/**
> + * Handle guest's kick.
> + *
> + * @n guest kick event notifier, the one that guest set to notify svq.
> + */
> +static void vhost_handle_guest_kick_notifier(EventNotifier *n)
> +{
> +    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> +                                             svq_kick);
> +    vhost_handle_guest_kick(svq);
> +}
> +
> +static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> +{
> +    if (svq->last_used_idx != svq->shadow_used_idx) {
> +        return true;
> +    }
> +
> +    svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
> +
> +    return svq->last_used_idx != svq->shadow_used_idx;
> +}
> +
> +static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> +{
> +    vring_desc_t *descs = svq->vring.desc;
> +    const vring_used_t *used = svq->vring.used;
> +    vring_used_elem_t used_elem;
> +    uint16_t last_used;
> +
> +    if (!vhost_svq_more_used(svq)) {
> +        return NULL;
> +    }
> +
> +    /* Only get used array entries after they have been exposed by dev */
> +    smp_rmb();
> +    last_used = svq->last_used_idx & (svq->vring.num - 1);
> +    used_elem.id = le32_to_cpu(used->ring[last_used].id);
> +    used_elem.len = le32_to_cpu(used->ring[last_used].len);
> +
> +    svq->last_used_idx++;
> +    if (unlikely(used_elem.id >= svq->vring.num)) {
> +        error_report("Device %s says index %u is used", svq->vdev->name,
> +                     used_elem.id);
> +        return NULL;
> +    }
> +
> +    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
> +        error_report(
> +            "Device %s says index %u is used, but it was not available",
> +            svq->vdev->name, used_elem.id);
> +        return NULL;
> +    }
> +
> +    descs[used_elem.id].next = svq->free_head;
> +    svq->free_head = used_elem.id;
> +
> +    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> +    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
>   }
>   
> -/* Forward vhost notifications */
> +static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> +                            bool check_for_avail_queue)
> +{
> +    VirtQueue *vq = svq->vq;
> +
> +    /* Make as many buffers as possible used. */
> +    do {
> +        unsigned i = 0;
> +
> +        vhost_svq_set_notification(svq, false);
> +        while (true) {
> +            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> +            if (!elem) {
> +                break;
> +            }
> +
> +            if (unlikely(i >= svq->vring.num)) {
> +                virtio_error(svq->vdev,
> +                         "More than %u used buffers obtained in a %u size SVQ",
> +                         i, svq->vring.num);
> +                virtqueue_fill(vq, elem, elem->len, i);
> +                virtqueue_flush(vq, i);
> +                i = 0;
> +            }
> +            virtqueue_fill(vq, elem, elem->len, i++);
> +        }
> +
> +        virtqueue_flush(vq, i);
> +        event_notifier_set(&svq->svq_call);
> +
> +        if (check_for_avail_queue && svq->next_guest_avail_elem) {
> +            /*
> +             * Avail ring was full when vhost_svq_flush was called, so it's a
> +             * good moment to make more descriptors available if possible
> +             */
> +            vhost_handle_guest_kick(svq);
> +        }
> +
> +        vhost_svq_set_notification(svq, true);
> +    } while (vhost_svq_more_used(svq));


So this doesn't actually make sure all the buffers were processed by the
device? Is this intended? (I see it is also called from vhost_svq_stop().)

Note that this means some buffers might not be submitted to the device
after migration?


> +}
> +
> +/**
> + * Forward used buffers.
> + *
> + * @n hdev call event notifier, the one that device set to notify svq.
> + *
> + * Note that we are not making any buffers available in the loop, there is no
> + * way that it runs more than virtqueue size times.
> + */
>   static void vhost_svq_handle_call(EventNotifier *n)
>   {
>       VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
>                                                hdev_call);
>   
> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> -        return;
> -    }
> +    /* Clear event notifier */
> +    event_notifier_test_and_clear(n);
>   
> -    event_notifier_set(&svq->svq_call);
> +    vhost_svq_flush(svq, true);
>   }
>   
>   /*
> @@ -132,6 +468,14 @@ void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd)
>       event_notifier_init_fd(&svq->svq_call, call_fd);
>   }
>   
> +/*
> + * Set the device's memory region notifier. addr = NULL clear it.
> + */
> +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
> +{
> +    svq->host_notifier_mr = addr;
> +}
> +
>   /*
>    * Get the shadow vq vring address.
>    * @svq Shadow virtqueue
> @@ -185,7 +529,8 @@ static void vhost_svq_set_svq_kick_fd_internal(VhostShadowVirtqueue *svq,
>        * need to explicitely check for them.
>        */
>       event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> -    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
> +    event_notifier_set_handler(&svq->svq_kick,
> +                               vhost_handle_guest_kick_notifier);
>   
>       /*
>        * !check_old means that we are starting SVQ, taking the descriptor from
> @@ -233,7 +578,16 @@ void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
>   void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
>                       VhostShadowVirtqueue *svq)
>   {
> +    unsigned i;
>       event_notifier_set_handler(&svq->svq_kick, NULL);
> +    vhost_svq_flush(svq, false);
> +
> +    for (i = 0; i < svq->vring.num; ++i) {
> +        g_autofree VirtQueueElement *elem = svq->ring_id_maps[i];
> +        if (elem) {
> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> +        }
> +    }
>   }
>   
>   /*
> @@ -248,7 +602,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>       size_t driver_size;
>       size_t device_size;
>       g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> -    int r;
> +    int r, i;
>   
>       r = event_notifier_init(&svq->hdev_kick, 0);
>       if (r != 0) {
> @@ -274,6 +628,11 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>       memset(svq->vring.desc, 0, driver_size);
>       svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
>       memset(svq->vring.used, 0, device_size);
> +    for (i = 0; i < num - 1; i++) {
> +        svq->vring.desc[i].next = cpu_to_le16(i + 1);
> +    }
> +
> +    svq->ring_id_maps = g_new0(VirtQueueElement *, num);
>       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
>       return g_steal_pointer(&svq);
>   
> @@ -292,6 +651,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
>       event_notifier_cleanup(&vq->hdev_kick);
>       event_notifier_set_handler(&vq->hdev_call, NULL);
>       event_notifier_cleanup(&vq->hdev_call);
> +    g_free(vq->ring_id_maps);
>       qemu_vfree(vq->vring.desc);
>       qemu_vfree(vq->vring.used);
>       g_free(vq);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index fc8396ba8a..e1c55e43e7 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -19,6 +19,7 @@
>   #include "hw/virtio/virtio-net.h"
>   #include "hw/virtio/vhost-shadow-virtqueue.h"
>   #include "hw/virtio/vhost-vdpa.h"
> +#include "hw/virtio/vhost-shadow-virtqueue.h"
>   #include "exec/address-spaces.h"
>   #include "qemu/main-loop.h"
>   #include "cpu.h"
> @@ -821,6 +822,19 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
>       return true;
>   }
>   
> +static int vhost_vdpa_vring_pause(struct vhost_dev *dev)
> +{
> +    int r;
> +    uint8_t status;
> +
> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DEVICE_STOPPED);
> +    do {
> +        r = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> +    } while (r == 0 && !(status & VIRTIO_CONFIG_S_DEVICE_STOPPED));
> +
> +    return 0;
> +}
> +
>   /*
>    * Start or stop a shadow virtqueue in a vdpa device
>    *
> @@ -844,7 +858,14 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
>           .index = vq_index,
>       };
>       struct vhost_vring_file vhost_call_file = {
> -        .index = idx + dev->vq_index,
> +        .index = vq_index,
> +    };
> +    struct vhost_vring_addr addr = {
> +        .index = vq_index,
> +    };
> +    struct vhost_vring_state num = {
> +        .index = vq_index,
> +        .num = virtio_queue_get_num(dev->vdev, vq_index),
>       };
>       int r;
>   
> @@ -852,6 +873,7 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
>           const EventNotifier *vhost_kick = vhost_svq_get_dev_kick_notifier(svq);
>           const EventNotifier *vhost_call = vhost_svq_get_svq_call_notifier(svq);
>   
> +        vhost_svq_get_vring_addr(svq, &addr);
>           if (n->addr) {
>               r = virtio_queue_set_host_notifier_mr(dev->vdev, idx, &n->mr,
>                                                     false);
> @@ -870,8 +892,20 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
>           vhost_kick_file.fd = event_notifier_get_fd(vhost_kick);
>           vhost_call_file.fd = event_notifier_get_fd(vhost_call);
>       } else {
> +        struct vhost_vring_state state = {
> +            .index = vq_index,
> +        };
> +
>           vhost_svq_stop(dev, idx, svq);
>   
> +        state.num = virtio_queue_get_last_avail_idx(dev->vdev, idx);
> +        r = vhost_vdpa_set_vring_base(dev, &state);
> +        if (unlikely(r)) {
> +            error_setg_errno(errp, -r, "vhost_set_vring_base failed");
> +            return false;
> +        }
> +
> +        vhost_vdpa_vq_get_addr(dev, &addr, &dev->vqs[idx]);
>           if (n->addr) {
>               r = virtio_queue_set_host_notifier_mr(dev->vdev, idx, &n->mr,
>                                                     true);
> @@ -885,6 +919,17 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
>           vhost_call_file.fd = v->call_fd[idx];
>       }
>   
> +    r = vhost_vdpa_set_vring_addr(dev, &addr);
> +    if (unlikely(r)) {
> +        error_setg_errno(errp, -r, "vhost_set_vring_addr failed");
> +        return false;
> +    }
> +    r = vhost_vdpa_set_vring_num(dev, &num);
> +    if (unlikely(r)) {
> +        error_setg_errno(errp, -r, "vhost_set_vring_num failed");
> +        return false;
> +    }
> +
>       r = vhost_vdpa_set_vring_dev_kick(dev, &vhost_kick_file);
>       if (unlikely(r)) {
>           error_setg_errno(errp, -r, "vhost_vdpa_set_vring_kick failed");
> @@ -899,6 +944,50 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
>       return true;
>   }
>   
> +static void vhost_vdpa_get_vq_state(struct vhost_dev *dev, unsigned idx)
> +{
> +    struct VirtIODevice *vdev = dev->vdev;
> +
> +    virtio_queue_restore_last_avail_idx(vdev, idx);
> +    virtio_queue_invalidate_signalled_used(vdev, idx);
> +    virtio_queue_update_used_idx(vdev, idx);
> +}


Do we need to change vhost_vdpa_get_vring_base() to return 
vq->last_avail_idx as well?
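
I.e. something like this (sketch; "shadow_vqs_enabled" is a made-up field
here, and I'm not sure about the SVQ/guest index mapping):

    static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
                                         struct vhost_vring_state *ring)
    {
        struct vhost_vdpa *v = dev->opaque;

        if (v->shadow_vqs_enabled) {
            /* assumption: with SVQ the guest-visible index lives in the VirtQueue */
            ring->num = virtio_queue_get_last_avail_idx(dev->vdev,
                                                        ring->index);
            return 0;
        }

        return vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
    }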

Thanks


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 22/26] vhost: Shadow virtqueue buffers forwarding
@ 2021-11-02  7:59     ` Jason Wang
  0 siblings, 0 replies; 82+ messages in thread
From: Jason Wang @ 2021-11-02  7:59 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin, Juan Quintela,
	Richard Henderson, Stefan Hajnoczi, Peter Xu, Markus Armbruster,
	Harpreet Singh Anand, Xiao W Wang, Eli Cohen, Paolo Bonzini,
	Stefano Garzarella, Eric Blake, virtualization, Eduardo Habkost


在 2021/10/30 上午2:35, Eugenio Pérez 写道:
> Initial version of shadow virtqueue that actually forward buffers. There
> are no iommu support at the moment, and that will be addressed in future
> patches of this series. Since all vhost-vdpa devices uses forced IOMMU,
> this means that SVQ is not usable at this point of the series on any
> device.
>
> For simplicity it only supports modern devices, that expects vring
> in little endian, with split ring and no event idx or indirect
> descriptors. Support for them will not be added in this series.
>
> It reuses the VirtQueue code for the device part. The driver part is
> based on Linux's virtio_ring driver, but with stripped functionality
> and optimizations so it's easier to review. Later commits add simpler
> ones.
>
> However to forwarding buffers have some particular pieces: One of the
> most unexpected ones is that a guest's buffer can expand through more
> than one descriptor in SVQ. While this is handled gracefully by qemu's
> emulated virtio devices, it may cause unexpected SVQ queue full. This
> patch also solves it checking for this condition at both guest's kicks
> and device's calls. The code may be more elegant in the future if SVQ
> code runs in its own iocontext.
>
> Note that vhost_vdpa_get_vq_state trust the device to write its status
> to used_idx at pause(), finishing all in-flight descriptors. This may
> not be enough for complex devices, but other development like usage of
> inflight_fd on top of this solution may extend the usage in the future.
>
> In particular, SVQ trust it to recover guest's virtqueue at start, and
> to mark as used the latest descriptors used by the device in the
> meantime.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   qapi/net.json                      |   5 +-
>   hw/virtio/vhost-shadow-virtqueue.c | 400 +++++++++++++++++++++++++++--
>   hw/virtio/vhost-vdpa.c             | 144 ++++++++++-
>   3 files changed, 521 insertions(+), 28 deletions(-)
>
> diff --git a/qapi/net.json b/qapi/net.json
> index fca2f6ebca..1c6d3b2179 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -84,12 +84,9 @@
>   #
>   # Use vhost shadow virtqueue.
>   #
> -# SVQ can just forward notifications between the device and the guest at this
> -# moment. This will expand in future changes.
> -#
>   # @name: the device name of the VirtIO device
>   #
> -# @set: true to use the alternate shadow VQ notifications
> +# @set: true to use the alternate shadow VQ
>   #
>   # Since: 6.2
>   #
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index cb9ffcb015..ad1b2342be 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -9,6 +9,9 @@
>   
>   #include "qemu/osdep.h"
>   #include "hw/virtio/vhost-shadow-virtqueue.h"
> +#include "hw/virtio/vhost.h"
> +#include "hw/virtio/virtio-access.h"
> +
>   #include "standard-headers/linux/vhost_types.h"
>   
>   #include "qemu/error-report.h"
> @@ -45,6 +48,27 @@ typedef struct VhostShadowVirtqueue {
>   
>       /* Virtio device */
>       VirtIODevice *vdev;
> +
> +    /* Map for returning guest's descriptors */
> +    VirtQueueElement **ring_id_maps;
> +
> +    /* Next VirtQueue element that guest made available */
> +    VirtQueueElement *next_guest_avail_elem;
> +
> +    /* Next head to expose to device */
> +    uint16_t avail_idx_shadow;
> +
> +    /* Next free descriptor */
> +    uint16_t free_head;
> +
> +    /* Last seen used idx */
> +    uint16_t shadow_used_idx;
> +
> +    /* Next head to consume from device */
> +    uint16_t last_used_idx;
> +
> +    /* Cache for the exposed notification flag */
> +    bool notification;
>   } VhostShadowVirtqueue;
>   
>   /**
> @@ -56,25 +80,174 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
>       return &svq->hdev_kick;
>   }
>   
> -/* If the device is using some of these, SVQ cannot communicate */
> +/**
> + * VirtIO transport device feature acknowledge
> + *
> + * @dev_features  The device features. If success, the acknowledged features.
> + *
> + * Returns true if SVQ can go with a subset of these, false otherwise.
> + */
>   bool vhost_svq_valid_device_features(uint64_t *dev_features)
>   {
> -    return true;
> +    uint64_t b;
> +    bool r = true;
> +
> +    for (b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END; ++b) {
> +        switch (b) {
> +        case VIRTIO_F_NOTIFY_ON_EMPTY:
> +        case VIRTIO_F_ANY_LAYOUT:
> +            continue;
> +
> +        case VIRTIO_F_ACCESS_PLATFORM:
> +            /* SVQ does not know how to translate addresses */
> +            if (*dev_features & BIT_ULL(b)) {
> +                clear_bit(b, dev_features);
> +                r = false;
> +            }
> +            break;
> +
> +        case VIRTIO_F_VERSION_1:
> +            /* SVQ trust that guest vring is little endian */
> +            if (!(*dev_features & BIT_ULL(b))) {
> +                set_bit(b, dev_features);
> +                r = false;
> +            }
> +            continue;
> +
> +        default:
> +            if (*dev_features & BIT_ULL(b)) {
> +                clear_bit(b, dev_features);
> +            }
> +        }
> +    }
> +
> +    return r;
>   }
>   
> -/* If the guest is using some of these, SVQ cannot communicate */
> +/**
> + * Check of guest's acknowledge features.
> + *
> + * @guest_features  The guest's acknowledged features
> + *
> + * Returns true if SVQ can handle them, false otherwise.
> + */
>   bool vhost_svq_valid_guest_features(uint64_t *guest_features)
>   {
> -    return true;
> +    static const uint64_t transport = MAKE_64BIT_MASK(VIRTIO_TRANSPORT_F_START,
> +                            VIRTIO_TRANSPORT_F_END - VIRTIO_TRANSPORT_F_START);
> +
> +    /* These transport features are handled by VirtQueue */
> +    static const uint64_t valid = (BIT_ULL(VIRTIO_RING_F_INDIRECT_DESC) |
> +                                   BIT_ULL(VIRTIO_F_VERSION_1));
> +
> +    /* We are only interested in transport-related feature bits */
> +    uint64_t guest_transport_features = (*guest_features) & transport;
> +
> +    *guest_features &= (valid | ~transport);
> +    return !(guest_transport_features & (transport ^ valid));
>   }
>   
> -/* Forward guest notifications */
> -static void vhost_handle_guest_kick(EventNotifier *n)
> +/**
> + * Number of descriptors that SVQ can make available from the guest.
> + *
> + * @svq   The svq
> + */
> +static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
>   {
> -    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> -                                             svq_kick);
> +    return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
> +}
>   
> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> +static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> +{
> +    uint16_t notification_flag;
> +
> +    if (svq->notification == enable) {
> +        return;
> +    }
> +
> +    notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
> +
> +    svq->notification = enable;
> +    if (enable) {
> +        svq->vring.avail->flags &= ~notification_flag;
> +    } else {
> +        svq->vring.avail->flags |= notification_flag;
> +    }
> +}
> +
> +static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> +                                    const struct iovec *iovec,
> +                                    size_t num, bool more_descs, bool write)
> +{
> +    uint16_t i = svq->free_head, last = svq->free_head;
> +    unsigned n;
> +    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
> +    vring_desc_t *descs = svq->vring.desc;
> +
> +    if (num == 0) {
> +        return;
> +    }
> +
> +    for (n = 0; n < num; n++) {
> +        if (more_descs || (n + 1 < num)) {
> +            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> +        } else {
> +            descs[i].flags = flags;
> +        }
> +        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
> +        descs[i].len = cpu_to_le32(iovec[n].iov_len);
> +
> +        last = i;
> +        i = cpu_to_le16(descs[i].next);
> +    }
> +
> +    svq->free_head = le16_to_cpu(descs[last].next);
> +}
> +
> +static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> +                                    VirtQueueElement *elem)
> +{
> +    int head;
> +    unsigned avail_idx;
> +    vring_avail_t *avail = svq->vring.avail;
> +
> +    head = svq->free_head;
> +
> +    /* We need some descriptors here */
> +    assert(elem->out_num || elem->in_num);
> +
> +    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> +                            elem->in_num > 0, false);
> +    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> +
> +    /*
> +     * Put entry in available array (but don't update avail->idx until they
> +     * do sync).
> +     */
> +    avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
> +    avail->ring[avail_idx] = cpu_to_le16(head);
> +    svq->avail_idx_shadow++;
> +
> +    /* Update avail index after the descriptor is wrote */
> +    smp_wmb();


A question: since we may talk to real hardware, is smp_wmb() 
sufficient in this case, or do we need to honor VIRTIO_F_ORDER_PLATFORM?


> +    avail->idx = cpu_to_le16(svq->avail_idx_shadow);
> +
> +    return head;
> +
> +}
> +
> +static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> +{
> +    unsigned qemu_head = vhost_svq_add_split(svq, elem);
> +
> +    svq->ring_id_maps[qemu_head] = elem;
> +}
> +
> +static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> +{
> +    /* We need to expose available array entries before checking used flags */
> +    smp_mb();
> +    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
>           return;
>       }
>   
> @@ -86,25 +259,188 @@ static void vhost_handle_guest_kick(EventNotifier *n)
>       }
>   }
>   
> -/*
> - * Set the device's memory region notifier. addr = NULL clear it.
> +/**
> + * Forward available buffers.
> + *
> + * @svq Shadow VirtQueue
> + *
> + * Note that this function does not guarantee that all guest's available
> + * buffers are available to the device in SVQ avail ring. The guest may have
> + * exposed a GPA / GIOVA congiuous buffer, but it may not be contiguous in qemu
> + * vaddr.
> + *
> + * If that happens, guest's kick notifications will be disabled until device
> + * makes some buffers used.
>    */
> -void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
> +static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
>   {
> -    svq->host_notifier_mr = addr;
> +    /* Clear event notifier */
> +    event_notifier_test_and_clear(&svq->svq_kick);
> +
> +    /* Make available as many buffers as possible */
> +    do {
> +        if (virtio_queue_get_notification(svq->vq)) {
> +            virtio_queue_set_notification(svq->vq, false);
> +        }
> +
> +        while (true) {
> +            VirtQueueElement *elem;
> +
> +            if (svq->next_guest_avail_elem) {
> +                elem = g_steal_pointer(&svq->next_guest_avail_elem);
> +            } else {
> +                elem = virtqueue_pop(svq->vq, sizeof(*elem));
> +            }
> +
> +            if (!elem) {
> +                break;
> +            }
> +
> +            if (elem->out_num + elem->in_num >
> +                vhost_svq_available_slots(svq)) {
> +                /*
> +                 * This condition is possible since a contiguous buffer in GPA
> +                 * does not imply a contiguous buffer in qemu's VA
> +                 * scatter-gather segments. If that happen, the buffer exposed
> +                 * to the device needs to be a chain of descriptors at this
> +                 * moment.
> +                 *
> +                 * SVQ cannot hold more available buffers if we are here:
> +                 * queue the current guest descriptor and ignore further kicks
> +                 * until some elements are used.
> +                 */


I wonder what the advantage of tracking the pending elem like this is. It 
looks to me that we could simply rewind last_avail_idx in this case?


> +                svq->next_guest_avail_elem = elem;
> +                return;
> +            }
> +
> +            vhost_svq_add(svq, elem);
> +            vhost_svq_kick(svq);
> +        }
> +
> +        virtio_queue_set_notification(svq->vq, true);
> +    } while (!virtio_queue_empty(svq->vq));
> +}
> +
> +/**
> + * Handle guest's kick.
> + *
> + * @n guest kick event notifier, the one that guest set to notify svq.
> + */
> +static void vhost_handle_guest_kick_notifier(EventNotifier *n)
> +{
> +    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> +                                             svq_kick);
> +    vhost_handle_guest_kick(svq);
> +}
> +
> +static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> +{
> +    if (svq->last_used_idx != svq->shadow_used_idx) {
> +        return true;
> +    }
> +
> +    svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
> +
> +    return svq->last_used_idx != svq->shadow_used_idx;
> +}
> +
> +static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> +{
> +    vring_desc_t *descs = svq->vring.desc;
> +    const vring_used_t *used = svq->vring.used;
> +    vring_used_elem_t used_elem;
> +    uint16_t last_used;
> +
> +    if (!vhost_svq_more_used(svq)) {
> +        return NULL;
> +    }
> +
> +    /* Only get used array entries after they have been exposed by dev */
> +    smp_rmb();
> +    last_used = svq->last_used_idx & (svq->vring.num - 1);
> +    used_elem.id = le32_to_cpu(used->ring[last_used].id);
> +    used_elem.len = le32_to_cpu(used->ring[last_used].len);
> +
> +    svq->last_used_idx++;
> +    if (unlikely(used_elem.id >= svq->vring.num)) {
> +        error_report("Device %s says index %u is used", svq->vdev->name,
> +                     used_elem.id);
> +        return NULL;
> +    }
> +
> +    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
> +        error_report(
> +            "Device %s says index %u is used, but it was not available",
> +            svq->vdev->name, used_elem.id);
> +        return NULL;
> +    }
> +
> +    descs[used_elem.id].next = svq->free_head;
> +    svq->free_head = used_elem.id;
> +
> +    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> +    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
>   }
>   
> -/* Forward vhost notifications */
> +static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> +                            bool check_for_avail_queue)
> +{
> +    VirtQueue *vq = svq->vq;
> +
> +    /* Make as many buffers as possible used. */
> +    do {
> +        unsigned i = 0;
> +
> +        vhost_svq_set_notification(svq, false);
> +        while (true) {
> +            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> +            if (!elem) {
> +                break;
> +            }
> +
> +            if (unlikely(i >= svq->vring.num)) {
> +                virtio_error(svq->vdev,
> +                         "More than %u used buffers obtained in a %u size SVQ",
> +                         i, svq->vring.num);
> +                virtqueue_fill(vq, elem, elem->len, i);
> +                virtqueue_flush(vq, i);
> +                i = 0;
> +            }
> +            virtqueue_fill(vq, elem, elem->len, i++);
> +        }
> +
> +        virtqueue_flush(vq, i);
> +        event_notifier_set(&svq->svq_call);
> +
> +        if (check_for_avail_queue && svq->next_guest_avail_elem) {
> +            /*
> +             * Avail ring was full when vhost_svq_flush was called, so it's a
> +             * good moment to make more descriptors available if possible
> +             */
> +            vhost_handle_guest_kick(svq);
> +        }
> +
> +        vhost_svq_set_notification(svq, true);
> +    } while (vhost_svq_more_used(svq));


So this actually doesn't make sure all the buffers were processed by the 
device? Is this intended? (I see it is called by vhost_svq_stop().)

Note that it means some buffers might not be submitted to the device 
after migration?


> +}
> +
> +/**
> + * Forward used buffers.
> + *
> + * @n hdev call event notifier, the one that device set to notify svq.
> + *
> + * Note that we are not making any buffers available in the loop, there is no
> + * way that it runs more than virtqueue size times.
> + */
>   static void vhost_svq_handle_call(EventNotifier *n)
>   {
>       VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
>                                                hdev_call);
>   
> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> -        return;
> -    }
> +    /* Clear event notifier */
> +    event_notifier_test_and_clear(n);
>   
> -    event_notifier_set(&svq->svq_call);
> +    vhost_svq_flush(svq, true);
>   }
>   
>   /*
> @@ -132,6 +468,14 @@ void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd)
>       event_notifier_init_fd(&svq->svq_call, call_fd);
>   }
>   
> +/*
> + * Set the device's memory region notifier. addr = NULL clear it.
> + */
> +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
> +{
> +    svq->host_notifier_mr = addr;
> +}
> +
>   /*
>    * Get the shadow vq vring address.
>    * @svq Shadow virtqueue
> @@ -185,7 +529,8 @@ static void vhost_svq_set_svq_kick_fd_internal(VhostShadowVirtqueue *svq,
>        * need to explicitely check for them.
>        */
>       event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> -    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
> +    event_notifier_set_handler(&svq->svq_kick,
> +                               vhost_handle_guest_kick_notifier);
>   
>       /*
>        * !check_old means that we are starting SVQ, taking the descriptor from
> @@ -233,7 +578,16 @@ void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
>   void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
>                       VhostShadowVirtqueue *svq)
>   {
> +    unsigned i;
>       event_notifier_set_handler(&svq->svq_kick, NULL);
> +    vhost_svq_flush(svq, false);
> +
> +    for (i = 0; i < svq->vring.num; ++i) {
> +        g_autofree VirtQueueElement *elem = svq->ring_id_maps[i];
> +        if (elem) {
> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> +        }
> +    }
>   }
>   
>   /*
> @@ -248,7 +602,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>       size_t driver_size;
>       size_t device_size;
>       g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> -    int r;
> +    int r, i;
>   
>       r = event_notifier_init(&svq->hdev_kick, 0);
>       if (r != 0) {
> @@ -274,6 +628,11 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>       memset(svq->vring.desc, 0, driver_size);
>       svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
>       memset(svq->vring.used, 0, device_size);
> +    for (i = 0; i < num - 1; i++) {
> +        svq->vring.desc[i].next = cpu_to_le16(i + 1);
> +    }
> +
> +    svq->ring_id_maps = g_new0(VirtQueueElement *, num);
>       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
>       return g_steal_pointer(&svq);
>   
> @@ -292,6 +651,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
>       event_notifier_cleanup(&vq->hdev_kick);
>       event_notifier_set_handler(&vq->hdev_call, NULL);
>       event_notifier_cleanup(&vq->hdev_call);
> +    g_free(vq->ring_id_maps);
>       qemu_vfree(vq->vring.desc);
>       qemu_vfree(vq->vring.used);
>       g_free(vq);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index fc8396ba8a..e1c55e43e7 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -19,6 +19,7 @@
>   #include "hw/virtio/virtio-net.h"
>   #include "hw/virtio/vhost-shadow-virtqueue.h"
>   #include "hw/virtio/vhost-vdpa.h"
> +#include "hw/virtio/vhost-shadow-virtqueue.h"
>   #include "exec/address-spaces.h"
>   #include "qemu/main-loop.h"
>   #include "cpu.h"
> @@ -821,6 +822,19 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
>       return true;
>   }
>   
> +static int vhost_vdpa_vring_pause(struct vhost_dev *dev)
> +{
> +    int r;
> +    uint8_t status;
> +
> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DEVICE_STOPPED);
> +    do {
> +        r = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> +    } while (r == 0 && !(status & VIRTIO_CONFIG_S_DEVICE_STOPPED));
> +
> +    return 0;
> +}
> +
>   /*
>    * Start or stop a shadow virtqueue in a vdpa device
>    *
> @@ -844,7 +858,14 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
>           .index = vq_index,
>       };
>       struct vhost_vring_file vhost_call_file = {
> -        .index = idx + dev->vq_index,
> +        .index = vq_index,
> +    };
> +    struct vhost_vring_addr addr = {
> +        .index = vq_index,
> +    };
> +    struct vhost_vring_state num = {
> +        .index = vq_index,
> +        .num = virtio_queue_get_num(dev->vdev, vq_index),
>       };
>       int r;
>   
> @@ -852,6 +873,7 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
>           const EventNotifier *vhost_kick = vhost_svq_get_dev_kick_notifier(svq);
>           const EventNotifier *vhost_call = vhost_svq_get_svq_call_notifier(svq);
>   
> +        vhost_svq_get_vring_addr(svq, &addr);
>           if (n->addr) {
>               r = virtio_queue_set_host_notifier_mr(dev->vdev, idx, &n->mr,
>                                                     false);
> @@ -870,8 +892,20 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
>           vhost_kick_file.fd = event_notifier_get_fd(vhost_kick);
>           vhost_call_file.fd = event_notifier_get_fd(vhost_call);
>       } else {
> +        struct vhost_vring_state state = {
> +            .index = vq_index,
> +        };
> +
>           vhost_svq_stop(dev, idx, svq);
>   
> +        state.num = virtio_queue_get_last_avail_idx(dev->vdev, idx);
> +        r = vhost_vdpa_set_vring_base(dev, &state);
> +        if (unlikely(r)) {
> +            error_setg_errno(errp, -r, "vhost_set_vring_base failed");
> +            return false;
> +        }
> +
> +        vhost_vdpa_vq_get_addr(dev, &addr, &dev->vqs[idx]);
>           if (n->addr) {
>               r = virtio_queue_set_host_notifier_mr(dev->vdev, idx, &n->mr,
>                                                     true);
> @@ -885,6 +919,17 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
>           vhost_call_file.fd = v->call_fd[idx];
>       }
>   
> +    r = vhost_vdpa_set_vring_addr(dev, &addr);
> +    if (unlikely(r)) {
> +        error_setg_errno(errp, -r, "vhost_set_vring_addr failed");
> +        return false;
> +    }
> +    r = vhost_vdpa_set_vring_num(dev, &num);
> +    if (unlikely(r)) {
> +        error_setg_errno(errp, -r, "vhost_set_vring_num failed");
> +        return false;
> +    }
> +
>       r = vhost_vdpa_set_vring_dev_kick(dev, &vhost_kick_file);
>       if (unlikely(r)) {
>           error_setg_errno(errp, -r, "vhost_vdpa_set_vring_kick failed");
> @@ -899,6 +944,50 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
>       return true;
>   }
>   
> +static void vhost_vdpa_get_vq_state(struct vhost_dev *dev, unsigned idx)
> +{
> +    struct VirtIODevice *vdev = dev->vdev;
> +
> +    virtio_queue_restore_last_avail_idx(vdev, idx);
> +    virtio_queue_invalidate_signalled_used(vdev, idx);
> +    virtio_queue_update_used_idx(vdev, idx);
> +}


Do we need to change vhost_vdpa_get_vring_base() to return 
vq->last_avail_idx as well?

Thanks



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 21/26] vhost: Add vhost_svq_valid_guest_features to shadow vq
  2021-11-02  5:25     ` Jason Wang
  (?)
@ 2021-11-02  8:09     ` Eugenio Perez Martin
  2021-11-03  3:18         ` Jason Wang
  -1 siblings, 1 reply; 82+ messages in thread
From: Eugenio Perez Martin @ 2021-11-02  8:09 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-devel, Peter Xu,
	Markus Armbruster, Stefan Hajnoczi, Xiao W Wang,
	Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Stefano Garzarella, Eric Blake, virtualization, Parav Pandit

On Tue, Nov 2, 2021 at 6:26 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Sat, Oct 30, 2021 at 2:44 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > This allows testing whether the guest has acknowledged an invalid
> > transport feature for SVQ. This will include a packed vq layout or
> > event_idx, where the VirtIO device needs help from SVQ.
> >
> > It is not needed at this moment, but since SVQ will not re-negotiate
> > features with the guest, a failure to acknowledge them is fatal
> > for SVQ.
> >
>
> It's not clear to me why we need this. Maybe you can give me an
> example. E.g., isn't it sufficient to filter out the device with
> event_idx?
>

If the guest did negotiate _F_EVENT_IDX, it expects to be notified
only when the device has marked a specific number of descriptors as
used.

If we use VirtQueue notification, the VirtQueue code handles it
transparently. But if we want to be able to change the guest VQ's
call_fd, we cannot use VirtQueue's notification, so this needs to be
handled by SVQ code. And that is still not implemented.

Of course, in the event_idx case we could just ignore it and notify on
every used descriptor, but that does not seem polite to me :). I will
develop event_idx on top of this, either exposing the needed pieces in
VirtQueue (which I prefer) or rolling our own in SVQ.

Same reasoning can be applied to unknown transport features.
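To make the event_idx point concrete: if SVQ ends up handling it by
itself, the check it needs is basically the virtio spec's
vring_need_event() formula. A minimal sketch (the function name is made
up here, not code from this series):

static bool vhost_svq_need_event(uint16_t used_event, uint16_t new_used_idx,
                                 uint16_t old_used_idx)
{
    /* Notify only when the used index has passed the guest's used_event */
    return (uint16_t)(new_used_idx - used_event - 1) <
           (uint16_t)(new_used_idx - old_used_idx);
}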

Thanks!

> Thanks
>
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  hw/virtio/vhost-shadow-virtqueue.h | 1 +
> >  hw/virtio/vhost-shadow-virtqueue.c | 6 ++++++
> >  2 files changed, 7 insertions(+)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index 946b2c6295..ac55588009 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -16,6 +16,7 @@
> >  typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> >
> >  bool vhost_svq_valid_device_features(uint64_t *features);
> > +bool vhost_svq_valid_guest_features(uint64_t *features);
> >
> >  void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> >  void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index 6e0508a231..cb9ffcb015 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -62,6 +62,12 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
> >      return true;
> >  }
> >
> > +/* If the guest is using some of these, SVQ cannot communicate */
> > +bool vhost_svq_valid_guest_features(uint64_t *guest_features)
> > +{
> > +    return true;
> > +}
> > +
> >  /* Forward guest notifications */
> >  static void vhost_handle_guest_kick(EventNotifier *n)
> >  {
> > --
> > 2.27.0
> >
>



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
  2021-11-02  6:35     ` Jason Wang
  (?)
@ 2021-11-02  8:28     ` Eugenio Perez Martin
  2021-11-03  3:10         ` Jason Wang
  -1 siblings, 1 reply; 82+ messages in thread
From: Eugenio Perez Martin @ 2021-11-02  8:28 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Peter Xu,
	Markus Armbruster, Stefan Hajnoczi, Xiao W Wang,
	Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Stefano Garzarella, Eric Blake, virtualization, Parav Pandit

On Tue, Nov 2, 2021 at 7:35 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2021/10/30 上午2:35, Eugenio Pérez 写道:
> > This iova tree function looks for a hole in the allocated regions and
> > returns a totally new translation for a given translated address.
> >
> > Its usage is mainly to allow devices to access qemu's address space,
> > remapping the guest's address space into a new iova space to which
> > qemu can add chunks of addresses.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   include/qemu/iova-tree.h |  17 +++++
> >   util/iova-tree.c         | 139 +++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 156 insertions(+)
> >
> > diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
> > index 8249edd764..33f9b2e13f 100644
> > --- a/include/qemu/iova-tree.h
> > +++ b/include/qemu/iova-tree.h
> > @@ -29,6 +29,7 @@
> >   #define  IOVA_OK           (0)
> >   #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
> >   #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
> > +#define  IOVA_ERR_NOMEM    (-3) /* Cannot allocate */
>
>
> I think we need a better name than "NOMEM", since what it actually
> means is that there's no sufficient hole for the range?
>

Actually, yes. I'm totally fine with changing it, but the
inspiration is that ENOMEM is also the error that malloc sets in
errno when not enough contiguous virtual memory can be allocated.

What would be a more descriptive name?

>
> >
> >   typedef struct IOVATree IOVATree;
> >   typedef struct DMAMap {
> > @@ -119,6 +120,22 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
> >    */
> >   void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
> >
> > +/**
> > + * iova_tree_alloc:
> > + *
> > + * @tree: the iova tree to allocate from
> > + * @map: the new map (as translated addr & size) to allocate in iova region
> > + * @iova_begin: the minimum address of the allocation
> > + * @iova_end: the maximum addressable direction of the allocation
> > + *
> > + * Allocates a new region of a given size, between iova_min and iova_max.
> > + *
> > + * Return: Same as iova_tree_insert, but cannot overlap and can be out of
> > + * free contiguous range. Caller can get the assigned iova in map->iova.
> > + */
> > +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> > +                    hwaddr iova_end);
> > +
>
>
> "iova_tree_alloc_map" seems better.
>

Right, I changed it in vhost but I forgot to change it here.

>
> >   /**
> >    * iova_tree_destroy:
> >    *
> > diff --git a/util/iova-tree.c b/util/iova-tree.c
> > index 23ea35b7a4..27c921c4e2 100644
> > --- a/util/iova-tree.c
> > +++ b/util/iova-tree.c
> > @@ -16,6 +16,36 @@ struct IOVATree {
> >       GTree *tree;
> >   };
> >
> > +/* Args to pass to iova_tree_alloc foreach function. */
> > +struct IOVATreeAllocArgs {
> > +    /* Size of the desired allocation */
> > +    size_t new_size;
> > +
> > +    /* The minimum address allowed in the allocation */
> > +    hwaddr iova_begin;
> > +
> > +    /* The last addressable allowed in the allocation */
> > +    hwaddr iova_last;
> > +
> > +    /* Previously-to-last iterated map, can be NULL in the first node */
> > +    const DMAMap *hole_left;
> > +
> > +    /* Last iterated map */
> > +    const DMAMap *hole_right;
>
>
> Any reason we can move those to IOVATree structure, it can simplify a
> lot of things.
>

I can move them for the next version, sure, but then it needs to be
clear enough that these fields are allocation arguments.

>
> > +};
> > +
> > +/**
> > + * Iterate args to tne next hole

s/tne/the/

> > + *
> > + * @args  The alloc arguments
> > + * @next  The next mapping in the tree. Can be NULL to signal the last one
> > + */
> > +static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
> > +                                         const DMAMap *next) {
> > +    args->hole_left = args->hole_right;
> > +    args->hole_right = next;
> > +}
> > +
> >   static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
> >   {
> >       const DMAMap *m1 = a, *m2 = b;
> > @@ -107,6 +137,115 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
> >       return IOVA_OK;
> >   }
> >
> > +/**
> > + * Try to accomodate a map of size ret->size in a hole between
> > + * max(end(hole_left), iova_start).
> > + *
> > + * @args Arguments to allocation
> > + */
> > +static bool iova_tree_alloc_map_in_hole(const struct IOVATreeAllocArgs *args)
> > +{
> > +    const DMAMap *left = args->hole_left, *right = args->hole_right;
> > +    uint64_t hole_start, hole_last;
> > +
> > +    if (right && right->iova + right->size < args->iova_begin) {
> > +        return false;
> > +    }
> > +
> > +    if (left && left->iova > args->iova_last) {
> > +        return false;
> > +    }
> > +
> > +    hole_start = MAX(left ? left->iova + left->size + 1 : 0, args->iova_begin);
> > +    hole_last = MIN(right ? right->iova : HWADDR_MAX, args->iova_last);
> > +
> > +    if (hole_last - hole_start > args->new_size) {
> > +        /* We found a valid hole. */
> > +        return true;
> > +    }
> > +
> > +    /* Keep iterating */
> > +    return false;
> > +}
> > +
> > +/**
> > + * Foreach dma node in the tree, compare if there is a hole wit its previous
> > + * node (or minimum iova address allowed) and the node.
> > + *
> > + * @key   Node iterating
> > + * @value Node iterating
> > + * @pargs Struct to communicate with the outside world
> > + *
> > + * Return: false to keep iterating, true if needs break.
> > + */
> > +static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
> > +                                         gpointer pargs)
> > +{
> > +    struct IOVATreeAllocArgs *args = pargs;
> > +    DMAMap *node = value;
> > +
> > +    assert(key == value);
> > +
> > +    iova_tree_alloc_args_iterate(args, node);
> > +    if (args->hole_left && args->hole_left->iova > args->iova_last) {
> > +        return true;
> > +    }
> > +
> > +    if (iova_tree_alloc_map_in_hole(args)) {
> > +        return true;
> > +    }
> > +
> > +    return false;
> > +}
> > +
> > +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> > +                    hwaddr iova_last)
> > +{
> > +    struct IOVATreeAllocArgs args = {
> > +        .new_size = map->size,
> > +        .iova_begin = iova_begin,
> > +        .iova_last = iova_last,
> > +    };
> > +
> > +    if (iova_begin == 0) {
> > +        /* Some devices does not like addr 0 */
> > +        iova_begin += qemu_real_host_page_size;
> > +    }
> > +
> > +    assert(iova_begin < iova_last);
> > +
> > +    /*
> > +     * Find a valid hole for the mapping
> > +     *
> > +     * Assuming low iova_begin, so no need to do a binary search to
> > +     * locate the first node.
> > +     *
> > +     * TODO: We can improve the search speed if we save the beginning and the
> > +     * end of holes, so we don't iterate over the previous saved ones.
> > +     *
> > +     * TODO: Replace all this with g_tree_node_first/next/last when available
> > +     * (from glib since 2.68). To do it with g_tree_foreach complicates the
> > +     * code a lot.
>
>
> To tell the truth, the code in iova_tree_alloc_traverse() is hard to
> review. I think it would be easier to use first/next/last. What we
> really need is to calculate the hole between two ranges, with handmade
> first/last.
>

I totally agree with that, but we don't have first/next/last for GTree
until glib 2.68. Can we raise the minimum required version?

Another possibility that comes to mind is to either keep a list /
tree of free regions, or to write a custom allocator for this.
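
Just to illustrate the glib >= 2.68 option (an assumption, since we
cannot rely on that version today), the hole search could become a plain
in-order walk; the function name and the bounds handling below are only
a sketch:

static hwaddr iova_tree_find_hole(GTree *gtree, hwaddr size,
                                  hwaddr iova_begin, hwaddr iova_last)
{
    hwaddr hole_start = iova_begin;
    GTreeNode *node;

    for (node = g_tree_node_first(gtree); node;
         node = g_tree_node_next(node)) {
        const DMAMap *map = g_tree_node_value(node);

        if (map->iova > iova_last) {
            break;
        }
        if (map->iova > hole_start && map->iova - hole_start > size) {
            return hole_start;          /* hole before this mapping fits */
        }
        hole_start = MAX(hole_start, map->iova + map->size + 1);
    }

    /* Check the trailing hole after the last mapping */
    if (hole_start <= iova_last && iova_last - hole_start > size) {
        return hole_start;
    }

    return HWADDR_MAX;                  /* no suitable hole found */
}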

> Thanks
>
>
> > +     *
> > +     */
> > +    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
> > +    if (!iova_tree_alloc_map_in_hole(&args)) {
> > +        /*
> > +         * 2nd try: Last iteration left args->right as the last DMAMap. But
> > +         * (right, end) hole needs to be checked too
> > +         */
> > +        iova_tree_alloc_args_iterate(&args, NULL);
> > +        if (!iova_tree_alloc_map_in_hole(&args)) {
> > +            return IOVA_ERR_NOMEM;
> > +        }
> > +    }
> > +
> > +    map->iova = MAX(iova_begin,
> > +                    args.hole_left ?
> > +                    args.hole_left->iova + args.hole_left->size + 1 : 0);
> > +    return iova_tree_insert(tree, map);
> > +}
> > +
> >   void iova_tree_destroy(IOVATree *tree)
> >   {
> >       g_tree_destroy(tree->tree);
>



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 05/26] vhost: Add x-vhost-set-shadow-vq qmp
  2021-11-02  7:36     ` Juan Quintela
  (?)
@ 2021-11-02  8:29     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 82+ messages in thread
From: Eugenio Perez Martin @ 2021-11-02  8:29 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin, Jason Wang,
	Richard Henderson, qemu-level, Peter Xu, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Parav Pandit

On Tue, Nov 2, 2021 at 8:36 AM Juan Quintela <quintela@redhat.com> wrote:
>
> Eugenio Pérez <eperezma@redhat.com> wrote:
> > Command to set shadow virtqueue mode.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> You need to take care of:
>
>  Markus Armbruster      ] [PATCH v2 0/9] Configurable policy for handling unstable interfaces
>
> When this hits the tree, you need to drop the x- and mark it as unstable.
>

Oh, very good point, I will take it into account for the next revision.

Thanks!

> Later, Juan.
>



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq
  2021-11-02  7:25     ` Juan Quintela
  (?)
  (?)
@ 2021-11-02  8:34     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 82+ messages in thread
From: Eugenio Perez Martin @ 2021-11-02  8:34 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin, Jason Wang,
	Richard Henderson, qemu-level, Peter Xu, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Parav Pandit

On Tue, Nov 2, 2021 at 8:25 AM Juan Quintela <quintela@redhat.com> wrote:
>
> Eugenio Pérez <eperezma@redhat.com> wrote:
> > The -1 assumes that all devices with no cvq have an spare vq allocated
> > for them, but with no offer of VIRTIO_NET_F_CTRL_VQ. This may not be the
> > case, and the device may have a pair number of queues.
>                                   ^^^^
> even
>
> I know, I know, I am Spanish myself O:-)
>

Ouch! Good catch! :).

> > To fix this, just resort to the lower even number of queues.
>
> I don't understand what you're trying to achieve here.
>
> > Fixes: 049eb15b5fc9 ("vhost: record the last virtqueue index for the
> > virtio device")
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  hw/net/vhost_net.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index 0d888f29a6..edf56a597f 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -330,7 +330,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >      NetClientState *peer;
> >
> >      if (!cvq) {
> > -        last_index -= 1;
> > +        last_index &= ~1ULL;
> >      }
>
> As far as I can see, that is a nop. last_index is defined as an int.
>
> $ cat kk.c
> #include <stdio.h>
>
> int main(void)
> {
>         int i = 7;
>         i &= -1ULL;
>         printf("%d\n", i);
>         i = 8;
>         i &= -1ULL;
>         printf("%d\n", i);
>         i = 0;
>         i &= -1ULL;
>         printf("%d\n", i);
>         i = -2;
>         i &= -1ULL;
>         printf("%d\n", i);
>         return 0;
> }
> $ ./kk
> 7
> 8
> 0
> -2
>

(Already answered by MST in another thread, but to be consistent here:)
it's actually ~1ULL, not -1ULL :).
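
With the tilde, a quick re-run of the snippet shows the intended
rounding down to an even value (illustration only):

#include <stdio.h>

int main(void)
{
    int i = 7;
    i &= ~1ULL;         /* clear the lowest bit: round down to even */
    printf("%d\n", i);  /* 6 */
    i = 8;
    i &= ~1ULL;
    printf("%d\n", i);  /* 8 */
    return 0;
}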

I will rewrite the patch message,

Thanks!



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 11/26] vhost: Handle host notifiers in SVQ
  2021-11-02  7:54     ` Jason Wang
  (?)
@ 2021-11-02  8:46     ` Eugenio Perez Martin
       [not found]       ` <CACGkMEvOxUMo1WA4tUfDhw+FOJVW87JJGPw=U3JvUSQTU_ogWQ@mail.gmail.com>
  -1 siblings, 1 reply; 82+ messages in thread
From: Eugenio Perez Martin @ 2021-11-02  8:46 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Peter Xu,
	Markus Armbruster, Stefan Hajnoczi, Xiao W Wang,
	Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Stefano Garzarella, Eric Blake, virtualization, Parav Pandit

On Tue, Nov 2, 2021 at 8:55 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2021/10/30 上午2:35, Eugenio Pérez 写道:
> > If the device supports host notifiers, this avoids one jump (through
> > the kernel) when delivering SVQ notifications to it.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
> >   hw/virtio/vhost-shadow-virtqueue.c | 23 ++++++++++++++++++++++-
> >   2 files changed, 24 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index 30ab9643b9..eb0a54f954 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -18,6 +18,8 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> >   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> >   const EventNotifier *vhost_svq_get_dev_kick_notifier(
> >                                                 const VhostShadowVirtqueue *svq);
> > +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr);
> > +
> >   void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> >                        VhostShadowVirtqueue *svq, int svq_kick_fd);
> >   void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index fda60d11db..e3dcc039b6 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -29,6 +29,12 @@ typedef struct VhostShadowVirtqueue {
> >        * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
> >        */
> >       EventNotifier svq_kick;
> > +
> > +    /* Device's host notifier memory region. NULL means no region */
> > +    void *host_notifier_mr;
> > +
> > +    /* Virtio queue shadowing */
> > +    VirtQueue *vq;
> >   } VhostShadowVirtqueue;
> >
> >   /**
> > @@ -50,7 +56,20 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> >           return;
> >       }
> >
> > -    event_notifier_set(&svq->hdev_kick);
> > +    if (svq->host_notifier_mr) {
> > +        uint16_t *mr = svq->host_notifier_mr;
> > +        *mr = virtio_get_queue_index(svq->vq);
>
>
> Do we need barriers around the possible MMIO here?

That's right, I missed them.

>
> To avoid that complicated stuff, I'd rather simply go with the eventfd path.
>
> Note mmio and eventfd are not mutually exclusive.

Actually, we cannot ignore them, since they are set up for the guest. If
SVQ does nothing about them, the guest's notifications will travel
directly to the vdpa device, and SVQ cannot intercept them.

Taking that into account, it actually takes fewer changes to move them
to SVQ (like in this series) than to disable them (like in the previous
series). But we can go with disabling them, for sure.
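
Just as a sketch of the missing barrier (whether smp_wmb() is enough
for a real device, or VIRTIO_F_ORDER_PLATFORM needs to be honored, is
still an open question):

    if (svq->host_notifier_mr) {
        uint16_t *db = svq->host_notifier_mr;

        /*
         * Order the descriptor and avail ring writes before the
         * doorbell write; a stronger barrier may be needed for real
         * hardware.
         */
        smp_wmb();
        *db = virtio_get_queue_index(svq->vq);
    } else {
        event_notifier_set(&svq->hdev_kick);
    }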

Thanks!

>
> Thanks
>
>
> > +    } else {
> > +        event_notifier_set(&svq->hdev_kick);
> > +    }
> > +}
> > +
> > +/*
> > + * Set the device's memory region notifier. addr = NULL clear it.
> > + */
> > +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
> > +{
> > +    svq->host_notifier_mr = addr;
> >   }
> >
> >   /**
> > @@ -134,6 +153,7 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> >    */
> >   VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> >   {
> > +    int vq_idx = dev->vq_index + idx;
> >       g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> >       int r;
> >
> > @@ -151,6 +171,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> >           goto err_init_hdev_call;
> >       }
> >
> > +    svq->vq = virtio_get_queue(dev->vdev, vq_idx);
> >       return g_steal_pointer(&svq);
> >
> >   err_init_hdev_call:
>



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 22/26] vhost: Shadow virtqueue buffers forwarding
  2021-11-02  7:59     ` Jason Wang
  (?)
@ 2021-11-02 10:22     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 82+ messages in thread
From: Eugenio Perez Martin @ 2021-11-02 10:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Peter Xu,
	Markus Armbruster, Stefan Hajnoczi, Xiao W Wang,
	Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Stefano Garzarella, Eric Blake, virtualization, Parav Pandit

On Tue, Nov 2, 2021 at 8:59 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2021/10/30 上午2:35, Eugenio Pérez 写道:
> > Initial version of the shadow virtqueue that actually forwards buffers.
> > There is no iommu support at the moment; that will be addressed in
> > future patches of this series. Since all vhost-vdpa devices use forced
> > IOMMU, this means that SVQ is not usable on any device at this point of
> > the series.
> >
> > For simplicity it only supports modern devices, which expect a vring in
> > little endian, with a split ring and no event idx or indirect
> > descriptors. Support for those will not be added in this series.
> >
> > It reuses the VirtQueue code for the device part. The driver part is
> > based on Linux's virtio_ring driver, but with stripped functionality
> > and optimizations so it's easier to review. Later commits add simpler
> > ones.
> >
> > However, forwarding buffers has some particular pieces: one of the most
> > unexpected ones is that a guest's buffer can expand through more than
> > one descriptor in SVQ. While this is handled gracefully by qemu's
> > emulated virtio devices, it may cause an unexpected SVQ queue full. This
> > patch also solves it by checking for this condition at both guest's
> > kicks and device's calls. The code may be more elegant in the future if
> > SVQ code runs in its own iocontext.
> >
> > Note that vhost_vdpa_get_vq_state trusts the device to write its status
> > to used_idx at pause(), finishing all in-flight descriptors. This may
> > not be enough for complex devices, but other developments like the usage
> > of inflight_fd on top of this solution may extend it in the future.
> >
> > In particular, SVQ trusts it to recover the guest's virtqueue at start,
> > and to mark as used the latest descriptors used by the device in the
> > meantime.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   qapi/net.json                      |   5 +-
> >   hw/virtio/vhost-shadow-virtqueue.c | 400 +++++++++++++++++++++++++++--
> >   hw/virtio/vhost-vdpa.c             | 144 ++++++++++-
> >   3 files changed, 521 insertions(+), 28 deletions(-)
> >
> > diff --git a/qapi/net.json b/qapi/net.json
> > index fca2f6ebca..1c6d3b2179 100644
> > --- a/qapi/net.json
> > +++ b/qapi/net.json
> > @@ -84,12 +84,9 @@
> >   #
> >   # Use vhost shadow virtqueue.
> >   #
> > -# SVQ can just forward notifications between the device and the guest at this
> > -# moment. This will expand in future changes.
> > -#
> >   # @name: the device name of the VirtIO device
> >   #
> > -# @set: true to use the alternate shadow VQ notifications
> > +# @set: true to use the alternate shadow VQ
> >   #
> >   # Since: 6.2
> >   #
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index cb9ffcb015..ad1b2342be 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -9,6 +9,9 @@
> >
> >   #include "qemu/osdep.h"
> >   #include "hw/virtio/vhost-shadow-virtqueue.h"
> > +#include "hw/virtio/vhost.h"
> > +#include "hw/virtio/virtio-access.h"
> > +
> >   #include "standard-headers/linux/vhost_types.h"
> >
> >   #include "qemu/error-report.h"
> > @@ -45,6 +48,27 @@ typedef struct VhostShadowVirtqueue {
> >
> >       /* Virtio device */
> >       VirtIODevice *vdev;
> > +
> > +    /* Map for returning guest's descriptors */
> > +    VirtQueueElement **ring_id_maps;
> > +
> > +    /* Next VirtQueue element that guest made available */
> > +    VirtQueueElement *next_guest_avail_elem;
> > +
> > +    /* Next head to expose to device */
> > +    uint16_t avail_idx_shadow;
> > +
> > +    /* Next free descriptor */
> > +    uint16_t free_head;
> > +
> > +    /* Last seen used idx */
> > +    uint16_t shadow_used_idx;
> > +
> > +    /* Next head to consume from device */
> > +    uint16_t last_used_idx;
> > +
> > +    /* Cache for the exposed notification flag */
> > +    bool notification;
> >   } VhostShadowVirtqueue;
> >
> >   /**
> > @@ -56,25 +80,174 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
> >       return &svq->hdev_kick;
> >   }
> >
> > -/* If the device is using some of these, SVQ cannot communicate */
> > +/**
> > + * VirtIO transport device feature acknowledge
> > + *
> > + * @dev_features  The device features. If success, the acknowledged features.
> > + *
> > + * Returns true if SVQ can go with a subset of these, false otherwise.
> > + */
> >   bool vhost_svq_valid_device_features(uint64_t *dev_features)
> >   {
> > -    return true;
> > +    uint64_t b;
> > +    bool r = true;
> > +
> > +    for (b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END; ++b) {
> > +        switch (b) {
> > +        case VIRTIO_F_NOTIFY_ON_EMPTY:
> > +        case VIRTIO_F_ANY_LAYOUT:
> > +            continue;
> > +
> > +        case VIRTIO_F_ACCESS_PLATFORM:
> > +            /* SVQ does not know how to translate addresses */
> > +            if (*dev_features & BIT_ULL(b)) {
> > +                clear_bit(b, dev_features);
> > +                r = false;
> > +            }
> > +            break;
> > +
> > +        case VIRTIO_F_VERSION_1:
> > +            /* SVQ trust that guest vring is little endian */
> > +            if (!(*dev_features & BIT_ULL(b))) {
> > +                set_bit(b, dev_features);
> > +                r = false;
> > +            }
> > +            continue;
> > +
> > +        default:
> > +            if (*dev_features & BIT_ULL(b)) {
> > +                clear_bit(b, dev_features);
> > +            }
> > +        }
> > +    }
> > +
> > +    return r;
> >   }
> >
> > -/* If the guest is using some of these, SVQ cannot communicate */
> > +/**
> > + * Check of guest's acknowledge features.
> > + *
> > + * @guest_features  The guest's acknowledged features
> > + *
> > + * Returns true if SVQ can handle them, false otherwise.
> > + */
> >   bool vhost_svq_valid_guest_features(uint64_t *guest_features)
> >   {
> > -    return true;
> > +    static const uint64_t transport = MAKE_64BIT_MASK(VIRTIO_TRANSPORT_F_START,
> > +                            VIRTIO_TRANSPORT_F_END - VIRTIO_TRANSPORT_F_START);
> > +
> > +    /* These transport features are handled by VirtQueue */
> > +    static const uint64_t valid = (BIT_ULL(VIRTIO_RING_F_INDIRECT_DESC) |
> > +                                   BIT_ULL(VIRTIO_F_VERSION_1));
> > +
> > +    /* We are only interested in transport-related feature bits */
> > +    uint64_t guest_transport_features = (*guest_features) & transport;
> > +
> > +    *guest_features &= (valid | ~transport);
> > +    return !(guest_transport_features & (transport ^ valid));
> >   }
> >
> > -/* Forward guest notifications */
> > -static void vhost_handle_guest_kick(EventNotifier *n)
> > +/**
> > + * Number of descriptors that SVQ can make available from the guest.
> > + *
> > + * @svq   The svq
> > + */
> > +static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
> >   {
> > -    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > -                                             svq_kick);
> > +    return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
> > +}
> >
> > -    if (unlikely(!event_notifier_test_and_clear(n))) {
> > +static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> > +{
> > +    uint16_t notification_flag;
> > +
> > +    if (svq->notification == enable) {
> > +        return;
> > +    }
> > +
> > +    notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
> > +
> > +    svq->notification = enable;
> > +    if (enable) {
> > +        svq->vring.avail->flags &= ~notification_flag;
> > +    } else {
> > +        svq->vring.avail->flags |= notification_flag;
> > +    }
> > +}
> > +
> > +static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > +                                    const struct iovec *iovec,
> > +                                    size_t num, bool more_descs, bool write)
> > +{
> > +    uint16_t i = svq->free_head, last = svq->free_head;
> > +    unsigned n;
> > +    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
> > +    vring_desc_t *descs = svq->vring.desc;
> > +
> > +    if (num == 0) {
> > +        return;
> > +    }
> > +
> > +    for (n = 0; n < num; n++) {
> > +        if (more_descs || (n + 1 < num)) {
> > +            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> > +        } else {
> > +            descs[i].flags = flags;
> > +        }
> > +        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
> > +        descs[i].len = cpu_to_le32(iovec[n].iov_len);
> > +
> > +        last = i;
> > +        i = cpu_to_le16(descs[i].next);
> > +    }
> > +
> > +    svq->free_head = le16_to_cpu(descs[last].next);
> > +}
> > +
> > +static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > +                                    VirtQueueElement *elem)
> > +{
> > +    int head;
> > +    unsigned avail_idx;
> > +    vring_avail_t *avail = svq->vring.avail;
> > +
> > +    head = svq->free_head;
> > +
> > +    /* We need some descriptors here */
> > +    assert(elem->out_num || elem->in_num);
> > +
> > +    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> > +                            elem->in_num > 0, false);
> > +    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> > +
> > +    /*
> > +     * Put entry in available array (but don't update avail->idx until they
> > +     * do sync).
> > +     */
> > +    avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
> > +    avail->ring[avail_idx] = cpu_to_le16(head);
> > +    svq->avail_idx_shadow++;
> > +
> > +    /* Update avail index after the descriptor is wrote */
> > +    smp_wmb();
>
>
> A question: since we may talk to real hardware, is smp_wmb()
> sufficient in this case, or do we need to honor VIRTIO_F_ORDER_PLATFORM?
>

I didn't take that into account; let me take a closer look at
qemu's barriers and I will come back to this.

>
> > +    avail->idx = cpu_to_le16(svq->avail_idx_shadow);
> > +
> > +    return head;
> > +
> > +}
> > +
> > +static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> > +{
> > +    unsigned qemu_head = vhost_svq_add_split(svq, elem);
> > +
> > +    svq->ring_id_maps[qemu_head] = elem;
> > +}
> > +
> > +static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> > +{
> > +    /* We need to expose available array entries before checking used flags */
> > +    smp_mb();
> > +    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
> >           return;
> >       }
> >
> > @@ -86,25 +259,188 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> >       }
> >   }
> >
> > -/*
> > - * Set the device's memory region notifier. addr = NULL clear it.
> > +/**
> > + * Forward available buffers.
> > + *
> > + * @svq Shadow VirtQueue
> > + *
> > + * Note that this function does not guarantee that all guest's available
> > + * buffers are available to the device in SVQ avail ring. The guest may have
> > + * exposed a GPA / GIOVA congiuous buffer, but it may not be contiguous in qemu
> > + * vaddr.
> > + *
> > + * If that happens, guest's kick notifications will be disabled until device
> > + * makes some buffers used.
> >    */
> > -void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
> > +static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
> >   {
> > -    svq->host_notifier_mr = addr;
> > +    /* Clear event notifier */
> > +    event_notifier_test_and_clear(&svq->svq_kick);
> > +
> > +    /* Make available as many buffers as possible */
> > +    do {
> > +        if (virtio_queue_get_notification(svq->vq)) {
> > +            virtio_queue_set_notification(svq->vq, false);
> > +        }
> > +
> > +        while (true) {
> > +            VirtQueueElement *elem;
> > +
> > +            if (svq->next_guest_avail_elem) {
> > +                elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > +            } else {
> > +                elem = virtqueue_pop(svq->vq, sizeof(*elem));
> > +            }
> > +
> > +            if (!elem) {
> > +                break;
> > +            }
> > +
> > +            if (elem->out_num + elem->in_num >
> > +                vhost_svq_available_slots(svq)) {
> > +                /*
> > +                 * This condition is possible since a contiguous buffer in GPA
> > +                 * does not imply a contiguous buffer in qemu's VA
> > +                 * scatter-gather segments. If that happen, the buffer exposed
> > +                 * to the device needs to be a chain of descriptors at this
> > +                 * moment.
> > +                 *
> > +                 * SVQ cannot hold more available buffers if we are here:
> > +                 * queue the current guest descriptor and ignore further kicks
> > +                 * until some elements are used.
> > +                 */
>
>
> I wonder what the advantage of tracking the pending elem like this is. It
> looks to me that we could simply rewind last_avail_idx in this case?
>

If we do that, we have no way to know whether we should check for more
available buffers after we have made more buffers used.

We could rewind plus use a boolean flag, but I think that would be
roughly equivalent to checking for next_guest_avail_elem != NULL, and
we would then have to pop (map, etc.) everything again.

Another option is to always check for more available buffers at the
end of flushing used buffers. The check itself should be almost free
thanks to shadow_avail_idx, but qemu would need to map the descriptor's
memory again, as said before.

>
> > +                svq->next_guest_avail_elem = elem;
> > +                return;
> > +            }
> > +
> > +            vhost_svq_add(svq, elem);
> > +            vhost_svq_kick(svq);
> > +        }
> > +
> > +        virtio_queue_set_notification(svq->vq, true);
> > +    } while (!virtio_queue_empty(svq->vq));
> > +}
> > +
> > +/**
> > + * Handle guest's kick.
> > + *
> > + * @n guest kick event notifier, the one that guest set to notify svq.
> > + */
> > +static void vhost_handle_guest_kick_notifier(EventNotifier *n)
> > +{
> > +    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > +                                             svq_kick);
> > +    vhost_handle_guest_kick(svq);
> > +}
> > +
> > +static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> > +{
> > +    if (svq->last_used_idx != svq->shadow_used_idx) {
> > +        return true;
> > +    }
> > +
> > +    svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
> > +
> > +    return svq->last_used_idx != svq->shadow_used_idx;
> > +}
> > +
> > +static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > +{
> > +    vring_desc_t *descs = svq->vring.desc;
> > +    const vring_used_t *used = svq->vring.used;
> > +    vring_used_elem_t used_elem;
> > +    uint16_t last_used;
> > +
> > +    if (!vhost_svq_more_used(svq)) {
> > +        return NULL;
> > +    }
> > +
> > +    /* Only get used array entries after they have been exposed by dev */
> > +    smp_rmb();
> > +    last_used = svq->last_used_idx & (svq->vring.num - 1);
> > +    used_elem.id = le32_to_cpu(used->ring[last_used].id);
> > +    used_elem.len = le32_to_cpu(used->ring[last_used].len);
> > +
> > +    svq->last_used_idx++;
> > +    if (unlikely(used_elem.id >= svq->vring.num)) {
> > +        error_report("Device %s says index %u is used", svq->vdev->name,
> > +                     used_elem.id);
> > +        return NULL;
> > +    }
> > +
> > +    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
> > +        error_report(
> > +            "Device %s says index %u is used, but it was not available",
> > +            svq->vdev->name, used_elem.id);
> > +        return NULL;
> > +    }
> > +
> > +    descs[used_elem.id].next = svq->free_head;
> > +    svq->free_head = used_elem.id;
> > +
> > +    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> > +    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> >   }
> >
> > -/* Forward vhost notifications */
> > +static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> > +                            bool check_for_avail_queue)
> > +{
> > +    VirtQueue *vq = svq->vq;
> > +
> > +    /* Make as many buffers as possible used. */
> > +    do {
> > +        unsigned i = 0;
> > +
> > +        vhost_svq_set_notification(svq, false);
> > +        while (true) {
> > +            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> > +            if (!elem) {
> > +                break;
> > +            }
> > +
> > +            if (unlikely(i >= svq->vring.num)) {
> > +                virtio_error(svq->vdev,
> > +                         "More than %u used buffers obtained in a %u size SVQ",
> > +                         i, svq->vring.num);
> > +                virtqueue_fill(vq, elem, elem->len, i);
> > +                virtqueue_flush(vq, i);
> > +                i = 0;
> > +            }
> > +            virtqueue_fill(vq, elem, elem->len, i++);
> > +        }
> > +
> > +        virtqueue_flush(vq, i);
> > +        event_notifier_set(&svq->svq_call);
> > +
> > +        if (check_for_avail_queue && svq->next_guest_avail_elem) {
> > +            /*
> > +             * Avail ring was full when vhost_svq_flush was called, so it's a
> > +             * good moment to make more descriptors available if possible
> > +             */
> > +            vhost_handle_guest_kick(svq);
> > +        }
> > +
> > +        vhost_svq_set_notification(svq, true);
> > +    } while (vhost_svq_more_used(svq));
>
>
> So this actually doesn't make sure all the buffers were processed by the
> device? Is this intended (I see it was called by the vhost_svq_stop()).
>
> Note that it means some buffers might not be submitted to the device
> after migration?
>

Not really.

At the do{}while exit, SVQ has marked all of the guest's avail buffers
as used. If the device is *not* paused (normal operation), it could
mark another descriptor as used right after the do{}while condition is
evaluated, and call() SVQ right after that.

What can happen is that we pause the device with *those* buffers still
pending. That's why a last flush is needed. Since that flush happens
after the pause, the device is not allowed to mark more descriptors as
used, and it must have flushed the pending ones to the SVQ vring by
the time pause() returns.

Since the device is going to be reset, it makes no sense to make more
buffers available to it, so we skip that part with
check_for_avail_queue == false.
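
In code terms, the disable path is roughly (simplified sketch, using
the functions of this series):

    /* Pause first: after this, the device must not use more buffers */
    vhost_vdpa_vring_pause(dev);

    /*
     * vhost_svq_stop() removes the kick handler and does the last
     * vhost_svq_flush(svq, false): forward whatever the device already
     * marked as used, but do not make new buffers available to it.
     */
    vhost_svq_stop(dev, idx, svq);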

Is that clear?

>
> > +}
> > +
> > +/**
> > + * Forward used buffers.
> > + *
> > + * @n hdev call event notifier, the one that device set to notify svq.
> > + *
> > + * Note that we are not making any buffers available in the loop, there is no
> > + * way that it runs more than virtqueue size times.
> > + */
> >   static void vhost_svq_handle_call(EventNotifier *n)
> >   {
> >       VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> >                                                hdev_call);
> >
> > -    if (unlikely(!event_notifier_test_and_clear(n))) {
> > -        return;
> > -    }
> > +    /* Clear event notifier */
> > +    event_notifier_test_and_clear(n);
> >
> > -    event_notifier_set(&svq->svq_call);
> > +    vhost_svq_flush(svq, true);
> >   }
> >
> >   /*
> > @@ -132,6 +468,14 @@ void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd)
> >       event_notifier_init_fd(&svq->svq_call, call_fd);
> >   }
> >
> > +/*
> > + * Set the device's memory region notifier. addr = NULL clear it.
> > + */
> > +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
> > +{
> > +    svq->host_notifier_mr = addr;
> > +}
> > +
> >   /*
> >    * Get the shadow vq vring address.
> >    * @svq Shadow virtqueue
> > @@ -185,7 +529,8 @@ static void vhost_svq_set_svq_kick_fd_internal(VhostShadowVirtqueue *svq,
> >        * need to explicitely check for them.
> >        */
> >       event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> > -    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
> > +    event_notifier_set_handler(&svq->svq_kick,
> > +                               vhost_handle_guest_kick_notifier);
> >
> >       /*
> >        * !check_old means that we are starting SVQ, taking the descriptor from
> > @@ -233,7 +578,16 @@ void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> >   void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> >                       VhostShadowVirtqueue *svq)
> >   {
> > +    unsigned i;
> >       event_notifier_set_handler(&svq->svq_kick, NULL);
> > +    vhost_svq_flush(svq, false);
> > +
> > +    for (i = 0; i < svq->vring.num; ++i) {
> > +        g_autofree VirtQueueElement *elem = svq->ring_id_maps[i];
> > +        if (elem) {
> > +            virtqueue_detach_element(svq->vq, elem, elem->len);
> > +        }
> > +    }
> >   }
> >
> >   /*
> > @@ -248,7 +602,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> >       size_t driver_size;
> >       size_t device_size;
> >       g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> > -    int r;
> > +    int r, i;
> >
> >       r = event_notifier_init(&svq->hdev_kick, 0);
> >       if (r != 0) {
> > @@ -274,6 +628,11 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> >       memset(svq->vring.desc, 0, driver_size);
> >       svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> >       memset(svq->vring.used, 0, device_size);
> > +    for (i = 0; i < num - 1; i++) {
> > +        svq->vring.desc[i].next = cpu_to_le16(i + 1);
> > +    }
> > +
> > +    svq->ring_id_maps = g_new0(VirtQueueElement *, num);
> >       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
> >       return g_steal_pointer(&svq);
> >
> > @@ -292,6 +651,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
> >       event_notifier_cleanup(&vq->hdev_kick);
> >       event_notifier_set_handler(&vq->hdev_call, NULL);
> >       event_notifier_cleanup(&vq->hdev_call);
> > +    g_free(vq->ring_id_maps);
> >       qemu_vfree(vq->vring.desc);
> >       qemu_vfree(vq->vring.used);
> >       g_free(vq);
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index fc8396ba8a..e1c55e43e7 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -19,6 +19,7 @@
> >   #include "hw/virtio/virtio-net.h"
> >   #include "hw/virtio/vhost-shadow-virtqueue.h"
> >   #include "hw/virtio/vhost-vdpa.h"
> > +#include "hw/virtio/vhost-shadow-virtqueue.h"
> >   #include "exec/address-spaces.h"
> >   #include "qemu/main-loop.h"
> >   #include "cpu.h"
> > @@ -821,6 +822,19 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
> >       return true;
> >   }
> >
> > +static int vhost_vdpa_vring_pause(struct vhost_dev *dev)
> > +{
> > +    int r;
> > +    uint8_t status;
> > +
> > +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DEVICE_STOPPED);
> > +    do {
> > +        r = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> > +    } while (r == 0 && !(status & VIRTIO_CONFIG_S_DEVICE_STOPPED));
> > +
> > +    return 0;
> > +}
> > +
> >   /*
> >    * Start or stop a shadow virtqueue in a vdpa device
> >    *
> > @@ -844,7 +858,14 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
> >           .index = vq_index,
> >       };
> >       struct vhost_vring_file vhost_call_file = {
> > -        .index = idx + dev->vq_index,
> > +        .index = vq_index,
> > +    };
> > +    struct vhost_vring_addr addr = {
> > +        .index = vq_index,
> > +    };
> > +    struct vhost_vring_state num = {
> > +        .index = vq_index,
> > +        .num = virtio_queue_get_num(dev->vdev, vq_index),
> >       };
> >       int r;
> >
> > @@ -852,6 +873,7 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
> >           const EventNotifier *vhost_kick = vhost_svq_get_dev_kick_notifier(svq);
> >           const EventNotifier *vhost_call = vhost_svq_get_svq_call_notifier(svq);
> >
> > +        vhost_svq_get_vring_addr(svq, &addr);
> >           if (n->addr) {
> >               r = virtio_queue_set_host_notifier_mr(dev->vdev, idx, &n->mr,
> >                                                     false);
> > @@ -870,8 +892,20 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
> >           vhost_kick_file.fd = event_notifier_get_fd(vhost_kick);
> >           vhost_call_file.fd = event_notifier_get_fd(vhost_call);
> >       } else {
> > +        struct vhost_vring_state state = {
> > +            .index = vq_index,
> > +        };
> > +
> >           vhost_svq_stop(dev, idx, svq);
> >
> > +        state.num = virtio_queue_get_last_avail_idx(dev->vdev, idx);
> > +        r = vhost_vdpa_set_vring_base(dev, &state);
> > +        if (unlikely(r)) {
> > +            error_setg_errno(errp, -r, "vhost_set_vring_base failed");
> > +            return false;
> > +        }
> > +
> > +        vhost_vdpa_vq_get_addr(dev, &addr, &dev->vqs[idx]);
> >           if (n->addr) {
> >               r = virtio_queue_set_host_notifier_mr(dev->vdev, idx, &n->mr,
> >                                                     true);
> > @@ -885,6 +919,17 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
> >           vhost_call_file.fd = v->call_fd[idx];
> >       }
> >
> > +    r = vhost_vdpa_set_vring_addr(dev, &addr);
> > +    if (unlikely(r)) {
> > +        error_setg_errno(errp, -r, "vhost_set_vring_addr failed");
> > +        return false;
> > +    }
> > +    r = vhost_vdpa_set_vring_num(dev, &num);
> > +    if (unlikely(r)) {
> > +        error_setg_errno(errp, -r, "vhost_set_vring_num failed");
> > +        return false;
> > +    }
> > +
> >       r = vhost_vdpa_set_vring_dev_kick(dev, &vhost_kick_file);
> >       if (unlikely(r)) {
> >           error_setg_errno(errp, -r, "vhost_vdpa_set_vring_kick failed");
> > @@ -899,6 +944,50 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
> >       return true;
> >   }
> >
> > +static void vhost_vdpa_get_vq_state(struct vhost_dev *dev, unsigned idx)
> > +{
> > +    struct VirtIODevice *vdev = dev->vdev;
> > +
> > +    virtio_queue_restore_last_avail_idx(vdev, idx);
> > +    virtio_queue_invalidate_signalled_used(vdev, idx);
> > +    virtio_queue_update_used_idx(vdev, idx);
> > +}
>
>
> Do we need to change vhost_vdpa_get_vring_base() to return
> vq->last_avail_idx as well?
>

Yes. To support things like a full reset of the device by the guest
while in SVQ mode, we need to intercept a lot more: vring addresses,
etc. I think it's better to replace the vhost_ops callbacks, as you
proposed in a previous series.

These will be addressed in the next revision.
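
Something in this direction, for example (untested sketch; it assumes
a shadow_vqs_enabled flag in struct vhost_vdpa, which may end up with
a different name in this series):

static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
                                     struct vhost_vring_state *ring)
{
    struct vhost_vdpa *v = dev->opaque;

    if (v->shadow_vqs_enabled) {
        /*
         * SVQ is the one consuming the guest's avail ring, so the
         * index the device reports is meaningless here; return the
         * guest-visible one instead.
         */
        ring->num = virtio_queue_get_last_avail_idx(dev->vdev,
                                                    ring->index);
        return 0;
    }

    return vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
}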

> Thanks
>



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 00/26] vDPA shadow virtqueue
  2021-11-02  4:25   ` Jason Wang
  (?)
@ 2021-11-02 11:21   ` Eugenio Perez Martin
  -1 siblings, 0 replies; 82+ messages in thread
From: Eugenio Perez Martin @ 2021-11-02 11:21 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Peter Xu,
	Markus Armbruster, Stefan Hajnoczi, Xiao W Wang,
	Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Stefano Garzarella, Eric Blake, virtualization, Parav Pandit

On Tue, Nov 2, 2021 at 5:26 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2021/10/30 上午2:34, Eugenio Pérez 写道:
> > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > is intended as a new method of tracking the memory the devices touch
> > during a migration process: Instead of relay on vhost device's dirty
> > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > descriptors between VM and device. This way qemu is the effective
> > writer of guests memory, like in qemu's virtio device operation.
> >
> > When SVQ is enabled qemu offers a new virtual address space to the
> > device to read and write into, and it maps new vrings and the guest
> > memory in it. SVQ also intercepts kicks and calls between the device
> > and the guest. Used buffers relay would cause dirty memory being
> > tracked, but at this RFC SVQ is not enabled on migration automatically.
> >
> > Thanks of being a buffers relay system, SVQ can be used also to
> > communicate devices and drivers with different capabilities, like
> > devices that only supports packed vring and not split and old guest
> > with no driver packed support.
> >
> > It is based on the ideas of DPDK SW assisted LM, in the series of
> > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > not map the shadow vq in guest's VA, but in qemu's.
> >
> > For qemu to use shadow virtqueues the guest virtio driver must not use
> > features like event_idx.
> >
> > SVQ needs to be enabled with QMP command:
> >
> > { "execute": "x-vhost-set-shadow-vq",
> >        "arguments": { "name": "vhost-vdpa0", "enable": true } }
> >
> > This series includes some patches to delete in the final version that
> > helps with its testing. The first two of the series have been sent
> > sepparately but they haven't been included in qemu main branch.
> >
> > The two after them adds the feature to stop the device and be able to
> > set and get its status. It's intended to be used with vp_vpda driver in
> > a nested environment, so they are also external to this series. The
> > vp_vdpa driver also need modifications to forward the new status bit,
> > they will be proposed sepparately
> >
> > Patches 5-12 prepares the SVQ and QMP command to support guest to host
> > notifications forwarding. If the SVQ is enabled with these ones
> > applied and the device supports it, that part can be tested in
> > isolation (for example, with networking), hopping through SVQ.
> >
> > Same thing is true with patches 13-17, but with device to guest
> > notifications.
> >
> > Based on them, patches from 18 to 22 implement the actual buffer
> > forwarding, using some features already introduced in previous.
> > However, they will need a host device with no iommu, something that
> > is not available at the moment.
> >
> > The last part of the series uses properly the host iommu, so the driver
> > can access this new virtual address space created.
> >
> > Comments are welcome.
>
>
> I think we need do some benchmark to see the performance impact.
>
> Thanks
>

Ok, I will add them for the next revision.

Thanks!

>
> >
> > TODO:
> > * Event, indirect, packed, and others features of virtio.
> > * To sepparate buffers forwarding in its own AIO context, so we can
> >    throw more threads to that task and we don't need to stop the main
> >    event loop.
> > * Support multiqueue virtio-net vdpa.
> > * Proper documentation.
> >
> > Changes from v4 RFC:
> > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> >    already present iova-tree for that.
> > * Proper validation of guest features. Now SVQ can negotiate a
> >    different set of features with the device when enabled.
> > * Support of host notifiers memory regions
> > * Handling of SVQ full queue in case guest's descriptors span to
> >    different memory regions (qemu's VA chunks).
> > * Flush pending used buffers at end of SVQ operation.
> > * QMP command now looks by NetClientState name. Other devices will need
> >    to implement it's way to enable vdpa.
> > * Rename QMP command to set, so it looks more like a way of working
> > * Better use of qemu error system
> > * Make a few assertions proper error-handling paths.
> > * Add more documentation
> > * Less coupling of virtio / vhost, that could cause friction on changes
> > * Addressed many other small comments and small fixes.
> >
> > Changes from v3 RFC:
> >    * Move everything to vhost-vdpa backend. A big change, this allowed
> >      some cleanup but more code has been added in other places.
> >    * More use of glib utilities, especially to manage memory.
> > v3 link:
> > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> >
> > Changes from v2 RFC:
> >    * Adding vhost-vdpa devices support
> >    * Fixed some memory leaks pointed by different comments
> > v2 link:
> > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> >
> > Changes from v1 RFC:
> >    * Use QMP instead of migration to start SVQ mode.
> >    * Only accepting IOMMU devices, closer behavior with target devices
> >      (vDPA)
> >    * Fix invalid masking/unmasking of vhost call fd.
> >    * Use of proper methods for synchronization.
> >    * No need to modify VirtIO device code, all of the changes are
> >      contained in vhost code.
> >    * Delete superfluous code.
> >    * An intermediate RFC was sent with only the notifications forwarding
> >      changes. It can be seen in
> >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > v1 link:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> >
> > Eugenio Pérez (20):
> >        virtio: Add VIRTIO_F_QUEUE_STATE
> >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> >        virtio: Add virtio_queue_is_host_notifier_enabled
> >        vhost: Make vhost_virtqueue_{start,stop} public
> >        vhost: Add x-vhost-enable-shadow-vq qmp
> >        vhost: Add VhostShadowVirtqueue
> >        vdpa: Register vdpa devices in a list
> >        vhost: Route guest->host notification through shadow virtqueue
> >        Add vhost_svq_get_svq_call_notifier
> >        Add vhost_svq_set_guest_call_notifier
> >        vdpa: Save call_fd in vhost-vdpa
> >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> >        vhost: Route host->guest notification through shadow virtqueue
> >        virtio: Add vhost_shadow_vq_get_vring_addr
> >        vdpa: Save host and guest features
> >        vhost: Add vhost_svq_valid_device_features to shadow vq
> >        vhost: Shadow virtqueue buffers forwarding
> >        vhost: Add VhostIOVATree
> >        vhost: Use a tree to store memory mappings
> >        vdpa: Add custom IOTLB translations to SVQ
> >
> > Eugenio Pérez (26):
> >    util: Make some iova_tree parameters const
> >    vhost: Fix last queue index of devices with no cvq
> >    virtio: Add VIRTIO_F_QUEUE_STATE
> >    virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> >    vhost: Add x-vhost-set-shadow-vq qmp
> >    vhost: Add VhostShadowVirtqueue
> >    vdpa: Save kick_fd in vhost-vdpa
> >    vdpa: Add vhost_svq_get_dev_kick_notifier
> >    vdpa: Add vhost_svq_set_svq_kick_fd
> >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> >    vhost: Handle host notifiers in SVQ
> >    vhost: Route guest->host notification through shadow virtqueue
> >    Add vhost_svq_get_svq_call_notifier
> >    Add vhost_svq_set_guest_call_notifier
> >    vdpa: Save call_fd in vhost-vdpa
> >    vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> >    vhost: Route host->guest notification through shadow virtqueue
> >    virtio: Add vhost_shadow_vq_get_vring_addr
> >    vdpa: ack VIRTIO_F_QUEUE_STATE if device supports it
> >    vhost: Add vhost_svq_valid_device_features to shadow vq
> >    vhost: Add vhost_svq_valid_guest_features to shadow vq
> >    vhost: Shadow virtqueue buffers forwarding
> >    util: Add iova_tree_alloc
> >    vhost: Add VhostIOVATree
> >    vhost: Use a tree to store memory mappings
> >    vdpa: Add custom IOTLB translations to SVQ
> >
> >   qapi/net.json                                 |  20 +
> >   hw/virtio/vhost-iova-tree.h                   |  27 +
> >   hw/virtio/vhost-shadow-virtqueue.h            |  44 ++
> >   hw/virtio/virtio-pci.h                        |   1 +
> >   include/hw/virtio/vhost-vdpa.h                |  12 +
> >   include/hw/virtio/virtio.h                    |   4 +-
> >   include/qemu/iova-tree.h                      |  25 +-
> >   .../standard-headers/linux/virtio_config.h    |   5 +
> >   include/standard-headers/linux/virtio_pci.h   |   2 +
> >   hw/i386/intel_iommu.c                         |   2 +-
> >   hw/net/vhost_net.c                            |   2 +-
> >   hw/net/virtio-net.c                           |   6 +-
> >   hw/virtio/vhost-iova-tree.c                   | 157 ++++
> >   hw/virtio/vhost-shadow-virtqueue.c            | 746 ++++++++++++++++++
> >   hw/virtio/vhost-vdpa.c                        | 437 +++++++++-
> >   hw/virtio/virtio-pci.c                        |  16 +-
> >   net/vhost-vdpa.c                              |  28 +
> >   util/iova-tree.c                              | 151 +++-
> >   hw/virtio/meson.build                         |   2 +-
> >   19 files changed, 1664 insertions(+), 23 deletions(-)
> >   create mode 100644 hw/virtio/vhost-iova-tree.h
> >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> >   create mode 100644 hw/virtio/vhost-iova-tree.c
> >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> >
>



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
  2021-11-02  8:28     ` Eugenio Perez Martin
@ 2021-11-03  3:10         ` Jason Wang
  0 siblings, 0 replies; 82+ messages in thread
From: Jason Wang @ 2021-11-03  3:10 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Peter Xu,
	Markus Armbruster, Stefan Hajnoczi, Xiao W Wang,
	Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Stefano Garzarella, Eric Blake, virtualization, Parav Pandit

On Tue, Nov 2, 2021 at 4:29 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Tue, Nov 2, 2021 at 7:35 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2021/10/30 上午2:35, Eugenio Pérez 写道:
> > > This iova tree function allows it to look for a hole in allocated
> > > regions and return a totally new translation for a given translated
> > > address.
> > >
> > > It's usage is mainly to allow devices to access qemu address space,
> > > remapping guest's one into a new iova space where qemu can add chunks of
> > > addresses.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >   include/qemu/iova-tree.h |  17 +++++
> > >   util/iova-tree.c         | 139 +++++++++++++++++++++++++++++++++++++++
> > >   2 files changed, 156 insertions(+)
> > >
> > > diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
> > > index 8249edd764..33f9b2e13f 100644
> > > --- a/include/qemu/iova-tree.h
> > > +++ b/include/qemu/iova-tree.h
> > > @@ -29,6 +29,7 @@
> > >   #define  IOVA_OK           (0)
> > >   #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
> > >   #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
> > > +#define  IOVA_ERR_NOMEM    (-3) /* Cannot allocate */
> >
> >
> > I think we need a better name other than "NOMEM", since it's actually
> > means there's no sufficient hole for the range?
> >
>
> Actually, yes. I'm totally fine with changing it, but "the
> inspiration" is that ENOMEM is also the error that malloc sets in
> errno if not enough contiguous VM can be allocated.

Ok, then I think it's fine.

>
> What would be a more descriptive name?
>
> >
> > >
> > >   typedef struct IOVATree IOVATree;
> > >   typedef struct DMAMap {
> > > @@ -119,6 +120,22 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
> > >    */
> > >   void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
> > >
> > > +/**
> > > + * iova_tree_alloc:
> > > + *
> > > + * @tree: the iova tree to allocate from
> > > + * @map: the new map (as translated addr & size) to allocate in iova region
> > > + * @iova_begin: the minimum address of the allocation
> > > + * @iova_end: the maximum addressable direction of the allocation
> > > + *
> > > + * Allocates a new region of a given size, between iova_min and iova_max.
> > > + *
> > > + * Return: Same as iova_tree_insert, but cannot overlap and can be out of
> > > + * free contiguous range. Caller can get the assigned iova in map->iova.
> > > + */
> > > +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> > > +                    hwaddr iova_end);
> > > +
> >
> >
> > "iova_tree_alloc_map" seems better.
> >
>
> Right, I changed in vhost but I forgot to change here.
>
> >
> > >   /**
> > >    * iova_tree_destroy:
> > >    *
> > > diff --git a/util/iova-tree.c b/util/iova-tree.c
> > > index 23ea35b7a4..27c921c4e2 100644
> > > --- a/util/iova-tree.c
> > > +++ b/util/iova-tree.c
> > > @@ -16,6 +16,36 @@ struct IOVATree {
> > >       GTree *tree;
> > >   };
> > >
> > > +/* Args to pass to iova_tree_alloc foreach function. */
> > > +struct IOVATreeAllocArgs {
> > > +    /* Size of the desired allocation */
> > > +    size_t new_size;
> > > +
> > > +    /* The minimum address allowed in the allocation */
> > > +    hwaddr iova_begin;
> > > +
> > > +    /* The last addressable allowed in the allocation */
> > > +    hwaddr iova_last;
> > > +
> > > +    /* Previously-to-last iterated map, can be NULL in the first node */
> > > +    const DMAMap *hole_left;
> > > +
> > > +    /* Last iterated map */
> > > +    const DMAMap *hole_right;
> >
> >
> > Any reason we can move those to IOVATree structure, it can simplify a
> > lot of things.
> >
>
> I can move for the next version for sure, but then it needs to be
> clear enough that these fields are alloc arguments.

Sure.

>
> >
> > > +};
> > > +
> > > +/**
> > > + * Iterate args to tne next hole
>
> s/tne/the/
>
> > > + *
> > > + * @args  The alloc arguments
> > > + * @next  The next mapping in the tree. Can be NULL to signal the last one
> > > + */
> > > +static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
> > > +                                         const DMAMap *next) {
> > > +    args->hole_left = args->hole_right;
> > > +    args->hole_right = next;
> > > +}
> > > +
> > >   static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
> > >   {
> > >       const DMAMap *m1 = a, *m2 = b;
> > > @@ -107,6 +137,115 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
> > >       return IOVA_OK;
> > >   }
> > >
> > > +/**
> > > + * Try to accomodate a map of size ret->size in a hole between
> > > + * max(end(hole_left), iova_start).
> > > + *
> > > + * @args Arguments to allocation
> > > + */
> > > +static bool iova_tree_alloc_map_in_hole(const struct IOVATreeAllocArgs *args)
> > > +{
> > > +    const DMAMap *left = args->hole_left, *right = args->hole_right;
> > > +    uint64_t hole_start, hole_last;
> > > +
> > > +    if (right && right->iova + right->size < args->iova_begin) {
> > > +        return false;
> > > +    }
> > > +
> > > +    if (left && left->iova > args->iova_last) {
> > > +        return false;
> > > +    }
> > > +
> > > +    hole_start = MAX(left ? left->iova + left->size + 1 : 0, args->iova_begin);
> > > +    hole_last = MIN(right ? right->iova : HWADDR_MAX, args->iova_last);
> > > +
> > > +    if (hole_last - hole_start > args->new_size) {
> > > +        /* We found a valid hole. */
> > > +        return true;
> > > +    }
> > > +
> > > +    /* Keep iterating */
> > > +    return false;
> > > +}
> > > +
> > > +/**
> > > + * Foreach dma node in the tree, compare if there is a hole wit its previous
> > > + * node (or minimum iova address allowed) and the node.
> > > + *
> > > + * @key   Node iterating
> > > + * @value Node iterating
> > > + * @pargs Struct to communicate with the outside world
> > > + *
> > > + * Return: false to keep iterating, true if needs break.
> > > + */
> > > +static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
> > > +                                         gpointer pargs)
> > > +{
> > > +    struct IOVATreeAllocArgs *args = pargs;
> > > +    DMAMap *node = value;
> > > +
> > > +    assert(key == value);
> > > +
> > > +    iova_tree_alloc_args_iterate(args, node);
> > > +    if (args->hole_left && args->hole_left->iova > args->iova_last) {
> > > +        return true;
> > > +    }
> > > +
> > > +    if (iova_tree_alloc_map_in_hole(args)) {
> > > +        return true;
> > > +    }
> > > +
> > > +    return false;
> > > +}
> > > +
> > > +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> > > +                    hwaddr iova_last)
> > > +{
> > > +    struct IOVATreeAllocArgs args = {
> > > +        .new_size = map->size,
> > > +        .iova_begin = iova_begin,
> > > +        .iova_last = iova_last,
> > > +    };
> > > +
> > > +    if (iova_begin == 0) {
> > > +        /* Some devices does not like addr 0 */
> > > +        iova_begin += qemu_real_host_page_size;
> > > +    }
> > > +
> > > +    assert(iova_begin < iova_last);
> > > +
> > > +    /*
> > > +     * Find a valid hole for the mapping
> > > +     *
> > > +     * Assuming low iova_begin, so no need to do a binary search to
> > > +     * locate the first node.
> > > +     *
> > > +     * TODO: We can improve the search speed if we save the beginning and the
> > > +     * end of holes, so we don't iterate over the previous saved ones.
> > > +     *
> > > +     * TODO: Replace all this with g_tree_node_first/next/last when available
> > > +     * (from glib since 2.68). To do it with g_tree_foreach complicates the
> > > +     * code a lot.
> >
> >
> > To say the truth, the codes in iova_tree_alloc_traverse() is hard to be
> > reviewed. I think it would be easy to use first/next/last. What we
> > really need is to calculate the hole between two ranges with handmade
> > first, last.
> >
>
> I totally agree on that, but we don't have first/next/last in GTree
> until glib 2.68. Can we raise the minimum version required?

I'm not sure but I guess it's better not. But I wonder if something
like the following would be simpler?

DMAMap first = {
    .iova = iova_begin,
    .size = 0,
};

DMAMap *previous = &first;
DMAMap *this;

static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
                                         gpointer pargs)
{
    struct IOVATreeAllocArgs *args = pargs;
    hwaddr start = previous->iova + previous->size;
    this = value;

    if (this->iova - start >= args->size)
        return true;

    previous = this;
    return false;
}

And we need to deal with the iova_end as you did.
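
Or, to keep the traversal reentrant, the cursor could live in the args
struct instead of file-scope variables, something like (sketch: it
keeps DMAMap's inclusive-size convention, and the caller still seeds
prev_end with iova_begin and checks the tail (right, end) hole against
iova_last afterwards, as the current patch does):

struct IOVATreeAllocArgs {
    size_t new_size;     /* inclusive, as DMAMap's size */
    hwaddr iova_begin;
    hwaddr iova_last;
    hwaddr prev_end;     /* first address after the last visited map */
    hwaddr result_iova;  /* start of the found hole, valid if found */
    bool found;
};

static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
                                         gpointer pargs)
{
    struct IOVATreeAllocArgs *args = pargs;
    const DMAMap *this = value;
    hwaddr hole_start = MAX(args->prev_end, args->iova_begin);

    /* Clamping against iova_last is omitted here for brevity */
    if (this->iova > hole_start &&
        this->iova - hole_start > args->new_size) {
        args->result_iova = hole_start;
        args->found = true;
        return true;     /* stop iterating */
    }

    args->prev_end = this->iova + this->size + 1;
    return false;        /* keep iterating */
}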

Thanks

>
> Another possibility that comes to my mind is to either have a list /
> tree of free regions, or directly a custom allocator for this.
>
> > Thanks
> >
> >
> > > +     *
> > > +     */
> > > +    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
> > > +    if (!iova_tree_alloc_map_in_hole(&args)) {
> > > +        /*
> > > +         * 2nd try: Last iteration left args->right as the last DMAMap. But
> > > +         * (right, end) hole needs to be checked too
> > > +         */
> > > +        iova_tree_alloc_args_iterate(&args, NULL);
> > > +        if (!iova_tree_alloc_map_in_hole(&args)) {
> > > +            return IOVA_ERR_NOMEM;
> > > +        }
> > > +    }
> > > +
> > > +    map->iova = MAX(iova_begin,
> > > +                    args.hole_left ?
> > > +                    args.hole_left->iova + args.hole_left->size + 1 : 0);
> > > +    return iova_tree_insert(tree, map);
> > > +}
> > > +
> > >   void iova_tree_destroy(IOVATree *tree)
> > >   {
> > >       g_tree_destroy(tree->tree);
> >
>



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 21/26] vhost: Add vhost_svq_valid_guest_features to shadow vq
  2021-11-02  8:09     ` Eugenio Perez Martin
@ 2021-11-03  3:18         ` Jason Wang
  0 siblings, 0 replies; 82+ messages in thread
From: Jason Wang @ 2021-11-03  3:18 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Eric Blake, virtualization, Parav Pandit

On Tue, Nov 2, 2021 at 4:10 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Tue, Nov 2, 2021 at 6:26 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Sat, Oct 30, 2021 at 2:44 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> > >
> > > This allows it to test if the guest has aknowledge an invalid transport
> > > feature for SVQ. This will include packed vq layout or event_idx,
> > > where VirtIO device needs help from SVQ.
> > >
> > > There is not needed at this moment, but since SVQ will not re-negotiate
> > > features again with the guest, a failure in acknowledge them is fatal
> > > for SVQ.
> > >
> >
> > It's not clear to me why we need this. Maybe you can give me an
> > example. E.g isn't it sufficient to filter out the device with
> > event_idx?
> >
>
> If the guest did negotiate _F_EVENT_IDX, it expects to be notified
> only when device marks as used a specific number of descriptors.
>
> If we use VirtQueue notification, the VirtQueue code handles it
> transparently. But if we want to be able to change the guest VQ's
> call_fd, we cannot use VirtQueue's, so this needs to be handled by SVQ
> code. And that is still not implemented.
>
> Of course in the event_idx case we could just ignore it and notify in
> all used descriptors, but it seems not polite to me :). I will develop
> event_idx on top of this, either exposing the needed pieces in
> VirtQueue (I prefer this) or rolling our own in SVQ.

Yes, but what I meant is, we can fail the SVQ enabling if the device
supports event_idx. Then we're sure guests won't negotiate event_idx.
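
I.e. something like this in the device features check (sketch; the
real vhost_svq_valid_device_features() of this series checks more
transport bits than just this one):

    if (*dev_features & BIT_ULL(VIRTIO_RING_F_EVENT_IDX)) {
        /*
         * The device offers event_idx, which SVQ does not support yet:
         * fail the SVQ enabling, so the guest can never negotiate it.
         */
        return false;
    }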

Thanks

>
> Same reasoning can be applied to unknown transport features.
>
> Thanks!
>
> > Thanks
> >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >  hw/virtio/vhost-shadow-virtqueue.h | 1 +
> > >  hw/virtio/vhost-shadow-virtqueue.c | 6 ++++++
> > >  2 files changed, 7 insertions(+)
> > >
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > index 946b2c6295..ac55588009 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > @@ -16,6 +16,7 @@
> > >  typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > >
> > >  bool vhost_svq_valid_device_features(uint64_t *features);
> > > +bool vhost_svq_valid_guest_features(uint64_t *features);
> > >
> > >  void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> > >  void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > index 6e0508a231..cb9ffcb015 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > @@ -62,6 +62,12 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
> > >      return true;
> > >  }
> > >
> > > +/* If the guest is using some of these, SVQ cannot communicate */
> > > +bool vhost_svq_valid_guest_features(uint64_t *guest_features)
> > > +{
> > > +    return true;
> > > +}
> > > +
> > >  /* Forward guest notifications */
> > >  static void vhost_handle_guest_kick(EventNotifier *n)
> > >  {
> > > --
> > > 2.27.0
> > >
> >
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
  2021-11-03  3:10         ` Jason Wang
  (?)
@ 2021-11-03  7:41         ` Eugenio Perez Martin
  -1 siblings, 0 replies; 82+ messages in thread
From: Eugenio Perez Martin @ 2021-11-03  7:41 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Peter Xu,
	Markus Armbruster, Stefan Hajnoczi, Xiao W Wang,
	Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Stefano Garzarella, Eric Blake, virtualization, Parav Pandit

On Wed, Nov 3, 2021 at 4:10 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Nov 2, 2021 at 4:29 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Tue, Nov 2, 2021 at 7:35 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > 在 2021/10/30 上午2:35, Eugenio Pérez 写道:
> > > > This iova tree function allows it to look for a hole in allocated
> > > > regions and return a totally new translation for a given translated
> > > > address.
> > > >
> > > > It's usage is mainly to allow devices to access qemu address space,
> > > > remapping guest's one into a new iova space where qemu can add chunks of
> > > > addresses.
> > > >
> > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > ---
> > > >   include/qemu/iova-tree.h |  17 +++++
> > > >   util/iova-tree.c         | 139 +++++++++++++++++++++++++++++++++++++++
> > > >   2 files changed, 156 insertions(+)
> > > >
> > > > diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
> > > > index 8249edd764..33f9b2e13f 100644
> > > > --- a/include/qemu/iova-tree.h
> > > > +++ b/include/qemu/iova-tree.h
> > > > @@ -29,6 +29,7 @@
> > > >   #define  IOVA_OK           (0)
> > > >   #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
> > > >   #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
> > > > +#define  IOVA_ERR_NOMEM    (-3) /* Cannot allocate */
> > >
> > >
> > > I think we need a better name other than "NOMEM", since it's actually
> > > means there's no sufficient hole for the range?
> > >
> >
> > Actually, yes. I'm totally fine with changing it, but "the
> > inspiration" is that ENOMEM is also the error that malloc sets in
> > errno if not enough contiguous VM can be allocated.
>
> Ok, then I think it's fine.
>
> >
> > What would be a more descriptive name?
> >
> > >
> > > >
> > > >   typedef struct IOVATree IOVATree;
> > > >   typedef struct DMAMap {
> > > > @@ -119,6 +120,22 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
> > > >    */
> > > >   void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
> > > >
> > > > +/**
> > > > + * iova_tree_alloc:
> > > > + *
> > > > + * @tree: the iova tree to allocate from
> > > > + * @map: the new map (as translated addr & size) to allocate in iova region
> > > > + * @iova_begin: the minimum address of the allocation
> > > > + * @iova_end: the maximum addressable direction of the allocation
> > > > + *
> > > > + * Allocates a new region of a given size, between iova_min and iova_max.
> > > > + *
> > > > + * Return: Same as iova_tree_insert, but cannot overlap and can be out of
> > > > + * free contiguous range. Caller can get the assigned iova in map->iova.
> > > > + */
> > > > +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> > > > +                    hwaddr iova_end);
> > > > +
> > >
> > >
> > > "iova_tree_alloc_map" seems better.
> > >
> >
> > Right, I changed in vhost but I forgot to change here.
> >
> > >
> > > >   /**
> > > >    * iova_tree_destroy:
> > > >    *
> > > > diff --git a/util/iova-tree.c b/util/iova-tree.c
> > > > index 23ea35b7a4..27c921c4e2 100644
> > > > --- a/util/iova-tree.c
> > > > +++ b/util/iova-tree.c
> > > > @@ -16,6 +16,36 @@ struct IOVATree {
> > > >       GTree *tree;
> > > >   };
> > > >
> > > > +/* Args to pass to iova_tree_alloc foreach function. */
> > > > +struct IOVATreeAllocArgs {
> > > > +    /* Size of the desired allocation */
> > > > +    size_t new_size;
> > > > +
> > > > +    /* The minimum address allowed in the allocation */
> > > > +    hwaddr iova_begin;
> > > > +
> > > > +    /* The last addressable allowed in the allocation */
> > > > +    hwaddr iova_last;
> > > > +
> > > > +    /* Previously-to-last iterated map, can be NULL in the first node */
> > > > +    const DMAMap *hole_left;
> > > > +
> > > > +    /* Last iterated map */
> > > > +    const DMAMap *hole_right;
> > >
> > >
> > > Any reason we can move those to IOVATree structure, it can simplify a
> > > lot of things.
> > >
> >
> > I can move for the next version for sure, but then it needs to be
> > clear enough that these fields are alloc arguments.
>
> Sure.
>
> >
> > >
> > > > +};
> > > > +
> > > > +/**
> > > > + * Iterate args to tne next hole
> >
> > s/tne/the/
> >
> > > > + *
> > > > + * @args  The alloc arguments
> > > > + * @next  The next mapping in the tree. Can be NULL to signal the last one
> > > > + */
> > > > +static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
> > > > +                                         const DMAMap *next) {
> > > > +    args->hole_left = args->hole_right;
> > > > +    args->hole_right = next;
> > > > +}
> > > > +
> > > >   static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
> > > >   {
> > > >       const DMAMap *m1 = a, *m2 = b;
> > > > @@ -107,6 +137,115 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
> > > >       return IOVA_OK;
> > > >   }
> > > >
> > > > +/**
> > > > + * Try to accomodate a map of size ret->size in a hole between
> > > > + * max(end(hole_left), iova_start).
> > > > + *
> > > > + * @args Arguments to allocation
> > > > + */
> > > > +static bool iova_tree_alloc_map_in_hole(const struct IOVATreeAllocArgs *args)
> > > > +{
> > > > +    const DMAMap *left = args->hole_left, *right = args->hole_right;
> > > > +    uint64_t hole_start, hole_last;
> > > > +
> > > > +    if (right && right->iova + right->size < args->iova_begin) {
> > > > +        return false;
> > > > +    }
> > > > +
> > > > +    if (left && left->iova > args->iova_last) {
> > > > +        return false;
> > > > +    }
> > > > +
> > > > +    hole_start = MAX(left ? left->iova + left->size + 1 : 0, args->iova_begin);
> > > > +    hole_last = MIN(right ? right->iova : HWADDR_MAX, args->iova_last);
> > > > +
> > > > +    if (hole_last - hole_start > args->new_size) {
> > > > +        /* We found a valid hole. */
> > > > +        return true;
> > > > +    }
> > > > +
> > > > +    /* Keep iterating */
> > > > +    return false;
> > > > +}
> > > > +
> > > > +/**
> > > > + * Foreach dma node in the tree, compare if there is a hole wit its previous
> > > > + * node (or minimum iova address allowed) and the node.
> > > > + *
> > > > + * @key   Node iterating
> > > > + * @value Node iterating
> > > > + * @pargs Struct to communicate with the outside world
> > > > + *
> > > > + * Return: false to keep iterating, true if needs break.
> > > > + */
> > > > +static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
> > > > +                                         gpointer pargs)
> > > > +{
> > > > +    struct IOVATreeAllocArgs *args = pargs;
> > > > +    DMAMap *node = value;
> > > > +
> > > > +    assert(key == value);
> > > > +
> > > > +    iova_tree_alloc_args_iterate(args, node);
> > > > +    if (args->hole_left && args->hole_left->iova > args->iova_last) {
> > > > +        return true;
> > > > +    }
> > > > +
> > > > +    if (iova_tree_alloc_map_in_hole(args)) {
> > > > +        return true;
> > > > +    }
> > > > +
> > > > +    return false;
> > > > +}
> > > > +
> > > > +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> > > > +                    hwaddr iova_last)
> > > > +{
> > > > +    struct IOVATreeAllocArgs args = {
> > > > +        .new_size = map->size,
> > > > +        .iova_begin = iova_begin,
> > > > +        .iova_last = iova_last,
> > > > +    };
> > > > +
> > > > +    if (iova_begin == 0) {
> > > > +        /* Some devices does not like addr 0 */
> > > > +        iova_begin += qemu_real_host_page_size;
> > > > +    }
> > > > +
> > > > +    assert(iova_begin < iova_last);
> > > > +
> > > > +    /*
> > > > +     * Find a valid hole for the mapping
> > > > +     *
> > > > +     * Assuming low iova_begin, so no need to do a binary search to
> > > > +     * locate the first node.
> > > > +     *
> > > > +     * TODO: We can improve the search speed if we save the beginning and the
> > > > +     * end of holes, so we don't iterate over the previous saved ones.
> > > > +     *
> > > > +     * TODO: Replace all this with g_tree_node_first/next/last when available
> > > > +     * (from glib since 2.68). To do it with g_tree_foreach complicates the
> > > > +     * code a lot.
> > >
> > >
> > > To say the truth, the codes in iova_tree_alloc_traverse() is hard to be
> > > reviewed. I think it would be easy to use first/next/last. What we
> > > really need is to calculate the hole between two ranges with handmade
> > > first, last.
> > >
> >
> > I totally agree on that, but we don't have first/next/last in GTree
> > until glib 2.68. Can we raise the minimum version required?
>
> I'm not sure but I guess it's better not. But I wonder if something
> like the following would be simpler?
>
> DMAMap first = {
>     .iova = iova_begin,
>     .size = 0,
> };
>
> DMAMap *previous = &first;
> DMAMap *this;
>
> static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
>                                          gpointer pargs)
> {
>     struct IOVATreeAllocArgs *args = pargs;
>     hwaddr start = previous->iova + previous->size;
>     this = value;
>
>     if (this->iova - start >= args->size)
>         return true;
>
>     previous = this;
>     return false;
> }
>
> And we need to deal with the iova_end as you did.
>

I'll try for the next version and I will come back to you with the results.
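
(For illustration only, a self-contained sketch of how the suggestion could
look if folded into util/iova-tree.c: the iteration state lives in the
foreach argument instead of file-scope variables, and the trailing hole up
to iova_last is checked after the traversal. This is a sketch and not code
from any posted revision; it assumes DMAMap::size is the inclusive size, as
the rest of iova-tree does, and that iova_last < HWADDR_MAX.)

struct IOVATreeAllocArgs {
    size_t new_size;     /* requested size (inclusive, like DMAMap::size) */
    hwaddr iova_last;    /* last address allowed */
    hwaddr hole_start;   /* first free address of the hole being tracked */
    hwaddr iova_result;  /* valid only when iova_found is set */
    bool iova_found;
};

static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
                                         gpointer pargs)
{
    struct IOVATreeAllocArgs *args = pargs;
    const DMAMap *this = value;
    /* Free range is [hole_start, hole_end), clamped to iova_last */
    hwaddr hole_end = MIN(this->iova, args->iova_last + 1);

    if (args->hole_start > args->iova_last) {
        return true;                     /* past the allowed range, stop */
    }

    if (hole_end > args->hole_start &&
        hole_end - args->hole_start > args->new_size) {
        args->iova_result = args->hole_start;
        args->iova_found = true;
        return true;
    }

    /* Skip past this mapping; +1 because DMAMap::size is inclusive */
    args->hole_start = MAX(args->hole_start, this->iova + this->size + 1);
    return false;
}

int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
                    hwaddr iova_last)
{
    struct IOVATreeAllocArgs args = {
        .new_size = map->size,
        .iova_last = iova_last,
        .hole_start = iova_begin,
    };

    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
    if (!args.iova_found && args.hole_start <= iova_last &&
        iova_last - args.hole_start >= args.new_size) {
        /* The hole between the last mapping and iova_last also counts */
        args.iova_result = args.hole_start;
        args.iova_found = true;
    }

    if (!args.iova_found) {
        return IOVA_ERR_NOMEM;
    }

    map->iova = args.iova_result;
    return iova_tree_insert(tree, map);
}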

Thanks!

> Thanks
>
> >
> > Another possibility that comes to my mind is to either have a list /
> > tree of free regions, or directly a custom allocator for this.
> >
> > > Thanks
> > >
> > >
> > > > +     *
> > > > +     */
> > > > +    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
> > > > +    if (!iova_tree_alloc_map_in_hole(&args)) {
> > > > +        /*
> > > > +         * 2nd try: Last iteration left args->right as the last DMAMap. But
> > > > +         * (right, end) hole needs to be checked too
> > > > +         */
> > > > +        iova_tree_alloc_args_iterate(&args, NULL);
> > > > +        if (!iova_tree_alloc_map_in_hole(&args)) {
> > > > +            return IOVA_ERR_NOMEM;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    map->iova = MAX(iova_begin,
> > > > +                    args.hole_left ?
> > > > +                    args.hole_left->iova + args.hole_left->size + 1 : 0);
> > > > +    return iova_tree_insert(tree, map);
> > > > +}
> > > > +
> > > >   void iova_tree_destroy(IOVATree *tree)
> > > >   {
> > > >       g_tree_destroy(tree->tree);
> > >
> >
>



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 21/26] vhost: Add vhost_svq_valid_guest_features to shadow vq
  2021-11-03  3:18         ` Jason Wang
  (?)
@ 2021-11-03  7:43         ` Eugenio Perez Martin
  2021-11-04  2:34             ` Jason Wang
  -1 siblings, 1 reply; 82+ messages in thread
From: Eugenio Perez Martin @ 2021-11-03  7:43 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-devel, Peter Xu,
	Markus Armbruster, Stefan Hajnoczi, Xiao W Wang,
	Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Stefano Garzarella, Eric Blake, virtualization, Parav Pandit

On Wed, Nov 3, 2021 at 4:18 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Nov 2, 2021 at 4:10 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Tue, Nov 2, 2021 at 6:26 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Sat, Oct 30, 2021 at 2:44 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > >
> > > > This allows it to test if the guest has acknowledged an invalid transport
> > > > feature for SVQ. This will include packed vq layout or event_idx,
> > > > where the VirtIO device needs help from SVQ.
> > > >
> > > > This is not needed at this moment, but since SVQ will not re-negotiate
> > > > features again with the guest, a failure to acknowledge them is fatal
> > > > for SVQ.
> > > >
> > >
> > > It's not clear to me why we need this. Maybe you can give me an
> > > example. E.g isn't it sufficient to filter out the device with
> > > event_idx?
> > >
> >
> > If the guest did negotiate _F_EVENT_IDX, it expects to be notified
> > only when device marks as used a specific number of descriptors.
> >
> > If we use VirtQueue notification, the VirtQueue code handles it
> > transparently. But if we want to be able to change the guest VQ's
> > call_fd, we cannot use VirtQueue's, so this needs to be handled by SVQ
> > code. And that is still not implemented.
> >
> > Of course in the event_idx case we could just ignore it and notify in
> > all used descriptors, but it seems not polite to me :). I will develop
> > event_idx on top of this, either exposing the needed pieces in
> > VirtQueue (I prefer this) or rolling our own in SVQ.
>
> Yes, but what I meant is, we can fail the SVQ enabling if the device
> supports event_idx. Then we're sure guests won't negotiate event_idx.
>

We can go that way for sure, but then we leave out the scenario where
the device supports event_idx but the guest has not acked it. This is
a valid scenario for SVQ to work in.
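
(A sketch of what the check could look like once filled in, assuming the
transport feature in question is VIRTIO_RING_F_EVENT_IDX; this is not the
posted patch. The device merely offering event_idx stays acceptable, only
a guest that actually acked it is rejected.)

/* If the guest is using some of these, SVQ cannot communicate */
bool vhost_svq_valid_guest_features(uint64_t *guest_features)
{
    /* SVQ does not (yet) honour the used event index, so refuse to start
     * if the guest acked VIRTIO_RING_F_EVENT_IDX.  Everything else is ok. */
    return !(*guest_features & (1ULL << VIRTIO_RING_F_EVENT_IDX));
}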

Thanks!

> Thanks
>
> >
> > Same reasoning can be applied to unknown transport features.
> >
> > Thanks!
> >
> > > Thanks
> > >
> > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > ---
> > > >  hw/virtio/vhost-shadow-virtqueue.h | 1 +
> > > >  hw/virtio/vhost-shadow-virtqueue.c | 6 ++++++
> > > >  2 files changed, 7 insertions(+)
> > > >
> > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > > index 946b2c6295..ac55588009 100644
> > > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > @@ -16,6 +16,7 @@
> > > >  typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > > >
> > > >  bool vhost_svq_valid_device_features(uint64_t *features);
> > > > +bool vhost_svq_valid_guest_features(uint64_t *features);
> > > >
> > > >  void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> > > >  void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
> > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > > index 6e0508a231..cb9ffcb015 100644
> > > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > @@ -62,6 +62,12 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
> > > >      return true;
> > > >  }
> > > >
> > > > +/* If the guest is using some of these, SVQ cannot communicate */
> > > > +bool vhost_svq_valid_guest_features(uint64_t *guest_features)
> > > > +{
> > > > +    return true;
> > > > +}
> > > > +
> > > >  /* Forward guest notifications */
> > > >  static void vhost_handle_guest_kick(EventNotifier *n)
> > > >  {
> > > > --
> > > > 2.27.0
> > > >
> > >
> >
>



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 11/26] vhost: Handle host notifiers in SVQ
       [not found]         ` <CAJaqyWd4DQwRSL5StCft+3-uq12TW5x1o4DN_YW97D0JzOr2XQ@mail.gmail.com>
@ 2021-11-04  2:31             ` Jason Wang
  0 siblings, 0 replies; 82+ messages in thread
From: Jason Wang @ 2021-11-04  2:31 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Eric Blake, virtualization, Parav Pandit

On Wed, Nov 3, 2021 at 3:40 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Wed, Nov 3, 2021 at 3:56 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Nov 2, 2021 at 4:47 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > >
> > > On Tue, Nov 2, 2021 at 8:55 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > >
> > > > 在 2021/10/30 上午2:35, Eugenio Pérez 写道:
> > > > > If the device supports host notifiers, this makes one less jump (through
> > > > > the kernel) to deliver SVQ notifications to it.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > ---
> > > > >   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
> > > > >   hw/virtio/vhost-shadow-virtqueue.c | 23 ++++++++++++++++++++++-
> > > > >   2 files changed, 24 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > index 30ab9643b9..eb0a54f954 100644
> > > > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > @@ -18,6 +18,8 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > > > >   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> > > > >   const EventNotifier *vhost_svq_get_dev_kick_notifier(
> > > > >                                                 const VhostShadowVirtqueue *svq);
> > > > > +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr);
> > > > > +
> > > > >   void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> > > > >                        VhostShadowVirtqueue *svq, int svq_kick_fd);
> > > > >   void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > index fda60d11db..e3dcc039b6 100644
> > > > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > @@ -29,6 +29,12 @@ typedef struct VhostShadowVirtqueue {
> > > > >        * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
> > > > >        */
> > > > >       EventNotifier svq_kick;
> > > > > +
> > > > > +    /* Device's host notifier memory region. NULL means no region */
> > > > > +    void *host_notifier_mr;
> > > > > +
> > > > > +    /* Virtio queue shadowing */
> > > > > +    VirtQueue *vq;
> > > > >   } VhostShadowVirtqueue;
> > > > >
> > > > >   /**
> > > > > @@ -50,7 +56,20 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> > > > >           return;
> > > > >       }
> > > > >
> > > > > -    event_notifier_set(&svq->hdev_kick);
> > > > > +    if (svq->host_notifier_mr) {
> > > > > +        uint16_t *mr = svq->host_notifier_mr;
> > > > > +        *mr = virtio_get_queue_index(svq->vq);
> > > >
> > > >
> > > > Do we need barriers around the possible MMIO here?
> > >
> > > That's right, I missed them.
> > >
> > > >
> > > > To avoid those complicated stuff, I'd rather simply go with eventfd path.
> > > >
> > > > Note mmio and eventfd are not mutually exclusive.
> > >
> > > Actually we cannot ignore them since they are set in the guest. If SVQ
> > > does nothing about them, the guest's notification will travel directly
> > > to the vdpa device, and SVQ cannot intercept them.
> > >
> > > Taking that into account, it's actually less changes to move them to
> > > SVQ (like in this series) than to disable them (like in previous
> > > series). But we can go with disabling them for sure.
> >
> > I think we can simply disable the memory region for the doorbell, then
> > qemu/kvm will do all the rest for us.
> >
> > If we want to add barriers it would be a lot of architecture specific
> > instructions which looks like a burden for us to maintain in Qemu.
> >
> > So if we disable the memory region, KVM will fallback to the eventfd,
> > then qemu can intercept and we can simply relay it via kickfd. This
> > looks easier to maintain.
> >
> > Thanks
> >
>
> Any reason to go off-list? :).

Hit the wrong button:(

Adding the list back.

>
> I'm fine doing it that way, but it seems to me there must be a way
> since VFIO, UIO, etc would have the same issues. The worst case would
> be that these accesses are resolved through a syscall or similar. How
> does DPDK solve it?

I guess it should have per arch assemblies etc.

> Probably with specific asm as you say...

We can go this way, but to speed up the merging, I'd go with eventfd
first to avoid dependencies. And we can do that in the future as a
performance optimization.
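
(For concreteness, a minimal sketch of that eventfd-only path, reusing the
names of this patch; the guest kick lands in svq_kick because KVM falls back
to the ioeventfd once the doorbell memory region is not exposed, and SVQ
just forwards it. Illustrative only, barriers and MMIO left out entirely.)

/* Forward guest notifications: eventfd-only variant, no doorbell write */
static void vhost_handle_guest_kick(EventNotifier *n)
{
    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
                                             svq_kick);

    if (!event_notifier_test_and_clear(n)) {
        return;
    }

    /* Relay the kick to the vdpa device through its kick eventfd */
    event_notifier_set(&svq->hdev_kick);
}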

Thanks

>
> Thanks!
>
>
> > >
> > > Thanks!
> > >
> > > >
> > > > Thanks
> > > >
> > > >
> > > > > +    } else {
> > > > > +        event_notifier_set(&svq->hdev_kick);
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * Set the device's memory region notifier. addr = NULL clear it.
> > > > > + */
> > > > > +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
> > > > > +{
> > > > > +    svq->host_notifier_mr = addr;
> > > > >   }
> > > > >
> > > > >   /**
> > > > > @@ -134,6 +153,7 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > >    */
> > > > >   VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > > > >   {
> > > > > +    int vq_idx = dev->vq_index + idx;
> > > > >       g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> > > > >       int r;
> > > > >
> > > > > @@ -151,6 +171,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > > > >           goto err_init_hdev_call;
> > > > >       }
> > > > >
> > > > > +    svq->vq = virtio_get_queue(dev->vdev, vq_idx);
> > > > >       return g_steal_pointer(&svq);
> > > > >
> > > > >   err_init_hdev_call:
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 11/26] vhost: Handle host notifiers in SVQ
@ 2021-11-04  2:31             ` Jason Wang
  0 siblings, 0 replies; 82+ messages in thread
From: Jason Wang @ 2021-11-04  2:31 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-devel, Peter Xu,
	Markus Armbruster, Stefan Hajnoczi, Xiao W Wang,
	Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Stefano Garzarella, Eric Blake, virtualization, Parav Pandit

On Wed, Nov 3, 2021 at 3:40 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Wed, Nov 3, 2021 at 3:56 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Nov 2, 2021 at 4:47 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > >
> > > On Tue, Nov 2, 2021 at 8:55 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > >
> > > > 在 2021/10/30 上午2:35, Eugenio Pérez 写道:
> > > > > If the device supports host notifiers, this makes one less jump (through
> > > > > the kernel) to deliver SVQ notifications to it.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > ---
> > > > >   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
> > > > >   hw/virtio/vhost-shadow-virtqueue.c | 23 ++++++++++++++++++++++-
> > > > >   2 files changed, 24 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > index 30ab9643b9..eb0a54f954 100644
> > > > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > @@ -18,6 +18,8 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > > > >   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> > > > >   const EventNotifier *vhost_svq_get_dev_kick_notifier(
> > > > >                                                 const VhostShadowVirtqueue *svq);
> > > > > +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr);
> > > > > +
> > > > >   void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> > > > >                        VhostShadowVirtqueue *svq, int svq_kick_fd);
> > > > >   void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > index fda60d11db..e3dcc039b6 100644
> > > > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > @@ -29,6 +29,12 @@ typedef struct VhostShadowVirtqueue {
> > > > >        * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
> > > > >        */
> > > > >       EventNotifier svq_kick;
> > > > > +
> > > > > +    /* Device's host notifier memory region. NULL means no region */
> > > > > +    void *host_notifier_mr;
> > > > > +
> > > > > +    /* Virtio queue shadowing */
> > > > > +    VirtQueue *vq;
> > > > >   } VhostShadowVirtqueue;
> > > > >
> > > > >   /**
> > > > > @@ -50,7 +56,20 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> > > > >           return;
> > > > >       }
> > > > >
> > > > > -    event_notifier_set(&svq->hdev_kick);
> > > > > +    if (svq->host_notifier_mr) {
> > > > > +        uint16_t *mr = svq->host_notifier_mr;
> > > > > +        *mr = virtio_get_queue_index(svq->vq);
> > > >
> > > >
> > > > Do we need barriers around the possible MMIO here?
> > >
> > > That's right, I missed them.
> > >
> > > >
> > > > To avoid those complicated stuff, I'd rather simply go with eventfd path.
> > > >
> > > > Note mmio and eventfd are not mutually exclusive.
> > >
> > > Actually we cannot ignore them since they are set in the guest. If SVQ
> > > does nothing about them, the guest's notification will travel directly
> > > to the vdpa device, and SVQ cannot intercept them.
> > >
> > > Taking that into account, it's actually less changes to move them to
> > > SVQ (like in this series) than to disable them (like in previous
> > > series). But we can go with disabling them for sure.
> >
> > I think we can simply disable the memory region for the doorbell, then
> > qemu/kvm will do all the rest for us.
> >
> > If we want to add barriers it would be a lot of architecture specific
> > instructions which looks like a burden for us to maintain in Qemu.
> >
> > So if we disable the memory region, KVM will fallback to the eventfd,
> > then qemu can intercept and we can simply relay it via kickfd. This
> > looks easier to maintain.
> >
> > Thanks
> >
>
> Any reason to go off-list? :).

Hit the wrong button:(

Adding the list back.

>
> I'm fine doing it that way, but it seems to me there must be a way
> since VFIO, UIO, etc would have the same issues. The worst case would
> be that these accesses are resolved through a syscall or similar. How
> does DPDK solve it?

I guess it should have per arch assemblies etc.

> Probably with specific asm as you say...

We can go this way, but to speed up the merging, I'd go with eventfd
first to avoid dependencies. And we can do that in the future as a
performance optimization.

Thanks

>
> Thanks!
>
>
> > >
> > > Thanks!
> > >
> > > >
> > > > Thanks
> > > >
> > > >
> > > > > +    } else {
> > > > > +        event_notifier_set(&svq->hdev_kick);
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * Set the device's memory region notifier. addr = NULL clear it.
> > > > > + */
> > > > > +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
> > > > > +{
> > > > > +    svq->host_notifier_mr = addr;
> > > > >   }
> > > > >
> > > > >   /**
> > > > > @@ -134,6 +153,7 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > >    */
> > > > >   VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > > > >   {
> > > > > +    int vq_idx = dev->vq_index + idx;
> > > > >       g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> > > > >       int r;
> > > > >
> > > > > @@ -151,6 +171,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > > > >           goto err_init_hdev_call;
> > > > >       }
> > > > >
> > > > > +    svq->vq = virtio_get_queue(dev->vdev, vq_idx);
> > > > >       return g_steal_pointer(&svq);
> > > > >
> > > > >   err_init_hdev_call:
> > > >
> > >
> >
>



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 21/26] vhost: Add vhost_svq_valid_guest_features to shadow vq
  2021-11-03  7:43         ` Eugenio Perez Martin
@ 2021-11-04  2:34             ` Jason Wang
  0 siblings, 0 replies; 82+ messages in thread
From: Jason Wang @ 2021-11-04  2:34 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Eric Blake, virtualization, Parav Pandit

On Wed, Nov 3, 2021 at 3:44 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Wed, Nov 3, 2021 at 4:18 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Nov 2, 2021 at 4:10 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > >
> > > On Tue, Nov 2, 2021 at 6:26 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Sat, Oct 30, 2021 at 2:44 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > >
> > > > > This allows it to test if the guest has acknowledged an invalid transport
> > > > > feature for SVQ. This will include packed vq layout or event_idx,
> > > > > where the VirtIO device needs help from SVQ.
> > > > >
> > > > > This is not needed at this moment, but since SVQ will not re-negotiate
> > > > > features again with the guest, a failure to acknowledge them is fatal
> > > > > for SVQ.
> > > > >
> > > >
> > > > It's not clear to me why we need this. Maybe you can give me an
> > > > example. E.g isn't it sufficient to filter out the device with
> > > > event_idx?
> > > >
> > >
> > > If the guest did negotiate _F_EVENT_IDX, it expects to be notified
> > > only when device marks as used a specific number of descriptors.
> > >
> > > If we use VirtQueue notification, the VirtQueue code handles it
> > > transparently. But if we want to be able to change the guest VQ's
> > > call_fd, we cannot use VirtQueue's, so this needs to be handled by SVQ
> > > code. And that is still not implemented.
> > >
> > > Of course in the event_idx case we could just ignore it and notify in
> > > all used descriptors, but it seems not polite to me :). I will develop
> > > event_idx on top of this, either exposing the needed pieces in
> > > VirtQueue (I prefer this) or rolling our own in SVQ.
> >
> > Yes, but what I meant is, we can fail the SVQ enabling if the device
> > supports event_idx. Then we're sure guests won't negotiate event_idx.
> >
>
> We can go that way for sure, but then we leave out the scenario where
> the device supports event_idx but the guest has not acked it. This is
> a valid scenario for SVQ to work in.

If SVQ supports event idx in the future, we can remove it from the
blacklist. But I think it should be simpler to let SVQ use the same
features as guests. So in this case SVQ won't use the event index.
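
(In code terms, something along these lines at feature-negotiation time.
A sketch only, with placeholder argument names, not code from the series.)

/* Compute the feature set SVQ acks to the device, given what the guest
 * acked.  If the guest did not ack VIRTIO_RING_F_EVENT_IDX, neither does
 * SVQ, even when the device offers it. */
static uint64_t vhost_svq_device_features(uint64_t dev_features,
                                          uint64_t guest_acked_features)
{
    return dev_features & guest_acked_features;
}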

Thanks

>
> Thanks!
>
> > Thanks
> >
> > >
> > > Same reasoning can be applied to unknown transport features.
> > >
> > > Thanks!
> > >
> > > > Thanks
> > > >
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > ---
> > > > >  hw/virtio/vhost-shadow-virtqueue.h | 1 +
> > > > >  hw/virtio/vhost-shadow-virtqueue.c | 6 ++++++
> > > > >  2 files changed, 7 insertions(+)
> > > > >
> > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > index 946b2c6295..ac55588009 100644
> > > > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > @@ -16,6 +16,7 @@
> > > > >  typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > > > >
> > > > >  bool vhost_svq_valid_device_features(uint64_t *features);
> > > > > +bool vhost_svq_valid_guest_features(uint64_t *features);
> > > > >
> > > > >  void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> > > > >  void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
> > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > index 6e0508a231..cb9ffcb015 100644
> > > > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > @@ -62,6 +62,12 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
> > > > >      return true;
> > > > >  }
> > > > >
> > > > > +/* If the guest is using some of these, SVQ cannot communicate */
> > > > > +bool vhost_svq_valid_guest_features(uint64_t *guest_features)
> > > > > +{
> > > > > +    return true;
> > > > > +}
> > > > > +
> > > > >  /* Forward guest notifications */
> > > > >  static void vhost_handle_guest_kick(EventNotifier *n)
> > > > >  {
> > > > > --
> > > > > 2.27.0
> > > > >
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 21/26] vhost: Add vhost_svq_valid_guest_features to shadow vq
@ 2021-11-04  2:34             ` Jason Wang
  0 siblings, 0 replies; 82+ messages in thread
From: Jason Wang @ 2021-11-04  2:34 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-devel, Peter Xu,
	Markus Armbruster, Stefan Hajnoczi, Xiao W Wang,
	Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Stefano Garzarella, Eric Blake, virtualization, Parav Pandit

On Wed, Nov 3, 2021 at 3:44 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Wed, Nov 3, 2021 at 4:18 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Nov 2, 2021 at 4:10 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > >
> > > On Tue, Nov 2, 2021 at 6:26 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Sat, Oct 30, 2021 at 2:44 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > >
> > > > > This allows it to test if the guest has acknowledged an invalid transport
> > > > > feature for SVQ. This will include packed vq layout or event_idx,
> > > > > where the VirtIO device needs help from SVQ.
> > > > >
> > > > > This is not needed at this moment, but since SVQ will not re-negotiate
> > > > > features again with the guest, a failure to acknowledge them is fatal
> > > > > for SVQ.
> > > > >
> > > >
> > > > It's not clear to me why we need this. Maybe you can give me an
> > > > example. E.g isn't it sufficient to filter out the device with
> > > > event_idx?
> > > >
> > >
> > > If the guest did negotiate _F_EVENT_IDX, it expects to be notified
> > > only when device marks as used a specific number of descriptors.
> > >
> > > If we use VirtQueue notification, the VirtQueue code handles it
> > > transparently. But if we want to be able to change the guest VQ's
> > > call_fd, we cannot use VirtQueue's, so this needs to be handled by SVQ
> > > code. And that is still not implemented.
> > >
> > > Of course in the event_idx case we could just ignore it and notify in
> > > all used descriptors, but it seems not polite to me :). I will develop
> > > event_idx on top of this, either exposing the needed pieces in
> > > VirtQueue (I prefer this) or rolling our own in SVQ.
> >
> > Yes, but what I meant is, we can fail the SVQ enabling if the device
> > supports event_idx. Then we're sure guests won't negotiate event_idx.
> >
>
> We can go that way for sure, but then we leave out the scenario where
> the device supports event_idx but the guest has not acked it. This is
> a valid scenario for SVQ to work in.

If SVQ supports event idx in the future, we can remove it from the
blacklist. But I think it should be simpler to let SVQ use the same
features as guests. So in this case SVQ won't use the event index.

Thanks

>
> Thanks!
>
> > Thanks
> >
> > >
> > > Same reasoning can be applied to unknown transport features.
> > >
> > > Thanks!
> > >
> > > > Thanks
> > > >
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > ---
> > > > >  hw/virtio/vhost-shadow-virtqueue.h | 1 +
> > > > >  hw/virtio/vhost-shadow-virtqueue.c | 6 ++++++
> > > > >  2 files changed, 7 insertions(+)
> > > > >
> > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > index 946b2c6295..ac55588009 100644
> > > > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > @@ -16,6 +16,7 @@
> > > > >  typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > > > >
> > > > >  bool vhost_svq_valid_device_features(uint64_t *features);
> > > > > +bool vhost_svq_valid_guest_features(uint64_t *features);
> > > > >
> > > > >  void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> > > > >  void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
> > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > index 6e0508a231..cb9ffcb015 100644
> > > > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > @@ -62,6 +62,12 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
> > > > >      return true;
> > > > >  }
> > > > >
> > > > > +/* If the guest is using some of these, SVQ cannot communicate */
> > > > > +bool vhost_svq_valid_guest_features(uint64_t *guest_features)
> > > > > +{
> > > > > +    return true;
> > > > > +}
> > > > > +
> > > > >  /* Forward guest notifications */
> > > > >  static void vhost_handle_guest_kick(EventNotifier *n)
> > > > >  {
> > > > > --
> > > > > 2.27.0
> > > > >
> > > >
> > >
> >
>



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
  2021-10-29 18:35 ` [RFC PATCH v5 23/26] util: Add iova_tree_alloc Eugenio Pérez
@ 2021-11-23  6:56     ` Peter Xu
  2021-11-23  6:56     ` Peter Xu
  2022-01-27  8:57     ` Peter Xu
  2 siblings, 0 replies; 82+ messages in thread
From: Peter Xu @ 2021-11-23  6:56 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Eric Blake, virtualization, Parav Pandit

Hi, Eugenio,

Sorry for the super late response.

On Fri, Oct 29, 2021 at 08:35:22PM +0200, Eugenio Pérez wrote:

[...]

> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_last)
> +{
> +    struct IOVATreeAllocArgs args = {
> +        .new_size = map->size,
> +        .iova_begin = iova_begin,
> +        .iova_last = iova_last,
> +    };
> +
> +    if (iova_begin == 0) {
> +        /* Some devices does not like addr 0 */
> +        iova_begin += qemu_real_host_page_size;
> +    }

Any explanation of why zero is not welcomed?

It would be great if we can move this out of iova-tree.c, because that doesn't
look like a good place to, e.g. reference qemu_real_host_page_size, anyways.
The caller can simply pass in qemu_real_host_page_size as iova_begin when
needed (and I highly doubt it'll be a must for all iova-tree users..)
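
(Roughly this on the caller side, with hypothetical field names for the
vdpa iova range; the point is that the clamping lives in the caller and
iova-tree.c stays generic.)

hwaddr iova_first = v->iova_range.first;

if (iova_first == 0) {
    /* This particular device does not accept IOVA 0 */
    iova_first = qemu_real_host_page_size;
}

r = iova_tree_alloc(iova_tree, &map, iova_first, v->iova_range.last);
if (unlikely(r != IOVA_OK)) {
    return r;
}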

> +
> +    assert(iova_begin < iova_last);
> +
> +    /*
> +     * Find a valid hole for the mapping
> +     *
> +     * Assuming low iova_begin, so no need to do a binary search to
> +     * locate the first node.
> +     *
> +     * TODO: We can improve the search speed if we save the beginning and the
> +     * end of holes, so we don't iterate over the previous saved ones.
> +     *
> +     * TODO: Replace all this with g_tree_node_first/next/last when available
> +     * (from glib since 2.68). To do it with g_tree_foreach complicates the
> +     * code a lot.
> +     *
> +     */
> +    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
> +    if (!iova_tree_alloc_map_in_hole(&args)) {
> +        /*
> +         * 2nd try: Last iteration left args->right as the last DMAMap. But
> +         * (right, end) hole needs to be checked too
> +         */
> +        iova_tree_alloc_args_iterate(&args, NULL);
> +        if (!iova_tree_alloc_map_in_hole(&args)) {
> +            return IOVA_ERR_NOMEM;
> +        }
> +    }
> +
> +    map->iova = MAX(iova_begin,
> +                    args.hole_left ?
> +                    args.hole_left->iova + args.hole_left->size + 1 : 0);
> +    return iova_tree_insert(tree, map);
> +}

Re the algorithm - I totally agree Jason's version is much better.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
@ 2021-11-23  6:56     ` Peter Xu
  0 siblings, 0 replies; 82+ messages in thread
From: Peter Xu @ 2021-11-23  6:56 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Parav Pandit

Hi, Eugenio,

Sorry for the super late response.

On Fri, Oct 29, 2021 at 08:35:22PM +0200, Eugenio Pérez wrote:

[...]

> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_last)
> +{
> +    struct IOVATreeAllocArgs args = {
> +        .new_size = map->size,
> +        .iova_begin = iova_begin,
> +        .iova_last = iova_last,
> +    };
> +
> +    if (iova_begin == 0) {
> +        /* Some devices does not like addr 0 */
> +        iova_begin += qemu_real_host_page_size;
> +    }

Any explanation of why zero is not welcomed?

It would be great if we can move this out of iova-tree.c, because that doesn't
look like a good place to, e.g. reference qemu_real_host_page_size, anyways.
The caller can simply pass in qemu_real_host_page_size as iova_begin when
needed (and I highly doubt it'll be a must for all iova-tree users..)

> +
> +    assert(iova_begin < iova_last);
> +
> +    /*
> +     * Find a valid hole for the mapping
> +     *
> +     * Assuming low iova_begin, so no need to do a binary search to
> +     * locate the first node.
> +     *
> +     * TODO: We can improve the search speed if we save the beginning and the
> +     * end of holes, so we don't iterate over the previous saved ones.
> +     *
> +     * TODO: Replace all this with g_tree_node_first/next/last when available
> +     * (from glib since 2.68). To do it with g_tree_foreach complicates the
> +     * code a lot.
> +     *
> +     */
> +    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
> +    if (!iova_tree_alloc_map_in_hole(&args)) {
> +        /*
> +         * 2nd try: Last iteration left args->right as the last DMAMap. But
> +         * (right, end) hole needs to be checked too
> +         */
> +        iova_tree_alloc_args_iterate(&args, NULL);
> +        if (!iova_tree_alloc_map_in_hole(&args)) {
> +            return IOVA_ERR_NOMEM;
> +        }
> +    }
> +
> +    map->iova = MAX(iova_begin,
> +                    args.hole_left ?
> +                    args.hole_left->iova + args.hole_left->size + 1 : 0);
> +    return iova_tree_insert(tree, map);
> +}

Re the algorithm - I totally agree Jason's version is much better.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
  2021-11-23  6:56     ` Peter Xu
  (?)
@ 2021-11-23  7:08     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 82+ messages in thread
From: Eugenio Perez Martin @ 2021-11-23  7:08 UTC (permalink / raw)
  To: Peter Xu
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, qemu-level, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Parav Pandit

On Tue, Nov 23, 2021 at 7:57 AM Peter Xu <peterx@redhat.com> wrote:
>
> Hi, Eugenio,
>
> Sorry for the super late response.
>

No problem!

> On Fri, Oct 29, 2021 at 08:35:22PM +0200, Eugenio Pérez wrote:
>
> [...]
>
> > +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> > +                    hwaddr iova_last)
> > +{
> > +    struct IOVATreeAllocArgs args = {
> > +        .new_size = map->size,
> > +        .iova_begin = iova_begin,
> > +        .iova_last = iova_last,
> > +    };
> > +
> > +    if (iova_begin == 0) {
> > +        /* Some devices does not like addr 0 */
> > +        iova_begin += qemu_real_host_page_size;
> > +    }
>
> Any explanation of why zero is not welcomed?
>

I didn't investigate too much, but neither vhost-net nor the qemu virtio
device accepted a ring with address 0. It's probably because of a check
like:

if (!vq->desc) { return; }

That assumes 0 == not initialized. Even if we fix that issue in the
devices, the vdpa device backend could be an old version, and since the
iova range should be big anyway, I think we should keep skipping 0.

> It would be great if we can move this out of iova-tree.c, because that doesn't
> look like a good place to, e.g. reference qemu_real_host_page_size, anyways.
> The caller can simply pass in qemu_real_host_page_size as iova_begin when
> needed (and I highly doubt it'll be a must for all iova-tree users..)
>

I think yes, it can be included in iova_begin. I'll rework that part.

> > +
> > +    assert(iova_begin < iova_last);
> > +
> > +    /*
> > +     * Find a valid hole for the mapping
> > +     *
> > +     * Assuming low iova_begin, so no need to do a binary search to
> > +     * locate the first node.
> > +     *
> > +     * TODO: We can improve the search speed if we save the beginning and the
> > +     * end of holes, so we don't iterate over the previous saved ones.
> > +     *
> > +     * TODO: Replace all this with g_tree_node_first/next/last when available
> > +     * (from glib since 2.68). To do it with g_tree_foreach complicates the
> > +     * code a lot.
> > +     *
> > +     */
> > +    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
> > +    if (!iova_tree_alloc_map_in_hole(&args)) {
> > +        /*
> > +         * 2nd try: Last iteration left args->right as the last DMAMap. But
> > +         * (right, end) hole needs to be checked too
> > +         */
> > +        iova_tree_alloc_args_iterate(&args, NULL);
> > +        if (!iova_tree_alloc_map_in_hole(&args)) {
> > +            return IOVA_ERR_NOMEM;
> > +        }
> > +    }
> > +
> > +    map->iova = MAX(iova_begin,
> > +                    args.hole_left ?
> > +                    args.hole_left->iova + args.hole_left->size + 1 : 0);
> > +    return iova_tree_insert(tree, map);
> > +}
>
> Re the algorithm - I totally agree Jason's version is much better.
>

I'll try to accommodate it, but (if I understood it correctly) it
needs to deal with deallocation and a few other things. But it should
be doable.

Thanks!

> Thanks,
>
> --
> Peter Xu
>



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
  2021-10-29 18:35 ` [RFC PATCH v5 23/26] util: Add iova_tree_alloc Eugenio Pérez
@ 2022-01-27  8:57     ` Peter Xu
  2021-11-23  6:56     ` Peter Xu
  2022-01-27  8:57     ` Peter Xu
  2 siblings, 0 replies; 82+ messages in thread
From: Peter Xu @ 2022-01-27  8:57 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Eric Blake, virtualization, Parav Pandit

On Fri, Oct 29, 2021 at 08:35:22PM +0200, Eugenio Pérez wrote:
> This iova tree function allows it to look for a hole in allocated
> regions and return a totally new translation for a given translated
> address.
> 
> Its usage is mainly to allow devices to access qemu address space,
> remapping guest's one into a new iova space where qemu can add chunks of
> addresses.
> 
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  include/qemu/iova-tree.h |  17 +++++
>  util/iova-tree.c         | 139 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 156 insertions(+)
> 
> diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
> index 8249edd764..33f9b2e13f 100644
> --- a/include/qemu/iova-tree.h
> +++ b/include/qemu/iova-tree.h
> @@ -29,6 +29,7 @@
>  #define  IOVA_OK           (0)
>  #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
>  #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
> +#define  IOVA_ERR_NOMEM    (-3) /* Cannot allocate */
>  
>  typedef struct IOVATree IOVATree;
>  typedef struct DMAMap {
> @@ -119,6 +120,22 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
>   */
>  void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
>  
> +/**
> + * iova_tree_alloc:
> + *
> + * @tree: the iova tree to allocate from
> + * @map: the new map (as translated addr & size) to allocate in iova region
> + * @iova_begin: the minimum address of the allocation
> + * @iova_end: the maximum addressable direction of the allocation
> + *
> + * Allocates a new region of a given size, between iova_min and iova_max.
> + *
> + * Return: Same as iova_tree_insert, but cannot overlap and can be out of
> + * free contiguous range. Caller can get the assigned iova in map->iova.
> + */
> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_end);
> +
>  /**
>   * iova_tree_destroy:
>   *
> diff --git a/util/iova-tree.c b/util/iova-tree.c
> index 23ea35b7a4..27c921c4e2 100644
> --- a/util/iova-tree.c
> +++ b/util/iova-tree.c
> @@ -16,6 +16,36 @@ struct IOVATree {
>      GTree *tree;
>  };
>  
> +/* Args to pass to iova_tree_alloc foreach function. */
> +struct IOVATreeAllocArgs {
> +    /* Size of the desired allocation */
> +    size_t new_size;
> +
> +    /* The minimum address allowed in the allocation */
> +    hwaddr iova_begin;
> +
> +    /* The last addressable allowed in the allocation */
> +    hwaddr iova_last;
> +
> +    /* Previously-to-last iterated map, can be NULL in the first node */
> +    const DMAMap *hole_left;
> +
> +    /* Last iterated map */
> +    const DMAMap *hole_right;

I slightly prefer having two more fields to cache the result:

       /* If found, we fill in the IOVA here */
       hwaddr iova_result;
       /* Whether have we found a valid IOVA */
       bool   iova_found;

IMHO they'll help on readability.  More below.
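
(So the argument struct would grow into something like the following; this
just restates the suggestion next to the existing fields.)

struct IOVATreeAllocArgs {
    /* Size of the desired allocation */
    size_t new_size;

    /* The minimum address allowed in the allocation */
    hwaddr iova_begin;

    /* The last address allowed in the allocation */
    hwaddr iova_last;

    /* Previously-to-last iterated map, can be NULL in the first node */
    const DMAMap *hole_left;

    /* Last iterated map */
    const DMAMap *hole_right;

    /* If found, we fill in the IOVA here */
    hwaddr iova_result;

    /* Whether have we found a valid IOVA */
    bool iova_found;
};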

> +};
> +
> +/**
> + * Iterate args to tne next hole
> + *
> + * @args  The alloc arguments
> + * @next  The next mapping in the tree. Can be NULL to signal the last one
> + */
> +static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
> +                                         const DMAMap *next) {
> +    args->hole_left = args->hole_right;
> +    args->hole_right = next;
> +}
> +
>  static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
>  {
>      const DMAMap *m1 = a, *m2 = b;
> @@ -107,6 +137,115 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
>      return IOVA_OK;
>  }
>  
> +/**
> + * Try to accomodate a map of size ret->size in a hole between
> + * max(end(hole_left), iova_start).

I think this function needs the most comments, and the sentence above doesn't
quite read right... My try...

/*
 * Try to find an unallocated IOVA range between LEFT and RIGHT elements.
 *
 * There're three cases:
 *
 * (1) When LEFT==NULL, RIGHT must be non-NULL and it means we're iterating at
 *     the 1st element.
 *
 * (2) When RIGHT==NULL, LEFT must be non-NULL and it means we're iterating at
 *     the last element.
 *
 * (3) When both LEFT and RIGHT are non-NULL, this is the most common case,
 *     we'll try to find a hole between LEFT and RIGHT mapping.
 */

> + *
> + * @args Arguments to allocation
> + */
> +static bool iova_tree_alloc_map_in_hole(const struct IOVATreeAllocArgs *args)
> +{
> +    const DMAMap *left = args->hole_left, *right = args->hole_right;
> +    uint64_t hole_start, hole_last;
> +
> +    if (right && right->iova + right->size < args->iova_begin) {
> +        return false;
> +    }
> +
> +    if (left && left->iova > args->iova_last) {
> +        return false;
> +    }
> +
> +    hole_start = MAX(left ? left->iova + left->size + 1 : 0, args->iova_begin);
> +    hole_last = MIN(right ? right->iova : HWADDR_MAX, args->iova_last);

I assume these values should be always inclusive, hence

s/right->iova/right->iova + 1/

?

> +
> +    if (hole_last - hole_start > args->new_size) {
> +        /* We found a valid hole. */

IMHO it's cleaner if we simply set:

           args->iova_result = hole_start;

Here, before stopping the iteration.

> +        return true;
> +    }
> +
> +    /* Keep iterating */
> +    return false;
> +}
> +
> +/**
> + * Foreach dma node in the tree, compare if there is a hole wit its previous
> + * node (or minimum iova address allowed) and the node.
> + *
> + * @key   Node iterating
> + * @value Node iterating
> + * @pargs Struct to communicate with the outside world
> + *
> + * Return: false to keep iterating, true if needs break.
> + */
> +static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
> +                                         gpointer pargs)
> +{
> +    struct IOVATreeAllocArgs *args = pargs;
> +    DMAMap *node = value;
> +
> +    assert(key == value);
> +
> +    iova_tree_alloc_args_iterate(args, node);
> +    if (args->hole_left && args->hole_left->iova > args->iova_last) {

IMHO this check is redundant and can be dropped, as it's already done in
iova_tree_alloc_map_in_hole().

> +        return true;
> +    }
> +
> +    if (iova_tree_alloc_map_in_hole(args)) {
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_last)
> +{
> +    struct IOVATreeAllocArgs args = {
> +        .new_size = map->size,
> +        .iova_begin = iova_begin,
> +        .iova_last = iova_last,
> +    };
> +
> +    if (iova_begin == 0) {
> +        /* Some devices does not like addr 0 */
> +        iova_begin += qemu_real_host_page_size;
> +    }

(This should be dropped as the new version goes)

> +
> +    assert(iova_begin < iova_last);
> +
> +    /*
> +     * Find a valid hole for the mapping
> +     *
> +     * Assuming low iova_begin, so no need to do a binary search to
> +     * locate the first node.

We could also mention something like this here:

        *
        * The traversing will cover all the possible holes but except the last
        * hole starting from the last element.  We need to handle it separately
        * below.
        *

> +     *
> +     * TODO: We can improve the search speed if we save the beginning and the
> +     * end of holes, so we don't iterate over the previous saved ones.
> +     *
> +     * TODO: Replace all this with g_tree_node_first/next/last when available
> +     * (from glib since 2.68). To do it with g_tree_foreach complicates the
> +     * code a lot.
> +     *
> +     */
> +    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
> +    if (!iova_tree_alloc_map_in_hole(&args)) {

With iova_found, here it could be (hopefully) more readable:

       if (!args->iova_found) {
           /* If we failed to find a hole in 0..N-1 entries, try the last one */
           iova_tree_alloc_args_iterate(&args, NULL);
           iova_tree_alloc_map_in_hole(&args);
           if (!args->iova_found) {
               return IOVA_ERR_NOMEM;
           }
       }

       map->iova = args->iova_result;
       ...

Thanks,

> +        /*
> +         * 2nd try: Last iteration left args->right as the last DMAMap. But
> +         * (right, end) hole needs to be checked too
> +         */
> +        iova_tree_alloc_args_iterate(&args, NULL);
> +        if (!iova_tree_alloc_map_in_hole(&args)) {
> +            return IOVA_ERR_NOMEM;
> +        }
> +    }
> +
> +    map->iova = MAX(iova_begin,
> +                    args.hole_left ?
> +                    args.hole_left->iova + args.hole_left->size + 1 : 0);
> +    return iova_tree_insert(tree, map);
> +}
> +
>  void iova_tree_destroy(IOVATree *tree)
>  {
>      g_tree_destroy(tree->tree);
> -- 
> 2.27.0
> 

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
@ 2022-01-27  8:57     ` Peter Xu
  0 siblings, 0 replies; 82+ messages in thread
From: Peter Xu @ 2022-01-27  8:57 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Parav Pandit

On Fri, Oct 29, 2021 at 08:35:22PM +0200, Eugenio Pérez wrote:
> This iova tree function allows it to look for a hole in allocated
> regions and return a totally new translation for a given translated
> address.
> 
> Its usage is mainly to allow devices to access qemu address space,
> remapping guest's one into a new iova space where qemu can add chunks of
> addresses.
> 
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  include/qemu/iova-tree.h |  17 +++++
>  util/iova-tree.c         | 139 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 156 insertions(+)
> 
> diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
> index 8249edd764..33f9b2e13f 100644
> --- a/include/qemu/iova-tree.h
> +++ b/include/qemu/iova-tree.h
> @@ -29,6 +29,7 @@
>  #define  IOVA_OK           (0)
>  #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
>  #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
> +#define  IOVA_ERR_NOMEM    (-3) /* Cannot allocate */
>  
>  typedef struct IOVATree IOVATree;
>  typedef struct DMAMap {
> @@ -119,6 +120,22 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
>   */
>  void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
>  
> +/**
> + * iova_tree_alloc:
> + *
> + * @tree: the iova tree to allocate from
> + * @map: the new map (as translated addr & size) to allocate in iova region
> + * @iova_begin: the minimum address of the allocation
> + * @iova_end: the maximum addressable direction of the allocation
> + *
> + * Allocates a new region of a given size, between iova_min and iova_max.
> + *
> + * Return: Same as iova_tree_insert, but cannot overlap and can be out of
> + * free contiguous range. Caller can get the assigned iova in map->iova.
> + */
> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_end);
> +
>  /**
>   * iova_tree_destroy:
>   *
> diff --git a/util/iova-tree.c b/util/iova-tree.c
> index 23ea35b7a4..27c921c4e2 100644
> --- a/util/iova-tree.c
> +++ b/util/iova-tree.c
> @@ -16,6 +16,36 @@ struct IOVATree {
>      GTree *tree;
>  };
>  
> +/* Args to pass to iova_tree_alloc foreach function. */
> +struct IOVATreeAllocArgs {
> +    /* Size of the desired allocation */
> +    size_t new_size;
> +
> +    /* The minimum address allowed in the allocation */
> +    hwaddr iova_begin;
> +
> +    /* The last addressable allowed in the allocation */
> +    hwaddr iova_last;
> +
> +    /* Previously-to-last iterated map, can be NULL in the first node */
> +    const DMAMap *hole_left;
> +
> +    /* Last iterated map */
> +    const DMAMap *hole_right;

I slightly prefer having two more fields to cache the result:

       /* If found, we fill in the IOVA here */
       hwaddr iova_result;
       /* Whether have we found a valid IOVA */
       bool   iova_found;

IMHO they'll help on readability.  More below.

> +};
> +
> +/**
> + * Iterate args to tne next hole
> + *
> + * @args  The alloc arguments
> + * @next  The next mapping in the tree. Can be NULL to signal the last one
> + */
> +static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
> +                                         const DMAMap *next) {
> +    args->hole_left = args->hole_right;
> +    args->hole_right = next;
> +}
> +
>  static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
>  {
>      const DMAMap *m1 = a, *m2 = b;
> @@ -107,6 +137,115 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
>      return IOVA_OK;
>  }
>  
> +/**
> + * Try to accomodate a map of size ret->size in a hole between
> + * max(end(hole_left), iova_start).

I think this function needs the most comments, and the sentence above doesn't
quite read right... My try...

/*
 * Try to find an unallocated IOVA range between LEFT and RIGHT elements.
 *
 * There're three cases:
 *
 * (1) When LEFT==NULL, RIGHT must be non-NULL and it means we're iterating at
 *     the 1st element.
 *
 * (2) When RIGHT==NULL, LEFT must be non-NULL and it means we're iterating at
 *     the last element.
 *
 * (3) When both LEFT and RIGHT are non-NULL, this is the most common case,
 *     we'll try to find a hole between LEFT and RIGHT mapping.
 */

> + *
> + * @args Arguments to allocation
> + */
> +static bool iova_tree_alloc_map_in_hole(const struct IOVATreeAllocArgs *args)
> +{
> +    const DMAMap *left = args->hole_left, *right = args->hole_right;
> +    uint64_t hole_start, hole_last;
> +
> +    if (right && right->iova + right->size < args->iova_begin) {
> +        return false;
> +    }
> +
> +    if (left && left->iova > args->iova_last) {
> +        return false;
> +    }
> +
> +    hole_start = MAX(left ? left->iova + left->size + 1 : 0, args->iova_begin);
> +    hole_last = MIN(right ? right->iova : HWADDR_MAX, args->iova_last);

I assume these values should be always inclusive, hence

s/right->iova/right->iova + 1/

?

> +
> +    if (hole_last - hole_start > args->new_size) {
> +        /* We found a valid hole. */

IMHO it's cleaner if we simply set:

           args->iova_result = hole_start;

Here, before stopping the iteration.

> +        return true;
> +    }
> +
> +    /* Keep iterating */
> +    return false;
> +}
> +
> +/**
> + * Foreach dma node in the tree, compare if there is a hole wit its previous
> + * node (or minimum iova address allowed) and the node.
> + *
> + * @key   Node iterating
> + * @value Node iterating
> + * @pargs Struct to communicate with the outside world
> + *
> + * Return: false to keep iterating, true if needs break.
> + */
> +static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
> +                                         gpointer pargs)
> +{
> +    struct IOVATreeAllocArgs *args = pargs;
> +    DMAMap *node = value;
> +
> +    assert(key == value);
> +
> +    iova_tree_alloc_args_iterate(args, node);
> +    if (args->hole_left && args->hole_left->iova > args->iova_last) {

IMHO this check is redundant and can be dropped, as it's already done in
iova_tree_alloc_map_in_hole().

> +        return true;
> +    }
> +
> +    if (iova_tree_alloc_map_in_hole(args)) {
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_last)
> +{
> +    struct IOVATreeAllocArgs args = {
> +        .new_size = map->size,
> +        .iova_begin = iova_begin,
> +        .iova_last = iova_last,
> +    };
> +
> +    if (iova_begin == 0) {
> +        /* Some devices does not like addr 0 */
> +        iova_begin += qemu_real_host_page_size;
> +    }

(This should be dropped as the new version goes)

> +
> +    assert(iova_begin < iova_last);
> +
> +    /*
> +     * Find a valid hole for the mapping
> +     *
> +     * Assuming low iova_begin, so no need to do a binary search to
> +     * locate the first node.

We could also mention something like this here:

        *
        * The traversal will cover all the possible holes except the last
        * one, the hole starting from the last element.  We need to handle
        * it separately below.
        *

> +     *
> +     * TODO: We can improve the search speed if we save the beginning and the
> +     * end of holes, so we don't iterate over the previous saved ones.
> +     *
> +     * TODO: Replace all this with g_tree_node_first/next/last when available
> +     * (from glib since 2.68). To do it with g_tree_foreach complicates the
> +     * code a lot.
> +     *
> +     */
> +    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
> +    if (!iova_tree_alloc_map_in_hole(&args)) {

With iova_found, here it could be (hopefully) more readable:

       if (!args->iova_found) {
           /* If we failed to find a hole in 0..N-1 entries, try the last one */
           iova_tree_alloc_args_iterate(&args, NULL);
           iova_tree_alloc_map_in_hole(&args);
           if (!args->iova_found) {
               return IOVA_ERR_NOMEM;
           }
       }

       map->iova = args->iova_result;
       ...

Thanks,

> +        /*
> +         * 2nd try: Last iteration left args->right as the last DMAMap. But
> +         * (right, end) hole needs to be checked too
> +         */
> +        iova_tree_alloc_args_iterate(&args, NULL);
> +        if (!iova_tree_alloc_map_in_hole(&args)) {
> +            return IOVA_ERR_NOMEM;
> +        }
> +    }
> +
> +    map->iova = MAX(iova_begin,
> +                    args.hole_left ?
> +                    args.hole_left->iova + args.hole_left->size + 1 : 0);
> +    return iova_tree_insert(tree, map);
> +}
> +
>  void iova_tree_destroy(IOVATree *tree)
>  {
>      g_tree_destroy(tree->tree);
> -- 
> 2.27.0
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
  2022-01-27  8:57     ` Peter Xu
  (?)
@ 2022-01-27 10:09     ` Eugenio Perez Martin
  2022-01-27 11:25         ` Peter Xu
  -1 siblings, 1 reply; 82+ messages in thread
From: Eugenio Perez Martin @ 2022-01-27 10:09 UTC (permalink / raw)
  To: Peter Xu
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, qemu-level, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Parav Pandit

On Thu, Jan 27, 2022 at 9:57 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Oct 29, 2021 at 08:35:22PM +0200, Eugenio Pérez wrote:
> > This iova tree function allows it to look for a hole in allocated
> > regions and return a totally new translation for a given translated
> > address.
> >
> > It's usage is mainly to allow devices to access qemu address space,
> > remapping guest's one into a new iova space where qemu can add chunks of
> > addresses.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  include/qemu/iova-tree.h |  17 +++++
> >  util/iova-tree.c         | 139 +++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 156 insertions(+)
> >
> > diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
> > index 8249edd764..33f9b2e13f 100644
> > --- a/include/qemu/iova-tree.h
> > +++ b/include/qemu/iova-tree.h
> > @@ -29,6 +29,7 @@
> >  #define  IOVA_OK           (0)
> >  #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
> >  #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
> > +#define  IOVA_ERR_NOMEM    (-3) /* Cannot allocate */
> >
> >  typedef struct IOVATree IOVATree;
> >  typedef struct DMAMap {
> > @@ -119,6 +120,22 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
> >   */
> >  void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
> >
> > +/**
> > + * iova_tree_alloc:
> > + *
> > + * @tree: the iova tree to allocate from
> > + * @map: the new map (as translated addr & size) to allocate in iova region
> > + * @iova_begin: the minimum address of the allocation
> > + * @iova_end: the maximum addressable direction of the allocation
> > + *
> > + * Allocates a new region of a given size, between iova_min and iova_max.
> > + *
> > + * Return: Same as iova_tree_insert, but cannot overlap and can be out of
> > + * free contiguous range. Caller can get the assigned iova in map->iova.
> > + */
> > +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> > +                    hwaddr iova_end);
> > +
> >  /**
> >   * iova_tree_destroy:
> >   *
> > diff --git a/util/iova-tree.c b/util/iova-tree.c
> > index 23ea35b7a4..27c921c4e2 100644
> > --- a/util/iova-tree.c
> > +++ b/util/iova-tree.c
> > @@ -16,6 +16,36 @@ struct IOVATree {
> >      GTree *tree;
> >  };
> >
> > +/* Args to pass to iova_tree_alloc foreach function. */
> > +struct IOVATreeAllocArgs {
> > +    /* Size of the desired allocation */
> > +    size_t new_size;
> > +
> > +    /* The minimum address allowed in the allocation */
> > +    hwaddr iova_begin;
> > +
> > +    /* The last addressable allowed in the allocation */
> > +    hwaddr iova_last;
> > +
> > +    /* Previously-to-last iterated map, can be NULL in the first node */
> > +    const DMAMap *hole_left;
> > +
> > +    /* Last iterated map */
> > +    const DMAMap *hole_right;
>
> I slightly prefer having two more fields to cache the result:
>
>        /* If found, we fill in the IOVA here */
>        hwaddr iova_result;
>        /* Whether have we found a valid IOVA */
>        bool   iova_found;
>
> IMHO they'll help on readability.  More below.
>

Sure, this avoids an extra call.
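
For reference, the whole args struct would then look roughly like this (just a
sketch of your suggestion, not the final code):

    /* Args to pass to the iova_tree_alloc foreach function */
    struct IOVATreeAllocArgs {
        /* Size of the desired allocation */
        size_t new_size;

        /* The minimum address allowed in the allocation */
        hwaddr iova_begin;

        /* The last address allowed in the allocation */
        hwaddr iova_last;

        /* Previously-to-last iterated map, can be NULL in the first node */
        const DMAMap *hole_left;

        /* Last iterated map */
        const DMAMap *hole_right;

        /* If found, we fill in the IOVA here */
        hwaddr iova_result;

        /* Whether we have found a valid IOVA */
        bool iova_found;
    };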

> > +};
> > +
> > +/**
> > + * Iterate args to tne next hole
> > + *
> > + * @args  The alloc arguments
> > + * @next  The next mapping in the tree. Can be NULL to signal the last one
> > + */
> > +static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
> > +                                         const DMAMap *next) {
> > +    args->hole_left = args->hole_right;
> > +    args->hole_right = next;
> > +}
> > +
> >  static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
> >  {
> >      const DMAMap *m1 = a, *m2 = b;
> > @@ -107,6 +137,115 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
> >      return IOVA_OK;
> >  }
> >
> > +/**
> > + * Try to accomodate a map of size ret->size in a hole between
> > + * max(end(hole_left), iova_start).
>
> I think this functions need the most comments, and above sentence is more or
> less not sounding correct... My try...
>
> /*
>  * Try to find an unallocated IOVA range between LEFT and RIGHT elements.
>  *
>  * There're three cases:
>  *
>  * (1) When LEFT==NULL, RIGHT must be non-NULL and it means we're iterating at
>  *     the 1st element.
>  *
>  * (2) When RIGHT==NULL, LEFT must be non-NULL and it means we're iterating at
>  *     the last element.
>  *
>  * (3) When both LEFT and RIGHT are non-NULL, this is the most common case,
>  *     we'll try to find a hole between LEFT and RIGHT mapping.
>  */
>

This is also called with left == NULL and right == NULL in the first
allocation with an empty tree. This allows iova_tree_alloc to have the
same code path whether the tree is empty or not.

But I can add the use cases in the doc for sure.
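
Something like this, maybe (only a sketch of the wording, adding the
empty-tree case to your three):

    /*
     * Try to find an unallocated IOVA range between the hole_left and
     * hole_right elements.
     *
     * (1) Both are NULL: the tree is empty, the hole is the whole
     *     [iova_begin, iova_last] range.
     *
     * (2) hole_left == NULL: we're iterating at the first element, the
     *     hole starts at iova_begin.
     *
     * (3) hole_right == NULL: we're iterating past the last element, the
     *     hole ends at iova_last.
     *
     * (4) Both non-NULL: the common case, try to find a hole between the
     *     two mappings.
     */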

> > + *
> > + * @args Arguments to allocation
> > + */
> > +static bool iova_tree_alloc_map_in_hole(const struct IOVATreeAllocArgs *args)
> > +{
> > +    const DMAMap *left = args->hole_left, *right = args->hole_right;
> > +    uint64_t hole_start, hole_last;
> > +
> > +    if (right && right->iova + right->size < args->iova_begin) {
> > +        return false;
> > +    }
> > +
> > +    if (left && left->iova > args->iova_last) {
> > +        return false;
> > +    }
> > +
> > +    hole_start = MAX(left ? left->iova + left->size + 1 : 0, args->iova_begin);
> > +    hole_last = MIN(right ? right->iova : HWADDR_MAX, args->iova_last);
>
> I assume these values should be always inclusive, hence
>
> s/right->iova/right->iova + 1/
>
> ?
>

Right, it is confusing the way it's written. But I think it should be
right->iova - 1 in any case, to make it the hole's last element, shouldn't
it?

Would it work better to rename the variable hole_last to hole_end? If not,
we have a special case in the second allocation when iova_begin == 0:
- We successfully allocate a DMAMap of size N. By the way the
algorithm works, it starts at 0, so [0, N] is allocated.
- We try to allocate a second one of size M. At the first iteration,
"right" is the previously allocated DMAMap.
Using the -1 trick we get hole_end == HWADDR_MAX (see the snippet below).
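
A standalone snippet, just to show the unsigned wrap-around that this case
runs into (not patch code):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* "right" is the previously allocated map [0, N], so right->iova == 0 */
        uint64_t right_iova = 0;

        /* Computing the inclusive hole end as right->iova - 1 wraps around */
        uint64_t hole_end = right_iova - 1;

        /*
         * Prints 0xffffffffffffffff (HWADDR_MAX), so the non-existent hole
         * before the first map would look like the whole address space.
         */
        printf("hole_end = 0x%" PRIx64 "\n", hole_end);
        return 0;
    }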

> > +
> > +    if (hole_last - hole_start > args->new_size) {
> > +        /* We found a valid hole. */
>
> IMHO it's cleaner we simply set:
>
>            args->iova_result = hole_start;
>
> Here before stop the iterations.
>

I agree.

> > +        return true;
> > +    }
> > +
> > +    /* Keep iterating */
> > +    return false;
> > +}
> > +
> > +/**
> > + * Foreach dma node in the tree, compare if there is a hole wit its previous
> > + * node (or minimum iova address allowed) and the node.
> > + *
> > + * @key   Node iterating
> > + * @value Node iterating
> > + * @pargs Struct to communicate with the outside world
> > + *
> > + * Return: false to keep iterating, true if needs break.
> > + */
> > +static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
> > +                                         gpointer pargs)
> > +{
> > +    struct IOVATreeAllocArgs *args = pargs;
> > +    DMAMap *node = value;
> > +
> > +    assert(key == value);
> > +
> > +    iova_tree_alloc_args_iterate(args, node);
> > +    if (args->hole_left && args->hole_left->iova > args->iova_last) {
>
> IMHO this check is redundant and can be dropped, as it's already done in
> iova_tree_alloc_map_in_hole().
>

Assuming we add "iova_found" to iova_tree_alloc_map_in_hole to
IOVATreeAllocArgs as you propose, it returns true if we are able to
allocate a DMAMap entry, so no more iterations are needed. But if it
returns false, it simply means that DMAMap cannot be allocated between
left (or iova_begin) and right (iova_end). It doesn't tell if you can
keep iterating or not. In other words, false == keep iterating if you
can.

This other check signals that we have run past the usable range, and avoids
iterating beyond iova_last in the (unlikely?) case there are more nodes
after that.

I'll try to make it more explicit.
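
For example, with the cached iova_found it could end up like this rough
sketch (reusing the types from the patch, not the final code):

    static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
                                             gpointer pargs)
    {
        struct IOVATreeAllocArgs *args = pargs;
        DMAMap *node = value;

        assert(key == value);

        iova_tree_alloc_args_iterate(args, node);
        if (args->hole_left && args->hole_left->iova > args->iova_last) {
            /* Every node from here on starts past iova_last: stop iterating */
            return true;
        }

        iova_tree_alloc_map_in_hole(args);

        /* Stop as soon as a hole was found; false only means "keep going" */
        return args->iova_found;
    }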

> > +        return true;
> > +    }
> > +
> > +    if (iova_tree_alloc_map_in_hole(args)) {
> > +        return true;
> > +    }
> > +
> > +    return false;
> > +}
> > +
> > +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> > +                    hwaddr iova_last)
> > +{
> > +    struct IOVATreeAllocArgs args = {
> > +        .new_size = map->size,
> > +        .iova_begin = iova_begin,
> > +        .iova_last = iova_last,
> > +    };
> > +
> > +    if (iova_begin == 0) {
> > +        /* Some devices does not like addr 0 */
> > +        iova_begin += qemu_real_host_page_size;
> > +    }
>
> (This should be dropped as the new version goes)
>

Agree.

> > +
> > +    assert(iova_begin < iova_last);
> > +
> > +    /*
> > +     * Find a valid hole for the mapping
> > +     *
> > +     * Assuming low iova_begin, so no need to do a binary search to
> > +     * locate the first node.
>
> We could also mention something like this here:
>
>         *
>         * The traversing will cover all the possible holes but except the last
>         * hole starting from the last element.  We need to handle it separately
>         * below.
>         *
>

Ok I will add the comment.

> > +     *
> > +     * TODO: We can improve the search speed if we save the beginning and the
> > +     * end of holes, so we don't iterate over the previous saved ones.
> > +     *
> > +     * TODO: Replace all this with g_tree_node_first/next/last when available
> > +     * (from glib since 2.68). To do it with g_tree_foreach complicates the
> > +     * code a lot.
> > +     *
> > +     */
> > +    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
> > +    if (!iova_tree_alloc_map_in_hole(&args)) {
>
> With iova_found, here it could be (hopefully) more readable:
>
>        if (!args->iova_found) {
>            /* If we failed to find a hole in 0..N-1 entries, try the last one */
>            iova_tree_alloc_args_iterate(&args, NULL);
>            iova_tree_alloc_map_in_hole(&args);
>            if (!args->iova_found) {
>                return IOVA_ERR_NOMEM;
>            }
>        }
>
>        map->iova = args->iova_result;
>        ...
>

I also think it's more readable this way.
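
So the tail of iova_tree_alloc() would end up as something along these lines
(again just a sketch on top of your suggestion, with args being the on-stack
struct):

        g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
        if (!args.iova_found) {
            /*
             * 2nd try: the traversal left args.hole_right as the last DMAMap,
             * but the hole between it and iova_last needs to be checked too.
             */
            iova_tree_alloc_args_iterate(&args, NULL);
            iova_tree_alloc_map_in_hole(&args);
            if (!args.iova_found) {
                return IOVA_ERR_NOMEM;
            }
        }

        map->iova = args.iova_result;
        return iova_tree_insert(tree, map);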

Thanks!

> Thanks,
>
> > +        /*
> > +         * 2nd try: Last iteration left args->right as the last DMAMap. But
> > +         * (right, end) hole needs to be checked too
> > +         */
> > +        iova_tree_alloc_args_iterate(&args, NULL);
> > +        if (!iova_tree_alloc_map_in_hole(&args)) {
> > +            return IOVA_ERR_NOMEM;
> > +        }
> > +    }
> > +
> > +    map->iova = MAX(iova_begin,
> > +                    args.hole_left ?
> > +                    args.hole_left->iova + args.hole_left->size + 1 : 0);
> > +    return iova_tree_insert(tree, map);
> > +}
> > +
> >  void iova_tree_destroy(IOVATree *tree)
> >  {
> >      g_tree_destroy(tree->tree);
> > --
> > 2.27.0
> >
>
> --
> Peter Xu
>



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
  2022-01-27 10:09     ` Eugenio Perez Martin
@ 2022-01-27 11:25         ` Peter Xu
  0 siblings, 0 replies; 82+ messages in thread
From: Peter Xu @ 2022-01-27 11:25 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Eric Blake, virtualization, Parav Pandit

On Thu, Jan 27, 2022 at 11:09:44AM +0100, Eugenio Perez Martin wrote:
> > > +/**
> > > + * Try to accomodate a map of size ret->size in a hole between
> > > + * max(end(hole_left), iova_start).
> >
> > I think this functions need the most comments, and above sentence is more or
> > less not sounding correct... My try...
> >
> > /*
> >  * Try to find an unallocated IOVA range between LEFT and RIGHT elements.
> >  *
> >  * There're three cases:
> >  *
> >  * (1) When LEFT==NULL, RIGHT must be non-NULL and it means we're iterating at
> >  *     the 1st element.
> >  *
> >  * (2) When RIGHT==NULL, LEFT must be non-NULL and it means we're iterating at
> >  *     the last element.
> >  *
> >  * (3) When both LEFT and RIGHT are non-NULL, this is the most common case,
> >  *     we'll try to find a hole between LEFT and RIGHT mapping.
> >  */
> >
> 
> This is also called with left == NULL and right == NULL in the first
> allocation with an empty tree. This allows iova_tree_alloc to have the
> same code path both if the three is empty or not.
> 
> But I can add the use cases in the doc for sure.

Ah, right.

> 
> > > + *
> > > + * @args Arguments to allocation
> > > + */
> > > +static bool iova_tree_alloc_map_in_hole(const struct IOVATreeAllocArgs *args)
> > > +{
> > > +    const DMAMap *left = args->hole_left, *right = args->hole_right;
> > > +    uint64_t hole_start, hole_last;
> > > +
> > > +    if (right && right->iova + right->size < args->iova_begin) {
> > > +        return false;
> > > +    }
> > > +
> > > +    if (left && left->iova > args->iova_last) {
> > > +        return false;
> > > +    }
> > > +
> > > +    hole_start = MAX(left ? left->iova + left->size + 1 : 0, args->iova_begin);
> > > +    hole_last = MIN(right ? right->iova : HWADDR_MAX, args->iova_last);
> >
> > I assume these values should be always inclusive, hence
> >
> > s/right->iova/right->iova + 1/
> >
> > ?
> >
> 
> Right, it is confusing the way it's written. But I think it should be
> right->iova - 1 in any case to make it the hole last element, isn't
> it?

I was thinking "-1" but I failed to make it coherent with the thought when
typing.. Heh.

> 
> Would it work better to rename variable hole_last to hole_end? If not,
> we have a special case of the second allocation when iova_begin == 0:
> - We successfully allocate a DMAMap of size N. By the way the
> algorithm works,  it starts at 0, so [0, N] is allocated.

If we're always talking about inclusive ranges, shouldn't it be [0, N-1]?

> - We try to allocate a second one of size M. At the first iteration,
> "right" is the previously allocated DMAMap.
> Using the -1 trick we get hole_end == HWADDR_MAX.

I'm not sure I get the point, but both names look fine to me.  As long as we
use inclusive ranges, then hole_end/last will be limited to HWADDR_MAX.

> > > +static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
> > > +                                         gpointer pargs)
> > > +{
> > > +    struct IOVATreeAllocArgs *args = pargs;
> > > +    DMAMap *node = value;
> > > +
> > > +    assert(key == value);
> > > +
> > > +    iova_tree_alloc_args_iterate(args, node);
> > > +    if (args->hole_left && args->hole_left->iova > args->iova_last) {
> >
> > IMHO this check is redundant and can be dropped, as it's already done in
> > iova_tree_alloc_map_in_hole().
> >
> 
> Assuming we add "iova_found" to iova_tree_alloc_map_in_hole to
> IOVATreeAllocArgs as you propose, it returns true if we are able to
> allocate a DMAMap entry, so no more iterations are needed. But if it
> returns false, it simply means that DMAMap cannot be allocated between
> left (or iova_begin) and right (iova_end). It doesn't tell if you can
> keep iterating or not. In other words, false == keep iterating if you
> can.
> 
> This other check signals the end of the available hole, and to avoid
> iterating beyond iova_last in the (unlikely?) case we have more nodes
> to iterate beyond that.
> 
> I'll try to make it more explicit.

Makes sense.  Comment works.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
  2022-01-27 11:25         ` Peter Xu
  (?)
@ 2022-01-27 11:45         ` Eugenio Perez Martin
  -1 siblings, 0 replies; 82+ messages in thread
From: Eugenio Perez Martin @ 2022-01-27 11:45 UTC (permalink / raw)
  To: Peter Xu
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin, Jason Wang,
	Juan Quintela, Richard Henderson, qemu-level, Markus Armbruster,
	Stefan Hajnoczi, Xiao W Wang, Harpreet Singh Anand, Eli Cohen,
	Paolo Bonzini, Stefano Garzarella, Eric Blake, virtualization,
	Parav Pandit

On Thu, Jan 27, 2022 at 12:25 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jan 27, 2022 at 11:09:44AM +0100, Eugenio Perez Martin wrote:
> > > > +/**
> > > > + * Try to accomodate a map of size ret->size in a hole between
> > > > + * max(end(hole_left), iova_start).
> > >
> > > I think this functions need the most comments, and above sentence is more or
> > > less not sounding correct... My try...
> > >
> > > /*
> > >  * Try to find an unallocated IOVA range between LEFT and RIGHT elements.
> > >  *
> > >  * There're three cases:
> > >  *
> > >  * (1) When LEFT==NULL, RIGHT must be non-NULL and it means we're iterating at
> > >  *     the 1st element.
> > >  *
> > >  * (2) When RIGHT==NULL, LEFT must be non-NULL and it means we're iterating at
> > >  *     the last element.
> > >  *
> > >  * (3) When both LEFT and RIGHT are non-NULL, this is the most common case,
> > >  *     we'll try to find a hole between LEFT and RIGHT mapping.
> > >  */
> > >
> >
> > This is also called with left == NULL and right == NULL in the first
> > allocation with an empty tree. This allows iova_tree_alloc to have the
> > same code path both if the three is empty or not.
> >
> > But I can add the use cases in the doc for sure.
>
> Ah, right.
>
> >
> > > > + *
> > > > + * @args Arguments to allocation
> > > > + */
> > > > +static bool iova_tree_alloc_map_in_hole(const struct IOVATreeAllocArgs *args)
> > > > +{
> > > > +    const DMAMap *left = args->hole_left, *right = args->hole_right;
> > > > +    uint64_t hole_start, hole_last;
> > > > +
> > > > +    if (right && right->iova + right->size < args->iova_begin) {
> > > > +        return false;
> > > > +    }
> > > > +
> > > > +    if (left && left->iova > args->iova_last) {
> > > > +        return false;
> > > > +    }
> > > > +
> > > > +    hole_start = MAX(left ? left->iova + left->size + 1 : 0, args->iova_begin);
> > > > +    hole_last = MIN(right ? right->iova : HWADDR_MAX, args->iova_last);
> > >
> > > I assume these values should be always inclusive, hence
> > >
> > > s/right->iova/right->iova + 1/
> > >
> > > ?
> > >
> >
> > Right, it is confusing the way it's written. But I think it should be
> > right->iova - 1 in any case to make it the hole last element, isn't
> > it?
>
> I was thinking "-1" but I failed to make it coherent with the thought when
> typing.. Heh.
>
> >
> > Would it work better to rename variable hole_last to hole_end? If not,
> > we have a special case of the second allocation when iova_begin == 0:
> > - We successfully allocate a DMAMap of size N. By the way the
> > algorithm works,  it starts at 0, so [0, N] is allocated.
>
> If we're always talking about inclusive ranges, shouldn't it be [0, N-1]?
>

I meant DMAMap size, which is already inclusive.

> > - We try to allocate a second one of size M. At the first iteration,
> > "right" is the previously allocated DMAMap.
> > Using the -1 trick we get hole_end == HWADDR_MAX.
>
> I'm not sure I get the point, but both naming look fine to me.  As long as we
> use inclusive ranges, then hole_end/last will be limited to HWADDR_MAX.
>

Sorry, I think you were right from the beginning, because with _end we
cannot handle the case of right == NULL well. I'll rewrite with the
-1, taking into account the underflow.
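
Something in this direction, maybe (only a sketch, assuming inclusive
[hole_start, hole_last] limits; not tested):

        if (right) {
            if (right->iova == 0) {
                /* No hole fits before a mapping that starts at iova 0 */
                return false;
            }
            hole_last = MIN(right->iova - 1, args->iova_last);
        } else {
            hole_last = args->iova_last;
        }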

Please let me know if you have more concerns or you come up with more
ideas to improve the patch.

Thanks!

> > > > +static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
> > > > +                                         gpointer pargs)
> > > > +{
> > > > +    struct IOVATreeAllocArgs *args = pargs;
> > > > +    DMAMap *node = value;
> > > > +
> > > > +    assert(key == value);
> > > > +
> > > > +    iova_tree_alloc_args_iterate(args, node);
> > > > +    if (args->hole_left && args->hole_left->iova > args->iova_last) {
> > >
> > > IMHO this check is redundant and can be dropped, as it's already done in
> > > iova_tree_alloc_map_in_hole().
> > >
> >
> > Assuming we add "iova_found" to iova_tree_alloc_map_in_hole to
> > IOVATreeAllocArgs as you propose, it returns true if we are able to
> > allocate a DMAMap entry, so no more iterations are needed. But if it
> > returns false, it simply means that DMAMap cannot be allocated between
> > left (or iova_begin) and right (iova_end). It doesn't tell if you can
> > keep iterating or not. In other words, false == keep iterating if you
> > can.
> >
> > This other check signals the end of the available hole, and to avoid
> > iterating beyond iova_last in the (unlikely?) case we have more nodes
> > to iterate beyond that.
> >
> > I'll try to make it more explicit.
>
> Makes sense.  Comment works.
>
> Thanks,
>
> --
> Peter Xu
>



^ permalink raw reply	[flat|nested] 82+ messages in thread

end of thread, other threads:[~2022-01-27 12:06 UTC | newest]

Thread overview: 82+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-29 18:34 [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 01/26] util: Make some iova_tree parameters const Eugenio Pérez
2021-10-31 18:59   ` Juan Quintela
2021-10-31 18:59     ` Juan Quintela
2021-11-01  8:20     ` Eugenio Perez Martin
2021-10-29 18:35 ` [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq Eugenio Pérez
2021-11-02  7:25   ` Juan Quintela
2021-11-02  7:25     ` Juan Quintela
2021-11-02  7:32     ` Michael S. Tsirkin
2021-11-02  7:32       ` Michael S. Tsirkin
2021-11-02  7:39       ` Juan Quintela
2021-11-02  7:39         ` Juan Quintela
2021-11-02  8:34     ` Eugenio Perez Martin
2021-11-02  7:40   ` Juan Quintela
2021-11-02  7:40     ` Juan Quintela
2021-10-29 18:35 ` [RFC PATCH v5 03/26] virtio: Add VIRTIO_F_QUEUE_STATE Eugenio Pérez
2021-11-02  4:57   ` Jason Wang
2021-11-02  4:57     ` Jason Wang
2021-10-29 18:35 ` [RFC PATCH v5 04/26] virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 05/26] vhost: Add x-vhost-set-shadow-vq qmp Eugenio Pérez
2021-11-02  7:36   ` Juan Quintela
2021-11-02  7:36     ` Juan Quintela
2021-11-02  8:29     ` Eugenio Perez Martin
2021-10-29 18:35 ` [RFC PATCH v5 06/26] vhost: Add VhostShadowVirtqueue Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 07/26] vdpa: Save kick_fd in vhost-vdpa Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 08/26] vdpa: Add vhost_svq_get_dev_kick_notifier Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 09/26] vdpa: Add vhost_svq_set_svq_kick_fd Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 10/26] vhost: Add Shadow VirtQueue kick forwarding capabilities Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 11/26] vhost: Handle host notifiers in SVQ Eugenio Pérez
2021-11-02  7:54   ` Jason Wang
2021-11-02  7:54     ` Jason Wang
2021-11-02  8:46     ` Eugenio Perez Martin
     [not found]       ` <CACGkMEvOxUMo1WA4tUfDhw+FOJVW87JJGPw=U3JvUSQTU_ogWQ@mail.gmail.com>
     [not found]         ` <CAJaqyWd4DQwRSL5StCft+3-uq12TW5x1o4DN_YW97D0JzOr2XQ@mail.gmail.com>
2021-11-04  2:31           ` Jason Wang
2021-11-04  2:31             ` Jason Wang
2021-10-29 18:35 ` [RFC PATCH v5 12/26] vhost: Route guest->host notification through shadow virtqueue Eugenio Pérez
2021-11-02  5:36   ` Jason Wang
2021-11-02  5:36     ` Jason Wang
2021-11-02  7:35     ` Eugenio Perez Martin
2021-10-29 18:35 ` [RFC PATCH v5 13/26] Add vhost_svq_get_svq_call_notifier Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 14/26] Add vhost_svq_set_guest_call_notifier Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 15/26] vdpa: Save call_fd in vhost-vdpa Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 16/26] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 17/26] vhost: Route host->guest notification through shadow virtqueue Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 18/26] virtio: Add vhost_shadow_vq_get_vring_addr Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 19/26] vdpa: ack VIRTIO_F_QUEUE_STATE if device supports it Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 20/26] vhost: Add vhost_svq_valid_device_features to shadow vq Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 21/26] vhost: Add vhost_svq_valid_guest_features " Eugenio Pérez
2021-11-02  5:25   ` Jason Wang
2021-11-02  5:25     ` Jason Wang
2021-11-02  8:09     ` Eugenio Perez Martin
2021-11-03  3:18       ` Jason Wang
2021-11-03  3:18         ` Jason Wang
2021-11-03  7:43         ` Eugenio Perez Martin
2021-11-04  2:34           ` Jason Wang
2021-11-04  2:34             ` Jason Wang
2021-10-29 18:35 ` [RFC PATCH v5 22/26] vhost: Shadow virtqueue buffers forwarding Eugenio Pérez
2021-11-02  7:59   ` Jason Wang
2021-11-02  7:59     ` Jason Wang
2021-11-02 10:22     ` Eugenio Perez Martin
2021-10-29 18:35 ` [RFC PATCH v5 23/26] util: Add iova_tree_alloc Eugenio Pérez
2021-11-02  6:35   ` Jason Wang
2021-11-02  6:35     ` Jason Wang
2021-11-02  8:28     ` Eugenio Perez Martin
2021-11-03  3:10       ` Jason Wang
2021-11-03  3:10         ` Jason Wang
2021-11-03  7:41         ` Eugenio Perez Martin
2021-11-23  6:56   ` Peter Xu
2021-11-23  6:56     ` Peter Xu
2021-11-23  7:08     ` Eugenio Perez Martin
2022-01-27  8:57   ` Peter Xu
2022-01-27  8:57     ` Peter Xu
2022-01-27 10:09     ` Eugenio Perez Martin
2022-01-27 11:25       ` Peter Xu
2022-01-27 11:25         ` Peter Xu
2022-01-27 11:45         ` Eugenio Perez Martin
2021-10-29 18:35 ` [RFC PATCH v5 24/26] vhost: Add VhostIOVATree Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 25/26] vhost: Use a tree to store memory mappings Eugenio Pérez
2021-10-29 18:35 ` [RFC PATCH v5 26/26] vdpa: Add custom IOTLB translations to SVQ Eugenio Pérez
2021-11-01  9:06 ` [RFC PATCH v5 00/26] vDPA shadow virtqueue Eugenio Perez Martin
2021-11-02  4:25 ` Jason Wang
2021-11-02  4:25   ` Jason Wang
2021-11-02 11:21   ` Eugenio Perez Martin
