* [PATCH 00/31] vDPA shadow virtqueue
@ 2022-01-21 20:27 Eugenio Pérez
  2022-01-21 20:27 ` [PATCH 01/31] vdpa: Reorder virtio/vhost-vdpa.c functions Eugenio Pérez
                   ` (31 more replies)
  0 siblings, 32 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

This series enables shadow virtqueue (SVQ) for vhost-vdpa devices. It
is intended as a new method of tracking the memory the devices touch
during a migration process: instead of relying on the vhost device's
dirty logging capability, SVQ intercepts the VQ dataplane, forwarding
the descriptors between VM and device. This way qemu is the effective
writer of the guest's memory, just like in qemu's emulated virtio
device operation.

When SVQ is enabled, qemu offers a new virtual address space to the
device to read and write into, and it maps the new vrings and the
guest memory in it. SVQ also intercepts kicks and calls between the
device and the guest. Relaying used buffers causes the dirty memory to
be tracked, but at this stage SVQ is not enabled automatically on
migration.
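
In short, with SVQ enabled the dataplane and the notification paths
become, roughly:

  guest avail ring --> qemu (SVQ) --> shadow avail ring --> device
  device used ring --> qemu (SVQ) --> guest used ring (written by qemu,
                                      and so tracked as dirty)
  guest kick  --> qemu (SVQ) --> device kick
  device call --> qemu (SVQ) --> guest call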

Since SVQ is a buffer relay system, it can also be used to bridge
devices and drivers with different capabilities, such as a device that
only supports packed vrings (and not split ones) with an old guest
whose driver has no packed support.

It is based on the ideas of DPDK's SW assisted LM, from the DPDK
series at https://patchwork.dpdk.org/cover/48370/ . However, that
series does not map the shadow vq in the guest's VA, but in qemu's.

This version of SVQ is limited in the features it can use with the
guest and the device, because this series is already very big
otherwise. Features like indirect descriptors or event_idx will be
addressed in future series.

SVQ needs to be enabled with the cmdline parameter x-svq, like:

-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-svq=true
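
For example, a complete invocation could look like this (machine type,
memory size and device paths are only illustrative):

qemu-system-x86_64 -M q35 -m 4G \
    -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-svq=true \
    -device virtio-net-pci,netdev=vhost-vdpa0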

In this version it cannot be enabled or disabled at runtime. Further
series will remove this limitation and will enable SVQ only for the
duration of migration.

Some patches are intentionally very small to ease review, but they can
be squashed if preferred.

Patches 1-10 prepare SVQ and QEMU to support both guest-to-device and
device-to-guest notification forwarding, with the extra qemu hop. That
part can be tested in isolation by also applying the cmdline change.

Patches 11 to 18 implement the actual buffer forwarding, but with no
IOMMU support. They require a vdpa device capable of addressing all of
qemu's vaddr space.

Patches 19 to 23 add the IOMMU support, so devices with address range
limitations can access SVQ through the newly created virtual address
space.

The rest of the series adds the last pieces needed for migration.

Comments are welcome.

TODO:
* Event, indirect, packed, and other features of virtio.
* Separate buffer forwarding into its own AIO context, so we can
  dedicate more threads to that task without stopping the main event
  loop.
* Support virtio-net control vq.
* Proper documentation.

Changes from v5 RFC:
* Remove dynamic enablement of SVQ, making it less dependent on the
  device.
* Enable live migration if SVQ is enabled.
* Fix SVQ on driver reset.
* Comments addressed, especially in the iova area.
* Rebase on latest master, adding multiqueue support (but no networking
  control vq processing).
v5 link:
https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg07250.html

Changes from v4 RFC:
* Support for allocating / freeing iova ranges in the IOVA tree,
  extending the already present iova-tree for that.
* Proper validation of guest features. Now SVQ can negotiate a
  different set of features with the device when enabled.
* Support for host notifier memory regions.
* Handling of an SVQ full queue in case the guest's descriptors span
  across different memory regions (qemu's VA chunks).
* Flush pending used buffers at the end of SVQ operation.
* The QMP command now looks up by NetClientState name. Other devices
  will need to implement their own way to enable vdpa.
* Rename the QMP command to "set", so it looks more like a mode of
  operation.
* Better use of the qemu error system.
* Turn a few assertions into proper error-handling paths.
* Add more documentation.
* Less coupling of virtio / vhost, which could cause friction on
  changes.
* Addressed many other small comments and small fixes.

Changes from v3 RFC:
  * Move everything to the vhost-vdpa backend. A big change; this
    allowed some cleanup, but more code has been added in other places.
  * More use of glib utilities, especially to manage memory.
v3 link:
https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html

Changes from v2 RFC:
  * Adding vhost-vdpa devices support
  * Fixed some memory leaks pointed out in review comments
v2 link:
https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html

Changes from v1 RFC:
  * Use QMP instead of migration to start SVQ mode.
  * Only accepting IOMMU devices, for closer behavior to the target
    devices (vDPA)
  * Fix invalid masking/unmasking of vhost call fd.
  * Use of proper methods for synchronization.
  * No need to modify VirtIO device code, all of the changes are
    contained in vhost code.
  * Delete superfluous code.
  * An intermediate RFC was sent with only the notifications forwarding
    changes. It can be seen in
    https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
v1 link:
https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html

Eugenio Pérez (31):
  vdpa: Reorder virtio/vhost-vdpa.c functions
  vhost: Add VhostShadowVirtqueue
  vdpa: Add vhost_svq_get_dev_kick_notifier
  vdpa: Add vhost_svq_set_svq_kick_fd
  vhost: Add Shadow VirtQueue kick forwarding capabilities
  vhost: Route guest->host notification through shadow virtqueue
  vhost: Add vhost_svq_get_svq_call_notifier
  vhost: Add vhost_svq_set_guest_call_notifier
  vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
  vhost: Route host->guest notification through shadow virtqueue
  vhost: Add vhost_svq_valid_device_features to shadow vq
  vhost: Add vhost_svq_valid_guest_features to shadow vq
  vhost: Add vhost_svq_ack_guest_features to shadow vq
  virtio: Add vhost_shadow_vq_get_vring_addr
  vdpa: Add vhost_svq_get_num
  vhost: pass queue index to vhost_vq_get_addr
  vdpa: adapt vhost_ops callbacks to svq
  vhost: Shadow virtqueue buffers forwarding
  utils: Add internal DMAMap to iova-tree
  util: Store DMA entries in a list
  util: Add iova_tree_alloc
  vhost: Add VhostIOVATree
  vdpa: Add custom IOTLB translations to SVQ
  vhost: Add vhost_svq_get_last_used_idx
  vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
  vdpa: Clear VHOST_VRING_F_LOG at vhost_vdpa_set_vring_addr in SVQ
  vdpa: Never set log_base addr if SVQ is enabled
  vdpa: Expose VHOST_F_LOG_ALL on SVQ
  vdpa: Make ncs autofree
  vdpa: Move vhost_vdpa_get_iova_range to net/vhost-vdpa.c
  vdpa: Add x-svq to NetdevVhostVDPAOptions

 qapi/net.json                      |   5 +-
 hw/virtio/vhost-iova-tree.h        |  27 +
 hw/virtio/vhost-shadow-virtqueue.h |  46 ++
 include/hw/virtio/vhost-vdpa.h     |   7 +
 include/qemu/iova-tree.h           |  17 +
 hw/virtio/vhost-iova-tree.c        | 157 ++++++
 hw/virtio/vhost-shadow-virtqueue.c | 761 +++++++++++++++++++++++++++++
 hw/virtio/vhost-vdpa.c             | 740 ++++++++++++++++++++++++----
 hw/virtio/vhost.c                  |   6 +-
 net/vhost-vdpa.c                   |  58 ++-
 util/iova-tree.c                   | 161 +++++-
 hw/virtio/meson.build              |   2 +-
 12 files changed, 1852 insertions(+), 135 deletions(-)
 create mode 100644 hw/virtio/vhost-iova-tree.h
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
 create mode 100644 hw/virtio/vhost-iova-tree.c
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.c

-- 
2.27.0

* [PATCH 01/31] vdpa: Reorder virtio/vhost-vdpa.c functions
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-28  5:59     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 02/31] vhost: Add VhostShadowVirtqueue Eugenio Pérez
                   ` (30 subsequent siblings)
  31 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

vhost_vdpa_set_features and vhost_vdpa_init need to use
vhost_vdpa_get_features in svq mode.

vhost_vdpa_dev_start needs to use almost all _set_ functions:
vhost_vdpa_set_vring_dev_kick, vhost_vdpa_set_vring_dev_call,
vhost_vdpa_set_dev_vring_base and vhost_vdpa_set_dev_vring_num.

No functional change intended.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 164 ++++++++++++++++++++---------------------
 1 file changed, 82 insertions(+), 82 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 04ea43704f..6c10a7f05f 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -342,41 +342,6 @@ static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
     return v->index != 0;
 }
 
-static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
-{
-    struct vhost_vdpa *v;
-    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
-    trace_vhost_vdpa_init(dev, opaque);
-    int ret;
-
-    /*
-     * Similar to VFIO, we end up pinning all guest memory and have to
-     * disable discarding of RAM.
-     */
-    ret = ram_block_discard_disable(true);
-    if (ret) {
-        error_report("Cannot set discarding of RAM broken");
-        return ret;
-    }
-
-    v = opaque;
-    v->dev = dev;
-    dev->opaque =  opaque ;
-    v->listener = vhost_vdpa_memory_listener;
-    v->msg_type = VHOST_IOTLB_MSG_V2;
-
-    vhost_vdpa_get_iova_range(v);
-
-    if (vhost_vdpa_one_time_request(dev)) {
-        return 0;
-    }
-
-    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
-                               VIRTIO_CONFIG_S_DRIVER);
-
-    return 0;
-}
-
 static void vhost_vdpa_host_notifier_uninit(struct vhost_dev *dev,
                                             int queue_index)
 {
@@ -506,24 +471,6 @@ static int vhost_vdpa_set_mem_table(struct vhost_dev *dev,
     return 0;
 }
 
-static int vhost_vdpa_set_features(struct vhost_dev *dev,
-                                   uint64_t features)
-{
-    int ret;
-
-    if (vhost_vdpa_one_time_request(dev)) {
-        return 0;
-    }
-
-    trace_vhost_vdpa_set_features(dev, features);
-    ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
-    if (ret) {
-        return ret;
-    }
-
-    return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
-}
-
 static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
 {
     uint64_t features;
@@ -646,35 +593,6 @@ static int vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
     return ret;
  }
 
-static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
-{
-    struct vhost_vdpa *v = dev->opaque;
-    trace_vhost_vdpa_dev_start(dev, started);
-
-    if (started) {
-        vhost_vdpa_host_notifiers_init(dev);
-        vhost_vdpa_set_vring_ready(dev);
-    } else {
-        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
-    }
-
-    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
-        return 0;
-    }
-
-    if (started) {
-        memory_listener_register(&v->listener, &address_space_memory);
-        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
-    } else {
-        vhost_vdpa_reset_device(dev);
-        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
-                                   VIRTIO_CONFIG_S_DRIVER);
-        memory_listener_unregister(&v->listener);
-
-        return 0;
-    }
-}
-
 static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
                                      struct vhost_log *log)
 {
@@ -735,6 +653,35 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
     return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
 }
 
+static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    trace_vhost_vdpa_dev_start(dev, started);
+
+    if (started) {
+        vhost_vdpa_host_notifiers_init(dev);
+        vhost_vdpa_set_vring_ready(dev);
+    } else {
+        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
+    }
+
+    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
+        return 0;
+    }
+
+    if (started) {
+        memory_listener_register(&v->listener, &address_space_memory);
+        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+    } else {
+        vhost_vdpa_reset_device(dev);
+        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+                                   VIRTIO_CONFIG_S_DRIVER);
+        memory_listener_unregister(&v->listener);
+
+        return 0;
+    }
+}
+
 static int vhost_vdpa_get_features(struct vhost_dev *dev,
                                      uint64_t *features)
 {
@@ -745,6 +692,24 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev,
     return ret;
 }
 
+static int vhost_vdpa_set_features(struct vhost_dev *dev,
+                                   uint64_t features)
+{
+    int ret;
+
+    if (vhost_vdpa_one_time_request(dev)) {
+        return 0;
+    }
+
+    trace_vhost_vdpa_set_features(dev, features);
+    ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
+    if (ret) {
+        return ret;
+    }
+
+    return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
+}
+
 static int vhost_vdpa_set_owner(struct vhost_dev *dev)
 {
     if (vhost_vdpa_one_time_request(dev)) {
@@ -772,6 +737,41 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
     return true;
 }
 
+static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
+{
+    struct vhost_vdpa *v;
+    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
+    trace_vhost_vdpa_init(dev, opaque);
+    int ret;
+
+    /*
+     * Similar to VFIO, we end up pinning all guest memory and have to
+     * disable discarding of RAM.
+     */
+    ret = ram_block_discard_disable(true);
+    if (ret) {
+        error_report("Cannot set discarding of RAM broken");
+        return ret;
+    }
+
+    v = opaque;
+    v->dev = dev;
+    dev->opaque =  opaque ;
+    v->listener = vhost_vdpa_memory_listener;
+    v->msg_type = VHOST_IOTLB_MSG_V2;
+
+    vhost_vdpa_get_iova_range(v);
+
+    if (vhost_vdpa_one_time_request(dev)) {
+        return 0;
+    }
+
+    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+                               VIRTIO_CONFIG_S_DRIVER);
+
+    return 0;
+}
+
 const VhostOps vdpa_ops = {
         .backend_type = VHOST_BACKEND_TYPE_VDPA,
         .vhost_backend_init = vhost_vdpa_init,
-- 
2.27.0

* [PATCH 02/31] vhost: Add VhostShadowVirtqueue
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
  2022-01-21 20:27 ` [PATCH 01/31] vdpa: Reorder virtio/vhost-vdpa.c functions Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-26  8:53   ` Eugenio Perez Martin
  2022-01-28  6:00     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 03/31] vdpa: Add vhost_svq_get_dev_kick_notifier Eugenio Pérez
                   ` (29 subsequent siblings)
  31 siblings, 2 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

Vhost shadow virtqueue (SVQ) is an intermediate jump for virtqueue
notifications and buffers, allowing qemu to track them. While qemu is
forwarding the buffers and virtqueue changes, it is able to track the
memory that is being dirtied, the same way qemu's regular emulated
VirtIO devices do.

This commit only exposes basic SVQ allocation and freeing. Later
patches in the series add functionality like notification and buffer
forwarding.
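
For illustration, a caller is expected to use the pair like this
(hypothetical caller; the real one is added to vhost-vdpa in later
patches):

    VhostShadowVirtqueue *svq = vhost_svq_new();
    if (!svq) {
        /* Notifier creation failed; the error is already reported */
        return -1;
    }
    /* ... kick and call forwarding setup, added in later patches ... */
    vhost_svq_free(svq);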

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h | 21 ++++++++++
 hw/virtio/vhost-shadow-virtqueue.c | 64 ++++++++++++++++++++++++++++++
 hw/virtio/meson.build              |  2 +-
 3 files changed, 86 insertions(+), 1 deletion(-)
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.c

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
new file mode 100644
index 0000000000..61ea112002
--- /dev/null
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -0,0 +1,21 @@
+/*
+ * vhost shadow virtqueue
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef VHOST_SHADOW_VIRTQUEUE_H
+#define VHOST_SHADOW_VIRTQUEUE_H
+
+#include "hw/virtio/vhost.h"
+
+typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
+
+VhostShadowVirtqueue *vhost_svq_new(void);
+
+void vhost_svq_free(VhostShadowVirtqueue *vq);
+
+#endif
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
new file mode 100644
index 0000000000..5ee7b401cb
--- /dev/null
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -0,0 +1,64 @@
+/*
+ * vhost shadow virtqueue
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "hw/virtio/vhost-shadow-virtqueue.h"
+
+#include "qemu/error-report.h"
+#include "qemu/event_notifier.h"
+
+/* Shadow virtqueue to relay notifications */
+typedef struct VhostShadowVirtqueue {
+    /* Shadow kick notifier, sent to vhost */
+    EventNotifier hdev_kick;
+    /* Shadow call notifier, sent to vhost */
+    EventNotifier hdev_call;
+} VhostShadowVirtqueue;
+
+/**
+ * Creates the vhost shadow virtqueue, and instructs the vhost device to use
+ * the shadow methods and file descriptors.
+ */
+VhostShadowVirtqueue *vhost_svq_new(void)
+{
+    g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
+    int r;
+
+    r = event_notifier_init(&svq->hdev_kick, 0);
+    if (r != 0) {
+        error_report("Couldn't create kick event notifier: %s",
+                     strerror(errno));
+        goto err_init_hdev_kick;
+    }
+
+    r = event_notifier_init(&svq->hdev_call, 0);
+    if (r != 0) {
+        error_report("Couldn't create call event notifier: %s",
+                     strerror(errno));
+        goto err_init_hdev_call;
+    }
+
+    return g_steal_pointer(&svq);
+
+err_init_hdev_call:
+    event_notifier_cleanup(&svq->hdev_kick);
+
+err_init_hdev_kick:
+    return NULL;
+}
+
+/**
+ * Free the resources of the shadow virtqueue.
+ */
+void vhost_svq_free(VhostShadowVirtqueue *vq)
+{
+    event_notifier_cleanup(&vq->hdev_kick);
+    event_notifier_cleanup(&vq->hdev_call);
+    g_free(vq);
+}
diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index 521f7d64a8..2dc87613bc 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
 
 virtio_ss = ss.source_set()
 virtio_ss.add(files('virtio.c'))
-virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c'))
+virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
 virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
-- 
2.27.0

* [PATCH 03/31] vdpa: Add vhost_svq_get_dev_kick_notifier
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
  2022-01-21 20:27 ` [PATCH 01/31] vdpa: Reorder virtio/vhost-vdpa.c functions Eugenio Pérez
  2022-01-21 20:27 ` [PATCH 02/31] vhost: Add VhostShadowVirtqueue Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-28  6:03     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 04/31] vdpa: Add vhost_svq_set_svq_kick_fd Eugenio Pérez
                   ` (28 subsequent siblings)
  31 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

This is needed so vhost-vdpa knows the device's kick event fd.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  4 ++++
 hw/virtio/vhost-shadow-virtqueue.c | 10 +++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 61ea112002..400effd9f2 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -11,9 +11,13 @@
 #define VHOST_SHADOW_VIRTQUEUE_H
 
 #include "hw/virtio/vhost.h"
+#include "qemu/event_notifier.h"
 
 typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 
+const EventNotifier *vhost_svq_get_dev_kick_notifier(
+                                              const VhostShadowVirtqueue *svq);
+
 VhostShadowVirtqueue *vhost_svq_new(void);
 
 void vhost_svq_free(VhostShadowVirtqueue *vq);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 5ee7b401cb..bd87110073 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -11,7 +11,6 @@
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 
 #include "qemu/error-report.h"
-#include "qemu/event_notifier.h"
 
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
@@ -21,6 +20,15 @@ typedef struct VhostShadowVirtqueue {
     EventNotifier hdev_call;
 } VhostShadowVirtqueue;
 
+/**
+ * The notifier that SVQ will use to notify the device.
+ */
+const EventNotifier *vhost_svq_get_dev_kick_notifier(
+                                               const VhostShadowVirtqueue *svq)
+{
+    return &svq->hdev_kick;
+}
+
 /**
  * Creates the vhost shadow virtqueue, and instructs the vhost device to use
  * the shadow methods and file descriptors.
-- 
2.27.0

* [PATCH 04/31] vdpa: Add vhost_svq_set_svq_kick_fd
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (2 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 03/31] vdpa: Add vhost_svq_get_dev_kick_notifier Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-28  6:29     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 05/31] vhost: Add Shadow VirtQueue kick forwarding capabilities Eugenio Pérez
                   ` (27 subsequent siblings)
  31 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

This function allows the vhost-vdpa backend to override kick_fd.
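
Note that the switch must not lose guest kicks: if the old file
descriptor had a pending notification, or if there was no previous
descriptor at all, the device kick notifier is set directly, as the
code below does:

    if (!check_old || event_notifier_test_and_clear(&tmp)) {
        event_notifier_set(&svq->hdev_kick);
    }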

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  1 +
 hw/virtio/vhost-shadow-virtqueue.c | 45 ++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 400effd9f2..a56ecfc09d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -15,6 +15,7 @@
 
 typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 
+void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 const EventNotifier *vhost_svq_get_dev_kick_notifier(
                                               const VhostShadowVirtqueue *svq);
 
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index bd87110073..21534bc94d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -11,6 +11,7 @@
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 
 #include "qemu/error-report.h"
+#include "qemu/main-loop.h"
 
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
@@ -18,8 +19,20 @@ typedef struct VhostShadowVirtqueue {
     EventNotifier hdev_kick;
     /* Shadow call notifier, sent to vhost */
     EventNotifier hdev_call;
+
+    /*
+     * Borrowed virtqueue's guest to host notifier.
+     * Borrowing it in this event notifier allows us to register it on the
+     * event loop and access the associated shadow virtqueue easily. If we
+     * used the VirtQueue's, we would not have an easy way to retrieve it.
+     *
+     * So the SVQ must not clean it up, or we would lose the VirtQueue's one.
+     */
+    EventNotifier svq_kick;
 } VhostShadowVirtqueue;
 
+#define INVALID_SVQ_KICK_FD -1
+
 /**
  * The notifier that SVQ will use to notify the device.
  */
@@ -29,6 +42,35 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
     return &svq->hdev_kick;
 }
 
+/**
+ * Set a new file descriptor for the guest to kick SVQ and notify for avail
+ *
+ * @svq          The svq
+ * @svq_kick_fd  The new svq kick fd
+ */
+void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
+{
+    EventNotifier tmp;
+    bool check_old = INVALID_SVQ_KICK_FD !=
+                     event_notifier_get_fd(&svq->svq_kick);
+
+    if (check_old) {
+        event_notifier_set_handler(&svq->svq_kick, NULL);
+        event_notifier_init_fd(&tmp, event_notifier_get_fd(&svq->svq_kick));
+    }
+
+    /*
+     * event_notifier_set_handler already checks for guest's notifications if
+     * they arrive at the new file descriptor in the switch, so there is no
+     * need to explicitly check for them.
+     */
+    event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
+
+    if (!check_old || event_notifier_test_and_clear(&tmp)) {
+        event_notifier_set(&svq->hdev_kick);
+    }
+}
+
 /**
  * Creates the vhost shadow virtqueue, and instructs the vhost device to use
  * the shadow methods and file descriptors.
@@ -52,6 +94,9 @@ VhostShadowVirtqueue *vhost_svq_new(void)
         goto err_init_hdev_call;
     }
 
+    /* Placeholder descriptor; it will be replaced at set_kick_fd */
+    event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
+
     return g_steal_pointer(&svq);
 
 err_init_hdev_call:
-- 
2.27.0

* [PATCH 05/31] vhost: Add Shadow VirtQueue kick forwarding capabilities
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (3 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 04/31] vdpa: Add vhost_svq_set_svq_kick_fd Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-28  6:32     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 06/31] vhost: Route guest->host notification through shadow virtqueue Eugenio Pérez
                   ` (26 subsequent siblings)
  31 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

In this mode no buffer forwarding is performed yet: qemu just forwards
the guest's kicks to the device.

Also, host notifiers must be disabled at SVQ start, and they will not
start if SVQ has been enabled when the device is stopped. This will be
addressed in later patches.
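
The resulting kick path with this patch is:

    guest kick -> svq_kick fd -> vhost_handle_guest_kick()
               -> hdev_kick fd -> vdpa device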

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  2 ++
 hw/virtio/vhost-shadow-virtqueue.c | 27 ++++++++++++++++++++++++++-
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index a56ecfc09d..4c583a9171 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -19,6 +19,8 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 const EventNotifier *vhost_svq_get_dev_kick_notifier(
                                               const VhostShadowVirtqueue *svq);
 
+void vhost_svq_stop(VhostShadowVirtqueue *svq);
+
 VhostShadowVirtqueue *vhost_svq_new(void);
 
 void vhost_svq_free(VhostShadowVirtqueue *vq);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 21534bc94d..8991f0b3c3 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -42,11 +42,26 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
     return &svq->hdev_kick;
 }
 
+/* Forward guest notifications */
+static void vhost_handle_guest_kick(EventNotifier *n)
+{
+    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
+                                             svq_kick);
+
+    if (unlikely(!event_notifier_test_and_clear(n))) {
+        return;
+    }
+
+    event_notifier_set(&svq->hdev_kick);
+}
+
 /**
  * Set a new file descriptor for the guest to kick SVQ and notify for avail
  *
  * @svq          The svq
- * @svq_kick_fd  The new svq kick fd
+ * @svq_kick_fd  The svq kick fd
+ *
+ * Note that SVQ will never close the old file descriptor.
  */
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
 {
@@ -65,12 +80,22 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
      * need to explicitly check for them.
      */
     event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
+    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
 
     if (!check_old || event_notifier_test_and_clear(&tmp)) {
         event_notifier_set(&svq->hdev_kick);
     }
 }
 
+/**
+ * Stop shadow virtqueue operation.
+ * @svq Shadow Virtqueue
+ */
+void vhost_svq_stop(VhostShadowVirtqueue *svq)
+{
+    event_notifier_set_handler(&svq->svq_kick, NULL);
+}
+
 /**
  * Creates the vhost shadow virtqueue, and instructs the vhost device to use
  * the shadow methods and file descriptors.
-- 
2.27.0

* [PATCH 06/31] vhost: Route guest->host notification through shadow virtqueue
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (4 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 05/31] vhost: Add Shadow VirtQueue kick forwarding capabilities Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-28  6:56     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 07/31] vhost: Add vhost_svq_get_svq_call_notifier Eugenio Pérez
                   ` (25 subsequent siblings)
  31 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

At this moment no buffer forwarding is performed in SVQ mode: qemu
just forwards the guest's kicks to the device. This commit also sets
up SVQs in the vhost device.

Host memory notifier regions are left out for simplicity, and they
will not be addressed in this series.
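
For reference, the setup call chain at device start introduced here is:

    vhost_vdpa_dev_start(dev, true)
      -> vhost_vdpa_svq_setup(dev, svq, i)    /* for each queue */
        -> vhost_vdpa_set_vring_dev_kick()    /* with SVQ's hdev_kick fd */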

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/vhost-vdpa.h |   4 ++
 hw/virtio/vhost-vdpa.c         | 122 ++++++++++++++++++++++++++++++++-
 2 files changed, 124 insertions(+), 2 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 3ce79a646d..009a9f3b6b 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -12,6 +12,8 @@
 #ifndef HW_VIRTIO_VHOST_VDPA_H
 #define HW_VIRTIO_VHOST_VDPA_H
 
+#include <gmodule.h>
+
 #include "hw/virtio/virtio.h"
 #include "standard-headers/linux/vhost_types.h"
 
@@ -27,6 +29,8 @@ typedef struct vhost_vdpa {
     bool iotlb_batch_begin_sent;
     MemoryListener listener;
     struct vhost_vdpa_iova_range iova_range;
+    bool shadow_vqs_enabled;
+    GPtrArray *shadow_vqs;
     struct vhost_dev *dev;
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 6c10a7f05f..18de14f0fb 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -17,12 +17,14 @@
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-backend.h"
 #include "hw/virtio/virtio-net.h"
+#include "hw/virtio/vhost-shadow-virtqueue.h"
 #include "hw/virtio/vhost-vdpa.h"
 #include "exec/address-spaces.h"
 #include "qemu/main-loop.h"
 #include "cpu.h"
 #include "trace.h"
 #include "qemu-common.h"
+#include "qapi/error.h"
 
 /*
  * Return one past the end of the end of section. Be careful with uint64_t
@@ -409,8 +411,14 @@ err:
 
 static void vhost_vdpa_host_notifiers_init(struct vhost_dev *dev)
 {
+    struct vhost_vdpa *v = dev->opaque;
     int i;
 
+    if (v->shadow_vqs_enabled) {
+        /* SVQ is not compatible with host notifiers mr */
+        return;
+    }
+
     for (i = dev->vq_index; i < dev->vq_index + dev->nvqs; i++) {
         if (vhost_vdpa_host_notifier_init(dev, i)) {
             goto err;
@@ -424,6 +432,17 @@ err:
     return;
 }
 
+static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    size_t idx;
+
+    for (idx = 0; idx < v->shadow_vqs->len; ++idx) {
+        vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, idx));
+    }
+    g_ptr_array_free(v->shadow_vqs, true);
+}
+
 static int vhost_vdpa_cleanup(struct vhost_dev *dev)
 {
     struct vhost_vdpa *v;
@@ -432,6 +451,7 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
     trace_vhost_vdpa_cleanup(dev, v);
     vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
     memory_listener_unregister(&v->listener);
+    vhost_vdpa_svq_cleanup(dev);
 
     dev->opaque = NULL;
     ram_block_discard_disable(false);
@@ -507,9 +527,15 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
 
 static int vhost_vdpa_reset_device(struct vhost_dev *dev)
 {
+    struct vhost_vdpa *v = dev->opaque;
     int ret;
     uint8_t status = 0;
 
+    for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
+        vhost_svq_stop(svq);
+    }
+
     ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
     trace_vhost_vdpa_reset_device(dev, status);
     return ret;
@@ -639,13 +665,28 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
     return ret;
 }
 
-static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
-                                       struct vhost_vring_file *file)
+static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
+                                         struct vhost_vring_file *file)
 {
     trace_vhost_vdpa_set_vring_kick(dev, file->index, file->fd);
     return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
 }
 
+static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
+                                       struct vhost_vring_file *file)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
+
+    if (v->shadow_vqs_enabled) {
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
+        vhost_svq_set_svq_kick_fd(svq, file->fd);
+        return 0;
+    } else {
+        return vhost_vdpa_set_vring_dev_kick(dev, file);
+    }
+}
+
 static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
                                        struct vhost_vring_file *file)
 {
@@ -653,6 +694,33 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
     return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
 }
 
+/**
+ * Set shadow virtqueue descriptors to the device
+ *
+ * @dev   The vhost device model
+ * @svq   The shadow virtqueue
+ * @idx   The index of the virtqueue in the vhost device
+ */
+static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
+                                VhostShadowVirtqueue *svq,
+                                unsigned idx)
+{
+    struct vhost_vring_file file = {
+        .index = dev->vq_index + idx,
+    };
+    const EventNotifier *event_notifier;
+    int r;
+
+    event_notifier = vhost_svq_get_dev_kick_notifier(svq);
+    file.fd = event_notifier_get_fd(event_notifier);
+    r = vhost_vdpa_set_vring_dev_kick(dev, &file);
+    if (unlikely(r != 0)) {
+        error_report("Can't set device kick fd (%d)", -r);
+    }
+
+    return r == 0;
+}
+
 static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 {
     struct vhost_vdpa *v = dev->opaque;
@@ -660,6 +728,13 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 
     if (started) {
         vhost_vdpa_host_notifiers_init(dev);
+        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
+            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
+            bool ok = vhost_vdpa_svq_setup(dev, svq, i);
+            if (unlikely(!ok)) {
+                return -1;
+            }
+        }
         vhost_vdpa_set_vring_ready(dev);
     } else {
         vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
@@ -737,6 +812,41 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
     return true;
 }
 
+/**
+ * Adaptor function to free shadow virtqueue through gpointer
+ *
+ * @svq   The Shadow Virtqueue
+ */
+static void vhost_psvq_free(gpointer svq)
+{
+    vhost_svq_free(svq);
+}
+
+static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
+                               Error **errp)
+{
+    size_t n_svqs = v->shadow_vqs_enabled ? hdev->nvqs : 0;
+    g_autoptr(GPtrArray) shadow_vqs = g_ptr_array_new_full(n_svqs,
+                                                           vhost_psvq_free);
+    if (!v->shadow_vqs_enabled) {
+        goto out;
+    }
+
+    for (unsigned n = 0; n < hdev->nvqs; ++n) {
+        VhostShadowVirtqueue *svq = vhost_svq_new();
+
+        if (unlikely(!svq)) {
+            error_setg(errp, "Cannot create svq %u", n);
+            return -1;
+        }
+        g_ptr_array_add(shadow_vqs, svq);
+    }
+
+out:
+    v->shadow_vqs = g_steal_pointer(&shadow_vqs);
+    return 0;
+}
+
 static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
 {
     struct vhost_vdpa *v;
@@ -759,6 +869,10 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
     dev->opaque =  opaque ;
     v->listener = vhost_vdpa_memory_listener;
     v->msg_type = VHOST_IOTLB_MSG_V2;
+    ret = vhost_vdpa_init_svq(dev, v, errp);
+    if (ret) {
+        goto err;
+    }
 
     vhost_vdpa_get_iova_range(v);
 
@@ -770,6 +884,10 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
                                VIRTIO_CONFIG_S_DRIVER);
 
     return 0;
+
+err:
+    ram_block_discard_disable(false);
+    return ret;
 }
 
 const VhostOps vdpa_ops = {
-- 
2.27.0

* [PATCH 07/31] vhost: Add vhost_svq_get_svq_call_notifier
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (5 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 06/31] vhost: Route guest->host notification through shadow virtqueue Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-29  7:57     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 08/31] vhost: Add vhost_svq_set_guest_call_notifier Eugenio Pérez
                   ` (24 subsequent siblings)
  31 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

This allows the vhost-vdpa device to retrieve the device -> SVQ call
eventfd.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  2 ++
 hw/virtio/vhost-shadow-virtqueue.c | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 4c583a9171..a78234b52b 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -18,6 +18,8 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 const EventNotifier *vhost_svq_get_dev_kick_notifier(
                                               const VhostShadowVirtqueue *svq);
+const EventNotifier *vhost_svq_get_svq_call_notifier(
+                                              const VhostShadowVirtqueue *svq);
 
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 8991f0b3c3..25fcdf16ec 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -55,6 +55,18 @@ static void vhost_handle_guest_kick(EventNotifier *n)
     event_notifier_set(&svq->hdev_kick);
 }
 
+/**
+ * Obtain the SVQ call notifier, where the vhost device notifies SVQ that
+ * there are pending used buffers.
+ *
+ * @svq Shadow Virtqueue
+ */
+const EventNotifier *vhost_svq_get_svq_call_notifier(
+                                               const VhostShadowVirtqueue *svq)
+{
+    return &svq->hdev_call;
+}
+
 /**
  * Set a new file descriptor for the guest to kick SVQ and notify for avail
  *
-- 
2.27.0

* [PATCH 08/31] vhost: Add vhost_svq_set_guest_call_notifier
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (6 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 07/31] vhost: Add vhost_svq_get_svq_call_notifier Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-21 20:27 ` [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call Eugenio Pérez
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

This allows the vhost-vdpa device to set the SVQ -> guest call
notifier in SVQ.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  1 +
 hw/virtio/vhost-shadow-virtqueue.c | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index a78234b52b..c9ffa11fce 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -16,6 +16,7 @@
 typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
+void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
 const EventNotifier *vhost_svq_get_dev_kick_notifier(
                                               const VhostShadowVirtqueue *svq);
 const EventNotifier *vhost_svq_get_svq_call_notifier(
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 25fcdf16ec..9c2cf07fd9 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -29,6 +29,9 @@ typedef struct VhostShadowVirtqueue {
      * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
      */
     EventNotifier svq_kick;
+
+    /* Guest's call notifier, where SVQ calls guest. */
+    EventNotifier svq_call;
 } VhostShadowVirtqueue;
 
 #define INVALID_SVQ_KICK_FD -1
@@ -67,6 +70,19 @@ const EventNotifier *vhost_svq_get_svq_call_notifier(
     return &svq->hdev_call;
 }
 
+/**
+ * Set the call notifier for the SVQ to call the guest
+ *
+ * @svq Shadow virtqueue
+ * @call_fd call notifier
+ *
+ * Called on BQL context.
+ */
+void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd)
+{
+    event_notifier_init_fd(&svq->svq_call, call_fd);
+}
+
 /**
  * Set a new file descriptor for the guest to kick SVQ and notify for avail
  *
-- 
2.27.0

* [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (7 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 08/31] vhost: Add vhost_svq_set_guest_call_notifier Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-29  8:05     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 10/31] vhost: Route host->guest notification through shadow virtqueue Eugenio Pérez
                   ` (22 subsequent siblings)
  31 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 18de14f0fb..029f98feee 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -687,13 +687,29 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
     }
 }
 
-static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
-                                       struct vhost_vring_file *file)
+static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
+                                         struct vhost_vring_file *file)
 {
     trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
     return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
 }
 
+static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
+                                     struct vhost_vring_file *file)
+{
+    struct vhost_vdpa *v = dev->opaque;
+
+    if (v->shadow_vqs_enabled) {
+        int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
+
+        vhost_svq_set_guest_call_notifier(svq, file->fd);
+        return 0;
+    } else {
+        return vhost_vdpa_set_vring_dev_call(dev, file);
+    }
+}
+
 /**
  * Set shadow virtqueue descriptors to the device
  *
-- 
2.27.0

* [PATCH 10/31] vhost: Route host->guest notification through shadow virtqueue
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (8 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-21 20:27 ` [PATCH 11/31] vhost: Add vhost_svq_valid_device_features to shadow vq Eugenio Pérez
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

This will make qemu aware of the device's used buffers, allowing it
to write guest memory with their contents if needed.
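
The resulting call path is the mirror of the kick one:

    device call -> hdev_call fd -> vhost_svq_handle_call()
                -> svq_call fd -> guest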

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 15 +++++++++++++++
 hw/virtio/vhost-vdpa.c             | 11 +++++++++++
 2 files changed, 26 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 9c2cf07fd9..9619c8082c 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -58,6 +58,19 @@ static void vhost_handle_guest_kick(EventNotifier *n)
     event_notifier_set(&svq->hdev_kick);
 }
 
+/* Forward vhost notifications */
+static void vhost_svq_handle_call(EventNotifier *n)
+{
+    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
+                                             hdev_call);
+
+    if (unlikely(!event_notifier_test_and_clear(n))) {
+        return;
+    }
+
+    event_notifier_set(&svq->svq_call);
+}
+
 /**
  * Obtain the SVQ call notifier, where the vhost device notifies SVQ that
  * there are pending used buffers.
@@ -150,6 +163,7 @@ VhostShadowVirtqueue *vhost_svq_new(void)
     /* Placeholder descriptor, it should be deleted at set_kick_fd */
     event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
 
+    event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     return g_steal_pointer(&svq);
 
 err_init_hdev_call:
@@ -165,6 +179,7 @@ err_init_hdev_kick:
 void vhost_svq_free(VhostShadowVirtqueue *vq)
 {
     event_notifier_cleanup(&vq->hdev_kick);
+    event_notifier_set_handler(&vq->hdev_call, NULL);
     event_notifier_cleanup(&vq->hdev_call);
     g_free(vq);
 }
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 029f98feee..bdb45c8808 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -716,6 +716,9 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
  * @dev   The vhost device model
  * @svq   The shadow virtqueue
  * @idx   The index of the virtqueue in the vhost device
+ *
+ * Note that this function does not rewind the kick file descriptor if it
+ * cannot set the call one.
  */
 static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
                                 VhostShadowVirtqueue *svq,
@@ -732,6 +735,14 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
     r = vhost_vdpa_set_vring_dev_kick(dev, &file);
     if (unlikely(r != 0)) {
         error_report("Can't set device kick fd (%d)", -r);
+        return false;
+    }
+
+    event_notifier = vhost_svq_get_svq_call_notifier(svq);
+    file.fd = event_notifier_get_fd(event_notifier);
+    r = vhost_vdpa_set_vring_dev_call(dev, &file);
+    if (unlikely(r != 0)) {
+        error_report("Can't set device call fd (%d)", -r);
     }
 
     return r == 0;
-- 
2.27.0

* [PATCH 11/31] vhost: Add vhost_svq_valid_device_features to shadow vq
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (9 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 10/31] vhost: Route host->guest notification through shadow virtqueue Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-29  8:11     ` Jason Wang
  2022-02-26  9:11   ` Liuxiangdong via
  2022-01-21 20:27 ` [PATCH 12/31] vhost: Add vhost_svq_valid_guest_features " Eugenio Pérez
                   ` (20 subsequent siblings)
  31 siblings, 2 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

This allows SVQ to negotiate features with the device. For the device,
SVQ is a driver. While this function lets all non-transport features
pass through, it needs to disable the transport features that SVQ does
not support when forwarding buffers. This includes the packed vq
layout, indirect descriptors and event_idx.
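
As a worked example (feature values only for illustration), a device
offering VIRTIO_F_VERSION_1 and VIRTIO_RING_F_EVENT_IDX is usable,
with event_idx filtered out:

    uint64_t features = BIT_ULL(VIRTIO_F_VERSION_1) |
                        BIT_ULL(VIRTIO_RING_F_EVENT_IDX);
    bool ok = vhost_svq_valid_device_features(&features);
    /* ok == true; features == BIT_ULL(VIRTIO_F_VERSION_1) */

A device offering VIRTIO_F_ACCESS_PLATFORM, on the other hand, makes
the function return false, since SVQ cannot translate addresses at
this point of the series.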

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  2 ++
 hw/virtio/vhost-shadow-virtqueue.c | 44 ++++++++++++++++++++++++++++++
 hw/virtio/vhost-vdpa.c             | 21 ++++++++++++++
 3 files changed, 67 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index c9ffa11fce..d963867a04 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -15,6 +15,8 @@
 
 typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 
+bool vhost_svq_valid_device_features(uint64_t *features);
+
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
 const EventNotifier *vhost_svq_get_dev_kick_notifier(
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 9619c8082c..51442b3dbf 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -45,6 +45,50 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
     return &svq->hdev_kick;
 }
 
+/**
+ * Validate the transport device features that SVQ can use with the device
+ *
+ * @dev_features  The device features. On success, the acknowledged features.
+ *
+ * Returns true if SVQ can go with a subset of these, false otherwise.
+ */
+bool vhost_svq_valid_device_features(uint64_t *dev_features)
+{
+    bool r = true;
+
+    for (uint64_t b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END;
+         ++b) {
+        switch (b) {
+        case VIRTIO_F_NOTIFY_ON_EMPTY:
+        case VIRTIO_F_ANY_LAYOUT:
+            continue;
+
+        case VIRTIO_F_ACCESS_PLATFORM:
+            /* SVQ does not know how to translate addresses */
+            if (*dev_features & BIT_ULL(b)) {
+                clear_bit(b, dev_features);
+                r = false;
+            }
+            break;
+
+        case VIRTIO_F_VERSION_1:
+            /* SVQ trusts that the guest vring is little endian */
+            if (!(*dev_features & BIT_ULL(b))) {
+                set_bit(b, dev_features);
+                r = false;
+            }
+            continue;
+
+        default:
+            if (*dev_features & BIT_ULL(b)) {
+                clear_bit(b, dev_features);
+            }
+        }
+    }
+
+    return r;
+}
+
 /* Forward guest notifications */
 static void vhost_handle_guest_kick(EventNotifier *n)
 {
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index bdb45c8808..9d801cf907 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -855,10 +855,31 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
     size_t n_svqs = v->shadow_vqs_enabled ? hdev->nvqs : 0;
     g_autoptr(GPtrArray) shadow_vqs = g_ptr_array_new_full(n_svqs,
                                                            vhost_psvq_free);
+    uint64_t dev_features;
+    uint64_t svq_features;
+    int r;
+    bool ok;
+
     if (!v->shadow_vqs_enabled) {
         goto out;
     }
 
+    r = vhost_vdpa_get_features(hdev, &dev_features);
+    if (r != 0) {
+        error_setg(errp, "Can't get vdpa device features, got (%d)", r);
+        return r;
+    }
+
+    svq_features = dev_features;
+    ok = vhost_svq_valid_device_features(&svq_features);
+    if (unlikely(!ok)) {
+        error_setg(errp,
+            "SVQ Invalid device feature flags, offer: 0x%"PRIx64", ok: 0x%"PRIx64,
+            hdev->features, svq_features);
+        return -1;
+    }
+
+    shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_psvq_free);
     for (unsigned n = 0; n < hdev->nvqs; ++n) {
         VhostShadowVirtqueue *svq = vhost_svq_new();
 
-- 
2.27.0




* [PATCH 12/31] vhost: Add vhost_svq_valid_guest_features to shadow vq
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (10 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 11/31] vhost: Add vhost_svq_valid_device_features to shadow vq Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-21 20:27 ` [PATCH 13/31] vhost: Add vhost_svq_ack_guest_features " Eugenio Pérez
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

This allows SVQ to check whether the guest has acknowledged a transport
feature that SVQ cannot handle, such as packed vq layout or event_idx,
where the VirtIO device needs help from SVQ.

It is not needed at this moment, but since SVQ will not re-negotiate
features with the guest, a guest acknowledging a feature that SVQ does
not support is fatal for SVQ.
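
An illustrative sketch of the intended call pattern at feature ack
time (vdev stands for the shadowed VirtIODevice):

    uint64_t guest_features = vdev->guest_features;
    if (!vhost_svq_valid_guest_features(&guest_features)) {
        /* The guest acked a transport feature SVQ cannot forward */
        return -1;
    }
    /* guest_features is now masked to what SVQ can handle */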

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  1 +
 hw/virtio/vhost-shadow-virtqueue.c | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index d963867a04..1aae6a2297 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -16,6 +16,7 @@
 typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 
 bool vhost_svq_valid_device_features(uint64_t *features);
+bool vhost_svq_valid_guest_features(uint64_t *features);
 
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 51442b3dbf..f70160d7ca 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -89,6 +89,30 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
     return r;
 }
 
+/**
+ * Filter the transport features to the set SVQ can offer to the guest.
+ *
+ * @guest_features  The features to filter. On success, masked to SVQ's set.
+ *
+ * Returns true if SVQ can handle them, false otherwise.
+ */
+bool vhost_svq_valid_guest_features(uint64_t *guest_features)
+{
+    static const uint64_t transport = MAKE_64BIT_MASK(VIRTIO_TRANSPORT_F_START,
+                            VIRTIO_TRANSPORT_F_END - VIRTIO_TRANSPORT_F_START);
+
+    /* These transport features are handled by VirtQueue */
+    static const uint64_t valid = BIT_ULL(VIRTIO_RING_F_INDIRECT_DESC) |
+                                  BIT_ULL(VIRTIO_F_VERSION_1) |
+                                  BIT_ULL(VIRTIO_F_IOMMU_PLATFORM);
+
+    /* We are only interested in transport-related feature bits */
+    uint64_t guest_transport_features = (*guest_features) & transport;
+
+    *guest_features &= (valid | ~transport);
+    return !(guest_transport_features & (transport ^ valid));
+}
+
 /* Forward guest notifications */
 static void vhost_handle_guest_kick(EventNotifier *n)
 {
-- 
2.27.0




* [PATCH 13/31] vhost: Add vhost_svq_ack_guest_features to shadow vq
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (11 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 12/31] vhost: Add vhost_svq_valid_guest_features " Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-21 20:27 ` [PATCH 14/31] virtio: Add vhost_shadow_vq_get_vring_addr Eugenio Pérez
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

This combines the previous two feature functions: it forwards the
guest's device-specific features to the device and replaces the
transport ones with the set that both SVQ and the device support.
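
A hedged sketch of how a backend would consume it (dev_features and
guest_features obtained as in the previous patches; vhost_vdpa_call as
in hw/virtio/vhost-vdpa.c):

    uint64_t acked;
    if (!vhost_svq_ack_guest_features(dev_features, guest_features,
                                      &acked)) {
        return -1;
    }
    /* acked = device transport bits + guest device-specific bits */
    ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &acked);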

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  3 +++
 hw/virtio/vhost-shadow-virtqueue.c | 31 ++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 1aae6a2297..af8f8264c0 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -17,6 +17,9 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 
 bool vhost_svq_valid_device_features(uint64_t *features);
 bool vhost_svq_valid_guest_features(uint64_t *features);
+bool vhost_svq_ack_guest_features(uint64_t dev_features,
+                                  uint64_t guest_features,
+                                  uint64_t *acked_features);
 
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index f70160d7ca..a6fb7e3c8f 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -113,6 +113,37 @@ bool vhost_svq_valid_guest_features(uint64_t *guest_features)
     return !(guest_transport_features & (transport ^ valid));
 }
 
+/**
+ * VirtIO features that SVQ must acknowledge to the device.
+ *
+ * It combines the SVQ transport compatible features with the guest's device
+ * features.
+ *
+ * @dev_features    The device's offered features
+ * @guest_features  The guest's acknowledged features
+ * @acked_features  The guest's acknowledged device features plus the SVQ
+ *                  transport ones.
+ *
+ * Returns true if SVQ can work with these features, false otherwise
+ */
+bool vhost_svq_ack_guest_features(uint64_t dev_features,
+                                  uint64_t guest_features,
+                                  uint64_t *acked_features)
+{
+    static const uint64_t transport = MAKE_64BIT_MASK(VIRTIO_TRANSPORT_F_START,
+                            VIRTIO_TRANSPORT_F_END - VIRTIO_TRANSPORT_F_START);
+
+    bool ok = vhost_svq_valid_device_features(&dev_features) &&
+              vhost_svq_valid_guest_features(&guest_features);
+    if (unlikely(!ok)) {
+        return false;
+    }
+
+    *acked_features = (dev_features & transport) |
+                      (guest_features & ~transport);
+    return true;
+}
+
 /* Forward guest notifications */
 static void vhost_handle_guest_kick(EventNotifier *n)
 {
-- 
2.27.0




* [PATCH 14/31] virtio: Add vhost_shadow_vq_get_vring_addr
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (12 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 13/31] vhost: Add vhost_svq_ack_guest_features " Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-21 20:27 ` [PATCH 15/31] vdpa: Add vhost_svq_get_num Eugenio Pérez
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

It reports the shadow virtqueue addresses in qemu's virtual address
space.

Since these will differ from the guest's vaddr, but the device must be
able to access them, SVQ takes special care with their alignment and
with not exposing garbage data in them. It assumes that the IOMMU maps
memory in host_page_size ranges for that.
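
As a worked example (assuming 4 KiB host pages and a 256-entry vring),
the driver area packs the descriptor ring and the avail ring together,
padded to a page boundary, while the used ring gets its own pages:

    desc:   256 * 16 B                      = 4096 B
    avail:  4 B header + 256 * 2 B          =  516 B
    driver area: ROUND_UP(4096 + 516, 4096) = 8192 B

    used:   4 B header + 256 * 8 B          = 2052 B
    device area: ROUND_UP(2052, 4096)       = 4096 B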

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  4 ++++
 hw/virtio/vhost-shadow-virtqueue.c | 33 ++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index af8f8264c0..3521e8094d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -27,6 +27,10 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
                                               const VhostShadowVirtqueue *svq);
 const EventNotifier *vhost_svq_get_svq_call_notifier(
                                               const VhostShadowVirtqueue *svq);
+void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
+                              struct vhost_vring_addr *addr);
+size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
+size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
 
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index a6fb7e3c8f..0f2c2403ff 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -9,12 +9,16 @@
 
 #include "qemu/osdep.h"
 #include "hw/virtio/vhost-shadow-virtqueue.h"
+#include "standard-headers/linux/vhost_types.h"
 
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
+    /* Shadow vring */
+    struct vring vring;
+
     /* Shadow kick notifier, sent to vhost */
     EventNotifier hdev_kick;
     /* Shadow call notifier, sent to vhost */
@@ -195,6 +199,35 @@ void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd)
     event_notifier_init_fd(&svq->svq_call, call_fd);
 }
 
+/*
+ * Get the shadow vq vring address.
+ * @svq Shadow virtqueue
+ * @addr Destination to store address
+ */
+void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
+                              struct vhost_vring_addr *addr)
+{
+    addr->desc_user_addr = (uint64_t)svq->vring.desc;
+    addr->avail_user_addr = (uint64_t)svq->vring.avail;
+    addr->used_user_addr = (uint64_t)svq->vring.used;
+}
+
+size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq)
+{
+    size_t desc_size = sizeof(vring_desc_t) * svq->vring.num;
+    size_t avail_size = offsetof(vring_avail_t, ring) +
+                                             sizeof(uint16_t) * svq->vring.num;
+
+    return ROUND_UP(desc_size + avail_size, qemu_real_host_page_size);
+}
+
+size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq)
+{
+    size_t used_size = offsetof(vring_used_t, ring) +
+                                    sizeof(vring_used_elem_t) * svq->vring.num;
+    return ROUND_UP(used_size, qemu_real_host_page_size);
+}
+
 /**
  * Set a new file descriptor for the guest to kick SVQ and notify for avail
  *
-- 
2.27.0




* [PATCH 15/31] vdpa: Add vhost_svq_get_num
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (13 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 14/31] virtio: Add vhost_shadow_vq_get_vring_addr Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-29  8:14     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 16/31] vhost: pass queue index to vhost_vq_get_addr Eugenio Pérez
                   ` (16 subsequent siblings)
  31 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

This reports the effective SVQ length as seen by the guest, not the
device's maximum queue size.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h | 1 +
 hw/virtio/vhost-shadow-virtqueue.c | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 3521e8094d..035207a469 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -29,6 +29,7 @@ const EventNotifier *vhost_svq_get_svq_call_notifier(
                                               const VhostShadowVirtqueue *svq);
 void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
                               struct vhost_vring_addr *addr);
+uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq);
 size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
 size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
 
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 0f2c2403ff..f129ec8395 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -212,6 +212,11 @@ void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
     addr->used_user_addr = (uint64_t)svq->vring.used;
 }
 
+uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq)
+{
+    return svq->vring.num;
+}
+
 size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq)
 {
     size_t desc_size = sizeof(vring_desc_t) * svq->vring.num;
-- 
2.27.0




* [PATCH 16/31] vhost: pass queue index to vhost_vq_get_addr
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (14 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 15/31] vdpa: Add vhost_svq_get_num Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-29  8:20     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq Eugenio Pérez
                   ` (15 subsequent siblings)
  31 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

Doing it this way allows the vhost backend to know which address to return.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 7b03efccec..64b955ba0c 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -798,9 +798,10 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
                                     struct vhost_virtqueue *vq,
                                     unsigned idx, bool enable_log)
 {
-    struct vhost_vring_addr addr;
+    struct vhost_vring_addr addr = {
+        .index = idx,
+    };
     int r;
-    memset(&addr, 0, sizeof(struct vhost_vring_addr));
 
     if (dev->vhost_ops->vhost_vq_get_addr) {
         r = dev->vhost_ops->vhost_vq_get_addr(dev, &addr, vq);
@@ -813,7 +814,6 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
         addr.avail_user_addr = (uint64_t)(unsigned long)vq->avail;
         addr.used_user_addr = (uint64_t)(unsigned long)vq->used;
     }
-    addr.index = idx;
     addr.log_guest_addr = vq->used_phys;
     addr.flags = enable_log ? (1 << VHOST_VRING_F_LOG) : 0;
     r = dev->vhost_ops->vhost_set_vring_addr(dev, &addr);
-- 
2.27.0




* [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (15 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 16/31] vhost: pass queue index to vhost_vq_get_addr Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-30  4:03     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding Eugenio Pérez
                   ` (14 subsequent siblings)
  31 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

First half of the buffer forwarding part: this prepares the vhost-vdpa
callbacks so they can offer SVQ to the device. QEMU cannot enable SVQ
at this moment, so this is effectively dead code for now, but splitting
it out helps to reduce patch size.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |   2 +-
 hw/virtio/vhost-shadow-virtqueue.c |  21 ++++-
 hw/virtio/vhost-vdpa.c             | 133 ++++++++++++++++++++++++++---
 3 files changed, 143 insertions(+), 13 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 035207a469..39aef5ffdf 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
 
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
-VhostShadowVirtqueue *vhost_svq_new(void);
+VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
 
 void vhost_svq_free(VhostShadowVirtqueue *vq);
 
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index f129ec8395..7c168075d7 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
 /**
  * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
  * methods and file descriptors.
+ *
+ * @qsize Shadow VirtQueue size
+ *
+ * Returns the new virtqueue or NULL.
+ *
+ * In case of error, reason is reported through error_report.
  */
-VhostShadowVirtqueue *vhost_svq_new(void)
+VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
 {
+    size_t desc_size = sizeof(vring_desc_t) * qsize;
+    size_t device_size, driver_size;
     g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
     int r;
 
@@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
     /* Placeholder descriptor, it should be deleted at set_kick_fd */
     event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
 
+    svq->vring.num = qsize;
+    driver_size = vhost_svq_driver_area_size(svq);
+    device_size = vhost_svq_device_area_size(svq);
+    svq->vring.desc = qemu_memalign(qemu_real_host_page_size, driver_size);
+    svq->vring.avail = (void *)((char *)svq->vring.desc + desc_size);
+    memset(svq->vring.desc, 0, driver_size);
+    svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
+    memset(svq->vring.used, 0, device_size);
+
     event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     return g_steal_pointer(&svq);
 
@@ -318,5 +335,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
     event_notifier_cleanup(&vq->hdev_kick);
     event_notifier_set_handler(&vq->hdev_call, NULL);
     event_notifier_cleanup(&vq->hdev_call);
+    qemu_vfree(vq->vring.desc);
+    qemu_vfree(vq->vring.used);
     g_free(vq);
 }
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 9d801cf907..53e14bafa0 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -641,20 +641,52 @@ static int vhost_vdpa_set_vring_addr(struct vhost_dev *dev,
     return vhost_vdpa_call(dev, VHOST_SET_VRING_ADDR, addr);
 }
 
-static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
-                                      struct vhost_vring_state *ring)
+static int vhost_vdpa_set_dev_vring_num(struct vhost_dev *dev,
+                                        struct vhost_vring_state *ring)
 {
     trace_vhost_vdpa_set_vring_num(dev, ring->index, ring->num);
     return vhost_vdpa_call(dev, VHOST_SET_VRING_NUM, ring);
 }
 
-static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
-                                       struct vhost_vring_state *ring)
+static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
+                                    struct vhost_vring_state *ring)
+{
+    struct vhost_vdpa *v = dev->opaque;
+
+    if (v->shadow_vqs_enabled) {
+        /*
+         * Vring num was set at device start. SVQ num is handled by VirtQueue
+         * code
+         */
+        return 0;
+    }
+
+    return vhost_vdpa_set_dev_vring_num(dev, ring);
+}
+
+static int vhost_vdpa_set_dev_vring_base(struct vhost_dev *dev,
+                                         struct vhost_vring_state *ring)
 {
     trace_vhost_vdpa_set_vring_base(dev, ring->index, ring->num);
     return vhost_vdpa_call(dev, VHOST_SET_VRING_BASE, ring);
 }
 
+static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
+                                     struct vhost_vring_state *ring)
+{
+    struct vhost_vdpa *v = dev->opaque;
+
+    if (v->shadow_vqs_enabled) {
+        /*
+         * Vring base was set at device start. SVQ base is handled by VirtQueue
+         * code
+         */
+        return 0;
+    }
+
+    return vhost_vdpa_set_dev_vring_base(dev, ring);
+}
+
 static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
                                        struct vhost_vring_state *ring)
 {
@@ -784,8 +816,8 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
     }
 }
 
-static int vhost_vdpa_get_features(struct vhost_dev *dev,
-                                     uint64_t *features)
+static int vhost_vdpa_get_dev_features(struct vhost_dev *dev,
+                                       uint64_t *features)
 {
     int ret;
 
@@ -794,15 +826,64 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev,
     return ret;
 }
 
+static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    int ret = vhost_vdpa_get_dev_features(dev, features);
+
+    if (ret == 0 && v->shadow_vqs_enabled) {
+        /* Filter only features that SVQ can offer to guest */
+        vhost_svq_valid_guest_features(features);
+    }
+
+    return ret;
+}
+
 static int vhost_vdpa_set_features(struct vhost_dev *dev,
                                    uint64_t features)
 {
+    struct vhost_vdpa *v = dev->opaque;
     int ret;
 
     if (vhost_vdpa_one_time_request(dev)) {
         return 0;
     }
 
+    if (v->shadow_vqs_enabled) {
+        uint64_t dev_features, svq_features, acked_features;
+        bool ok;
+
+        ret = vhost_vdpa_get_dev_features(dev, &dev_features);
+        if (ret != 0) {
+            error_report("Can't get vdpa device features, got (%d)", ret);
+            return ret;
+        }
+
+        svq_features = dev_features;
+        ok = vhost_svq_valid_device_features(&svq_features);
+        if (unlikely(!ok)) {
+            error_report("SVQ Invalid device feature flags, offer: 0x%"
+                         PRIx64", ok: 0x%"PRIx64, dev->features, svq_features);
+            return -1;
+        }
+
+        ok = vhost_svq_valid_guest_features(&features);
+        if (unlikely(!ok)) {
+            error_report(
+                "Invalid guest acked feature flag, acked: 0x%"
+                PRIx64", ok: 0x%"PRIx64, dev->acked_features, features);
+            return -1;
+        }
+
+        ok = vhost_svq_ack_guest_features(svq_features, features,
+                                          &acked_features);
+        if (unlikely(!ok)) {
+            return -1;
+        }
+
+        features = acked_features;
+    }
+
     trace_vhost_vdpa_set_features(dev, features);
     ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
     if (ret) {
@@ -822,13 +903,31 @@ static int vhost_vdpa_set_owner(struct vhost_dev *dev)
     return vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
 }
 
-static int vhost_vdpa_vq_get_addr(struct vhost_dev *dev,
-                    struct vhost_vring_addr *addr, struct vhost_virtqueue *vq)
+static void vhost_vdpa_vq_get_guest_addr(struct vhost_vring_addr *addr,
+                                         struct vhost_virtqueue *vq)
 {
-    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
     addr->desc_user_addr = (uint64_t)(unsigned long)vq->desc_phys;
     addr->avail_user_addr = (uint64_t)(unsigned long)vq->avail_phys;
     addr->used_user_addr = (uint64_t)(unsigned long)vq->used_phys;
+}
+
+static int vhost_vdpa_vq_get_addr(struct vhost_dev *dev,
+                                  struct vhost_vring_addr *addr,
+                                  struct vhost_virtqueue *vq)
+{
+    struct vhost_vdpa *v = dev->opaque;
+
+    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
+
+    if (v->shadow_vqs_enabled) {
+        int idx = vhost_vdpa_get_vq_index(dev, addr->index);
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, idx);
+
+        vhost_svq_get_vring_addr(svq, addr);
+    } else {
+        vhost_vdpa_vq_get_guest_addr(addr, vq);
+    }
+
     trace_vhost_vdpa_vq_get_addr(dev, vq, addr->desc_user_addr,
                                  addr->avail_user_addr, addr->used_user_addr);
     return 0;
@@ -849,6 +948,12 @@ static void vhost_psvq_free(gpointer svq)
     vhost_svq_free(svq);
 }
 
+static int vhost_vdpa_get_max_queue_size(struct vhost_dev *dev,
+                                         uint16_t *qsize)
+{
+    return vhost_vdpa_call(dev, VHOST_VDPA_GET_VRING_NUM, qsize);
+}
+
 static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
                                Error **errp)
 {
@@ -857,6 +962,7 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
                                                            vhost_psvq_free);
     uint64_t dev_features;
     uint64_t svq_features;
+    uint16_t qsize;
     int r;
     bool ok;
 
@@ -864,7 +970,7 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
         goto out;
     }
 
-    r = vhost_vdpa_get_features(hdev, &dev_features);
+    r = vhost_vdpa_get_dev_features(hdev, &dev_features);
     if (r != 0) {
         error_setg(errp, "Can't get vdpa device features, got (%d)", r);
         return r;
@@ -879,9 +985,14 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
         return -1;
     }
 
+    r = vhost_vdpa_get_max_queue_size(hdev, &qsize);
+    if (unlikely(r)) {
+        qsize = 256;
+    }
+
     shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_psvq_free);
     for (unsigned n = 0; n < hdev->nvqs; ++n) {
-        VhostShadowVirtqueue *svq = vhost_svq_new();
+        VhostShadowVirtqueue *svq = vhost_svq_new(qsize);
 
         if (unlikely(!svq)) {
             error_setg(errp, "Cannot create svq %u", n);
-- 
2.27.0




* [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (16 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-30  4:42     ` Jason Wang
  2022-01-30  6:46     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 19/31] utils: Add internal DMAMap to iova-tree Eugenio Pérez
                   ` (13 subsequent siblings)
  31 siblings, 2 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

Initial version of shadow virtqueue that actually forwards buffers.
There is no iommu support at the moment, and that will be addressed in
future patches of this series. Since all vhost-vdpa devices use forced
IOMMU, this means that SVQ is not usable on any device at this point of
the series.

For simplicity it only supports modern devices, which expect the vring
in little endian, with a split ring and no event_idx or indirect
descriptors. Support for them will not be added in this series.

It reuses the VirtQueue code for the device part. The driver part is
based on Linux's virtio_ring driver, but with stripped functionality
and optimizations so it's easier to review.

However, forwarding buffers has some peculiarities. One of the most
unexpected is that a single guest buffer can expand into more than one
descriptor in SVQ. While this is handled gracefully by qemu's emulated
virtio devices, it may cause an unexpected SVQ queue full. This patch
also solves it by checking for this condition at both guest's kicks and
device's calls. The code may be more elegant in the future if SVQ code
runs in its own iocontext.
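
To illustrate the expansion (sizes and names are made up): a buffer
that is contiguous in GPA can be backed by two disjoint qemu VA ranges,
so the VirtQueueElement carries two iovec entries and SVQ needs two
descriptors where the guest used one:

    /* Hypothetical stand-ins for two host mappings of guest RAM */
    uint8_t block_a[0x100000], block_b[0x1000];
    struct iovec out_sg[2] = {
        { .iov_base = block_a + 0xff000, .iov_len = 0x1000 },
        { .iov_base = block_b,           .iov_len = 0x1000 },
    };
    /* 2 SVQ descriptors are needed; this may exceed free SVQ slots */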

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |   2 +
 hw/virtio/vhost-shadow-virtqueue.c | 365 ++++++++++++++++++++++++++++-
 hw/virtio/vhost-vdpa.c             | 111 ++++++++-
 3 files changed, 462 insertions(+), 16 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 39aef5ffdf..19c934af49 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -33,6 +33,8 @@ uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq);
 size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
 size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
 
+void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
+                     VirtQueue *vq);
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
 VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 7c168075d7..a1a404f68f 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -9,6 +9,8 @@
 
 #include "qemu/osdep.h"
 #include "hw/virtio/vhost-shadow-virtqueue.h"
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/virtio-access.h"
 #include "standard-headers/linux/vhost_types.h"
 
 #include "qemu/error-report.h"
@@ -36,6 +38,33 @@ typedef struct VhostShadowVirtqueue {
 
     /* Guest's call notifier, where SVQ calls guest. */
     EventNotifier svq_call;
+
+    /* Virtio queue shadowing */
+    VirtQueue *vq;
+
+    /* Virtio device */
+    VirtIODevice *vdev;
+
+    /* Map for returning guest's descriptors */
+    VirtQueueElement **ring_id_maps;
+
+    /* Next VirtQueue element that guest made available */
+    VirtQueueElement *next_guest_avail_elem;
+
+    /* Next head to expose to device */
+    uint16_t avail_idx_shadow;
+
+    /* Next free descriptor */
+    uint16_t free_head;
+
+    /* Last seen used idx */
+    uint16_t shadow_used_idx;
+
+    /* Next head to consume from device */
+    uint16_t last_used_idx;
+
+    /* Cache for the exposed notification flag */
+    bool notification;
 } VhostShadowVirtqueue;
 
 #define INVALID_SVQ_KICK_FD -1
@@ -148,30 +177,294 @@ bool vhost_svq_ack_guest_features(uint64_t dev_features,
     return true;
 }
 
-/* Forward guest notifications */
-static void vhost_handle_guest_kick(EventNotifier *n)
+/**
+ * Number of descriptors that SVQ can make available from the guest.
+ *
+ * @svq   The svq
+ */
+static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
 {
-    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
-                                             svq_kick);
+    return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
+}
+
+static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
+{
+    uint16_t notification_flag;
 
-    if (unlikely(!event_notifier_test_and_clear(n))) {
+    if (svq->notification == enable) {
+        return;
+    }
+
+    notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
+
+    svq->notification = enable;
+    if (enable) {
+        svq->vring.avail->flags &= ~notification_flag;
+    } else {
+        svq->vring.avail->flags |= notification_flag;
+    }
+}
+
+static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
+                                    const struct iovec *iovec,
+                                    size_t num, bool more_descs, bool write)
+{
+    uint16_t i = svq->free_head, last = svq->free_head;
+    unsigned n;
+    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
+    vring_desc_t *descs = svq->vring.desc;
+
+    if (num == 0) {
+        return;
+    }
+
+    for (n = 0; n < num; n++) {
+        if (more_descs || (n + 1 < num)) {
+            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
+        } else {
+            descs[i].flags = flags;
+        }
+        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
+        descs[i].len = cpu_to_le32(iovec[n].iov_len);
+
+        last = i;
+        i = cpu_to_le16(descs[i].next);
+    }
+
+    svq->free_head = le16_to_cpu(descs[last].next);
+}
+
+static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
+                                    VirtQueueElement *elem)
+{
+    int head;
+    unsigned avail_idx;
+    vring_avail_t *avail = svq->vring.avail;
+
+    head = svq->free_head;
+
+    /* We need some descriptors here */
+    assert(elem->out_num || elem->in_num);
+
+    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
+                            elem->in_num > 0, false);
+    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
+
+    /*
+     * Put entry in available array (but don't update avail->idx until they
+     * do sync).
+     */
+    avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
+    avail->ring[avail_idx] = cpu_to_le16(head);
+    svq->avail_idx_shadow++;
+
+    /* Update avail index after the descriptor is written */
+    smp_wmb();
+    avail->idx = cpu_to_le16(svq->avail_idx_shadow);
+
+    return head;
+}
+
+static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
+{
+    unsigned qemu_head = vhost_svq_add_split(svq, elem);
+
+    svq->ring_id_maps[qemu_head] = elem;
+}
+
+static void vhost_svq_kick(VhostShadowVirtqueue *svq)
+{
+    /* We need to expose available array entries before checking used flags */
+    smp_mb();
+    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
         return;
     }
 
     event_notifier_set(&svq->hdev_kick);
 }
 
-/* Forward vhost notifications */
+/**
+ * Forward available buffers.
+ *
+ * @svq Shadow VirtQueue
+ *
+ * Note that this function does not guarantee that all guest's available
+ * buffers are available to the device in SVQ avail ring. The guest may have
+ * exposed a GPA / GIOVA contiguous buffer, but it may not be contiguous in qemu
+ * vaddr.
+ *
+ * If that happens, guest's kick notifications will be disabled until device
+ * makes some buffers used.
+ */
+static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
+{
+    /* Clear event notifier */
+    event_notifier_test_and_clear(&svq->svq_kick);
+
+    /* Make available as many buffers as possible */
+    do {
+        if (virtio_queue_get_notification(svq->vq)) {
+            virtio_queue_set_notification(svq->vq, false);
+        }
+
+        while (true) {
+            VirtQueueElement *elem;
+
+            if (svq->next_guest_avail_elem) {
+                elem = g_steal_pointer(&svq->next_guest_avail_elem);
+            } else {
+                elem = virtqueue_pop(svq->vq, sizeof(*elem));
+            }
+
+            if (!elem) {
+                break;
+            }
+
+            if (elem->out_num + elem->in_num >
+                vhost_svq_available_slots(svq)) {
+                /*
+                 * This condition is possible since a contiguous buffer in GPA
+                 * does not imply a contiguous buffer in qemu's VA
+                 * scatter-gather segments. If that happens, the buffer exposed
+                 * to the device needs to be a chain of descriptors at this
+                 * moment.
+                 *
+                 * SVQ cannot hold more available buffers if we are here:
+                 * queue the current guest descriptor and ignore further kicks
+                 * until some elements are used.
+                 */
+                svq->next_guest_avail_elem = elem;
+                return;
+            }
+
+            vhost_svq_add(svq, elem);
+            vhost_svq_kick(svq);
+        }
+
+        virtio_queue_set_notification(svq->vq, true);
+    } while (!virtio_queue_empty(svq->vq));
+}
+
+/**
+ * Handle guest's kick.
+ *
+ * @n guest kick event notifier, the one that guest set to notify svq.
+ */
+static void vhost_handle_guest_kick_notifier(EventNotifier *n)
+{
+    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
+                                             svq_kick);
+    vhost_handle_guest_kick(svq);
+}
+
+static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
+{
+    if (svq->last_used_idx != svq->shadow_used_idx) {
+        return true;
+    }
+
+    svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
+
+    return svq->last_used_idx != svq->shadow_used_idx;
+}
+
+static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
+{
+    vring_desc_t *descs = svq->vring.desc;
+    const vring_used_t *used = svq->vring.used;
+    vring_used_elem_t used_elem;
+    uint16_t last_used;
+
+    if (!vhost_svq_more_used(svq)) {
+        return NULL;
+    }
+
+    /* Only get used array entries after they have been exposed by dev */
+    smp_rmb();
+    last_used = svq->last_used_idx & (svq->vring.num - 1);
+    used_elem.id = le32_to_cpu(used->ring[last_used].id);
+    used_elem.len = le32_to_cpu(used->ring[last_used].len);
+
+    svq->last_used_idx++;
+    if (unlikely(used_elem.id >= svq->vring.num)) {
+        error_report("Device %s says index %u is used", svq->vdev->name,
+                     used_elem.id);
+        return NULL;
+    }
+
+    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
+        error_report(
+            "Device %s says index %u is used, but it was not available",
+            svq->vdev->name, used_elem.id);
+        return NULL;
+    }
+
+    descs[used_elem.id].next = svq->free_head;
+    svq->free_head = used_elem.id;
+
+    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
+    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
+}
+
+static void vhost_svq_flush(VhostShadowVirtqueue *svq,
+                            bool check_for_avail_queue)
+{
+    VirtQueue *vq = svq->vq;
+
+    /* Make as many buffers as possible used. */
+    do {
+        unsigned i = 0;
+
+        vhost_svq_set_notification(svq, false);
+        while (true) {
+            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
+            if (!elem) {
+                break;
+            }
+
+            if (unlikely(i >= svq->vring.num)) {
+                virtio_error(svq->vdev,
+                         "More than %u used buffers obtained in a %u size SVQ",
+                         i, svq->vring.num);
+                virtqueue_fill(vq, elem, elem->len, i);
+                virtqueue_flush(vq, i);
+                i = 0;
+            }
+            virtqueue_fill(vq, elem, elem->len, i++);
+        }
+
+        virtqueue_flush(vq, i);
+        event_notifier_set(&svq->svq_call);
+
+        if (check_for_avail_queue && svq->next_guest_avail_elem) {
+            /*
+             * Avail ring was full when vhost_svq_flush was called, so it's a
+             * good moment to make more descriptors available if possible
+             */
+            vhost_handle_guest_kick(svq);
+        }
+
+        vhost_svq_set_notification(svq, true);
+    } while (vhost_svq_more_used(svq));
+}
+
+/**
+ * Forward used buffers.
+ *
+ * @n hdev call event notifier, the one that device set to notify svq.
+ *
+ * Note that we are not making any buffers available in the loop, so there is
+ * no way that it runs more than virtqueue size times.
+ */
 static void vhost_svq_handle_call(EventNotifier *n)
 {
     VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
                                              hdev_call);
 
-    if (unlikely(!event_notifier_test_and_clear(n))) {
-        return;
-    }
+    /* Clear event notifier */
+    event_notifier_test_and_clear(n);
 
-    event_notifier_set(&svq->svq_call);
+    vhost_svq_flush(svq, true);
 }
 
 /**
@@ -258,13 +551,38 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
     * need to explicitly check for them.
      */
     event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
-    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
+    event_notifier_set_handler(&svq->svq_kick,
+                               vhost_handle_guest_kick_notifier);
 
     if (!check_old || event_notifier_test_and_clear(&tmp)) {
         event_notifier_set(&svq->hdev_kick);
     }
 }
 
+/**
+ * Start shadow virtqueue operation.
+ *
+ * @svq Shadow Virtqueue
+ * @vdev        VirtIO device
+ * @vq          Virtqueue to shadow
+ */
+void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
+                     VirtQueue *vq)
+{
+    svq->next_guest_avail_elem = NULL;
+    svq->avail_idx_shadow = 0;
+    svq->shadow_used_idx = 0;
+    svq->last_used_idx = 0;
+    svq->vdev = vdev;
+    svq->vq = vq;
+
+    memset(svq->vring.avail, 0, sizeof(*svq->vring.avail));
+    memset(svq->vring.used, 0, sizeof(*svq->vring.used));
+    for (unsigned i = 0; i < svq->vring.num - 1; i++) {
+        svq->vring.desc[i].next = cpu_to_le16(i + 1);
+    }
+}
+
 /**
  * Stop shadow virtqueue operation.
  * @svq Shadow Virtqueue
@@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
 void vhost_svq_stop(VhostShadowVirtqueue *svq)
 {
     event_notifier_set_handler(&svq->svq_kick, NULL);
+    g_autofree VirtQueueElement *next_avail_elem = NULL;
+
+    if (!svq->vq) {
+        return;
+    }
+
+    /* Send all pending used descriptors to guest */
+    vhost_svq_flush(svq, false);
+
+    for (unsigned i = 0; i < svq->vring.num; ++i) {
+        g_autofree VirtQueueElement *elem = NULL;
+        elem = g_steal_pointer(&svq->ring_id_maps[i]);
+        if (elem) {
+            virtqueue_detach_element(svq->vq, elem, elem->len);
+        }
+    }
+
+    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
+    if (next_avail_elem) {
+        virtqueue_detach_element(svq->vq, next_avail_elem,
+                                 next_avail_elem->len);
+    }
 }
 
 /**
@@ -316,7 +656,7 @@ VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
     memset(svq->vring.desc, 0, driver_size);
     svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
     memset(svq->vring.used, 0, device_size);
-
+    svq->ring_id_maps = g_new0(VirtQueueElement *, qsize);
     event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     return g_steal_pointer(&svq);
 
@@ -335,6 +675,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
     event_notifier_cleanup(&vq->hdev_kick);
     event_notifier_set_handler(&vq->hdev_call, NULL);
     event_notifier_cleanup(&vq->hdev_call);
+    g_free(vq->ring_id_maps);
     qemu_vfree(vq->vring.desc);
     qemu_vfree(vq->vring.used);
     g_free(vq);
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 53e14bafa0..0e5c00ed7e 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -752,9 +752,9 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
 * Note that this function does not rewind the kick file descriptor if it
 * cannot set the call one.
  */
-static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
-                                VhostShadowVirtqueue *svq,
-                                unsigned idx)
+static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
+                                  VhostShadowVirtqueue *svq,
+                                  unsigned idx)
 {
     struct vhost_vring_file file = {
         .index = dev->vq_index + idx,
@@ -767,7 +767,7 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
     r = vhost_vdpa_set_vring_dev_kick(dev, &file);
     if (unlikely(r != 0)) {
         error_report("Can't set device kick fd (%d)", -r);
-        return false;
+        return r;
     }
 
     event_notifier = vhost_svq_get_svq_call_notifier(svq);
@@ -777,6 +777,99 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
         error_report("Can't set device call fd (%d)", -r);
     }
 
+    return r;
+}
+
+/**
+ * Unmap SVQ area in the device
+ */
+static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
+                                      hwaddr size)
+{
+    int r;
+
+    size = ROUND_UP(size, qemu_real_host_page_size);
+    r = vhost_vdpa_dma_unmap(v, iova, size);
+    return r == 0;
+}
+
+static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
+                                       const VhostShadowVirtqueue *svq)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    struct vhost_vring_addr svq_addr;
+    size_t device_size = vhost_svq_device_area_size(svq);
+    size_t driver_size = vhost_svq_driver_area_size(svq);
+    bool ok;
+
+    vhost_svq_get_vring_addr(svq, &svq_addr);
+
+    ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
+    if (unlikely(!ok)) {
+        return false;
+    }
+
+    return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
+}
+
+/**
+ * Map shadow virtqueue rings in device
+ *
+ * @dev   The vhost device
+ * @svq   The shadow virtqueue
+ */
+static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
+                                     const VhostShadowVirtqueue *svq)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    struct vhost_vring_addr svq_addr;
+    size_t device_size = vhost_svq_device_area_size(svq);
+    size_t driver_size = vhost_svq_driver_area_size(svq);
+    int r;
+
+    vhost_svq_get_vring_addr(svq, &svq_addr);
+
+    r = vhost_vdpa_dma_map(v, svq_addr.desc_user_addr, driver_size,
+                           (void *)svq_addr.desc_user_addr, true);
+    if (unlikely(r != 0)) {
+        return false;
+    }
+
+    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
+                           (void *)svq_addr.used_user_addr, false);
+    return r == 0;
+}
+
+static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
+                                VhostShadowVirtqueue *svq,
+                                unsigned idx)
+{
+    uint16_t vq_index = dev->vq_index + idx;
+    struct vhost_vring_state s = {
+        .index = vq_index,
+    };
+    int r;
+    bool ok;
+
+    r = vhost_vdpa_set_dev_vring_base(dev, &s);
+    if (unlikely(r)) {
+        error_report("Can't set vring base (%d)", r);
+        return false;
+    }
+
+    s.num = vhost_svq_get_num(svq);
+    r = vhost_vdpa_set_dev_vring_num(dev, &s);
+    if (unlikely(r)) {
+        error_report("Can't set vring num (%d)", r);
+        return false;
+    }
+
+    ok = vhost_vdpa_svq_map_rings(dev, svq);
+    if (unlikely(!ok)) {
+        return false;
+    }
+
+    r = vhost_vdpa_svq_set_fds(dev, svq, idx);
     return r == 0;
 }
 
@@ -788,14 +881,24 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
     if (started) {
         vhost_vdpa_host_notifiers_init(dev);
         for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
+            VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
             VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
             bool ok = vhost_vdpa_svq_setup(dev, svq, i);
             if (unlikely(!ok)) {
                 return -1;
             }
+            vhost_svq_start(svq, dev->vdev, vq);
         }
         vhost_vdpa_set_vring_ready(dev);
     } else {
+        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
+            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
+                                                          i);
+            bool ok = vhost_vdpa_svq_unmap_rings(dev, svq);
+            if (unlikely(!ok)) {
+                return -1;
+            }
+        }
         vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
     }
 
-- 
2.27.0




* [PATCH 19/31] utils: Add internal DMAMap to iova-tree
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (17 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-21 20:27 ` [PATCH 20/31] util: Store DMA entries in a list Eugenio Pérez
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

So we can store private data that is not accessible from outside the
iova-tree code.

In this case, we will add intrusive linked list members, so we can
traverse it for allocation.
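
This is the usual containment pattern: external callers keep dealing
with plain DMAMap pointers while the tree internally stores a wrapper.
A minimal sketch of the idea (the list member is what the next patch
adds; map_ptr stands for any DMAMap pointer handed out by the tree):

    typedef struct DMAMapInternal {
        DMAMap map;              /* public part, first member */
        /* private members, e.g. list linkage, go here */
    } DMAMapInternal;

    /* Recover the wrapper from a public pointer, QEMU style */
    DMAMapInternal *i = container_of(map_ptr, DMAMapInternal, map);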

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 util/iova-tree.c | 37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/util/iova-tree.c b/util/iova-tree.c
index 23ea35b7a4..ac089101c4 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -12,13 +12,18 @@
 #include "qemu/osdep.h"
 #include "qemu/iova-tree.h"
 
+typedef struct DMAMapInternal {
+    DMAMap map;
+} DMAMapInternal;
+
 struct IOVATree {
     GTree *tree;
 };
 
 static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
 {
-    const DMAMap *m1 = a, *m2 = b;
+    const DMAMapInternal *i1 = a, *i2 = b;
+    const DMAMap *m1 = &i1->map, *m2 = &i2->map;
 
     if (m1->iova > m2->iova + m2->size) {
         return 1;
@@ -42,9 +47,18 @@ IOVATree *iova_tree_new(void)
     return iova_tree;
 }
 
+static DMAMapInternal *iova_tree_find_internal(const IOVATree *tree,
+                                               const DMAMap *map)
+{
+    const DMAMapInternal map_internal = { .map = *map };
+
+    return g_tree_lookup(tree->tree, &map_internal);
+}
+
 const DMAMap *iova_tree_find(const IOVATree *tree, const DMAMap *map)
 {
-    return g_tree_lookup(tree->tree, map);
+    const DMAMapInternal *ret = iova_tree_find_internal(tree, map);
+    return ret ? &ret->map : NULL;
 }
 
 const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova)
@@ -54,7 +68,8 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova)
     return iova_tree_find(tree, &map);
 }
 
-static inline void iova_tree_insert_internal(GTree *gtree, DMAMap *range)
+static inline void iova_tree_insert_internal(GTree *gtree,
+                                             DMAMapInternal *range)
 {
     /* Key and value are sharing the same range data */
     g_tree_insert(gtree, range, range);
@@ -62,7 +77,7 @@ static inline void iova_tree_insert_internal(GTree *gtree, DMAMap *range)
 
 int iova_tree_insert(IOVATree *tree, const DMAMap *map)
 {
-    DMAMap *new;
+    DMAMapInternal *new;
 
     if (map->iova + map->size < map->iova || map->perm == IOMMU_NONE) {
         return IOVA_ERR_INVALID;
@@ -73,8 +88,8 @@ int iova_tree_insert(IOVATree *tree, const DMAMap *map)
         return IOVA_ERR_OVERLAP;
     }
 
-    new = g_new0(DMAMap, 1);
-    memcpy(new, map, sizeof(*new));
+    new = g_new0(DMAMapInternal, 1);
+    memcpy(&new->map, map, sizeof(new->map));
     iova_tree_insert_internal(tree->tree, new);
 
     return IOVA_OK;
@@ -84,11 +99,11 @@ static gboolean iova_tree_traverse(gpointer key, gpointer value,
                                 gpointer data)
 {
     iova_tree_iterator iterator = data;
-    DMAMap *map = key;
+    DMAMapInternal *map = key;
 
     g_assert(key == value);
 
-    return iterator(map);
+    return iterator(&map->map);
 }
 
 void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator)
@@ -98,10 +113,10 @@ void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator)
 
 int iova_tree_remove(IOVATree *tree, const DMAMap *map)
 {
-    const DMAMap *overlap;
+    DMAMapInternal *overlap_internal;
 
-    while ((overlap = iova_tree_find(tree, map))) {
-        g_tree_remove(tree->tree, overlap);
+    while ((overlap_internal = iova_tree_find_internal(tree, map))) {
+        g_tree_remove(tree->tree, overlap_internal);
     }
 
     return IOVA_OK;
-- 
2.27.0




* [PATCH 20/31] util: Store DMA entries in a list
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (18 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 19/31] utils: Add internal DMAMap to iova-tree Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-21 20:27 ` [PATCH 21/31] util: Add iova_tree_alloc Eugenio Pérez
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

SVQ needs to allocate iova entries, traversing the list looking for
holes.

Since version 2.68, GLib offers methods both to traverse the tree in
order and to look up the first entry greater than a key. However, qemu
may need to compile with earlier versions, so we replicate both methods
here.
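
With the entries kept sorted in a list, hole scanning reduces to a
linear walk over consecutive neighbours. A hedged sketch of the idea a
later patch builds on (DMAMap sizes are inclusive, hence the +1):

    const DMAMapInternal *i;
    QTAILQ_FOREACH(i, &tree->list, entry) {
        const DMAMapInternal *next = QTAILQ_NEXT(i, entry);
        hwaddr hole_start = i->map.iova + i->map.size + 1;
        hwaddr hole_end = next ? next->map.iova : HWADDR_MAX;
        /* [hole_start, hole_end) is free; check if the size fits */
    }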

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 util/iova-tree.c | 42 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/util/iova-tree.c b/util/iova-tree.c
index ac089101c4..5063a256dd 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -11,15 +11,44 @@
 
 #include "qemu/osdep.h"
 #include "qemu/iova-tree.h"
+#include "qemu/queue.h"
 
 typedef struct DMAMapInternal {
     DMAMap map;
+    QTAILQ_ENTRY(DMAMapInternal) entry;
 } DMAMapInternal;
 
 struct IOVATree {
     GTree *tree;
+    QTAILQ_HEAD(, DMAMapInternal) list;
 };
 
+/**
+ * Search function for the upper bound of a given needle.
+ *
+ * The upper bound is the first node that has its key strictly greater than the
+ * searched key.
+ *
+ * TODO: A specialized function is available in GTree since Glib 2.68. Replace
+ * when Glib minimal version is raised.
+ */
+static int iova_tree_compare_upper_bound(gconstpointer a, gconstpointer b)
+{
+    const DMAMapInternal *haystack = a, *needle = b, *prev;
+
+    if (needle->map.iova >= haystack->map.iova) {
+        return 1;
+    }
+
+    prev = QTAILQ_PREV(haystack, entry);
+    if (!prev || prev->map.iova < needle->map.iova) {
+        return 0;
+    }
+
+    /* A smaller candidate upper bound may exist further to the left */
+    return -1;
+}
+
 static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
 {
     const DMAMapInternal *i1 = a, *i2 = b;
@@ -43,6 +72,7 @@ IOVATree *iova_tree_new(void)
 
     /* We don't have values actually, no need to free */
     iova_tree->tree = g_tree_new_full(iova_tree_compare, NULL, g_free, NULL);
+    QTAILQ_INIT(&iova_tree->list);
 
     return iova_tree;
 }
@@ -77,7 +107,7 @@ static inline void iova_tree_insert_internal(GTree *gtree,
 
 int iova_tree_insert(IOVATree *tree, const DMAMap *map)
 {
-    DMAMapInternal *new;
+    DMAMapInternal *new, *right;
 
     if (map->iova + map->size < map->iova || map->perm == IOMMU_NONE) {
         return IOVA_ERR_INVALID;
@@ -92,6 +122,15 @@ int iova_tree_insert(IOVATree *tree, const DMAMap *map)
     memcpy(&new->map, map, sizeof(new->map));
     iova_tree_insert_internal(tree->tree, new);
 
+    /* Ordered insertion */
+    right = g_tree_search(tree->tree, iova_tree_compare_upper_bound, new);
+    if (!right) {
+        /* Empty or bigger than any other entry */
+        QTAILQ_INSERT_TAIL(&tree->list, new, entry);
+    } else {
+        QTAILQ_INSERT_BEFORE(right, new, entry);
+    }
+
     return IOVA_OK;
 }
 
@@ -116,6 +155,7 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
     DMAMapInternal *overlap_internal;
 
     while ((overlap_internal = iova_tree_find_internal(tree, map))) {
+        QTAILQ_REMOVE(&tree->list, overlap_internal, entry);
         g_tree_remove(tree->tree, overlap_internal);
     }
 
-- 
2.27.0




* [PATCH 21/31] util: Add iova_tree_alloc
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (19 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 20/31] util: Store DMA entries in a list Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-24  4:32     ` Peter Xu
  2022-01-21 20:27 ` [PATCH 22/31] vhost: Add VhostIOVATree Eugenio Pérez
                   ` (10 subsequent siblings)
  31 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

This iova tree function looks for a hole between the allocated regions
and returns a totally new translation for a given translated address.

Its main usage is to allow devices to access qemu's address space,
remapping the guest's address space into a new iova space where qemu can
add chunks of addresses.
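
A usage sketch under assumed names ("tree", "buf" and the iova limits
are illustrative, not part of the patch):

    DMAMap map = {
        .translated_addr = (hwaddr)buf, /* qemu VA to expose to the device */
        .size = 4096 - 1,               /* size counts the last byte included */
        .perm = IOMMU_RW,
    };

    if (iova_tree_alloc(tree, &map, iova_first, iova_last) == IOVA_OK) {
        /* map.iova now holds a free iova range of the requested size */
    }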

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/qemu/iova-tree.h | 17 ++++++++
 util/iova-tree.c         | 86 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
index 8249edd764..33f9b2e13f 100644
--- a/include/qemu/iova-tree.h
+++ b/include/qemu/iova-tree.h
@@ -29,6 +29,7 @@
 #define  IOVA_OK           (0)
 #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
 #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
+#define  IOVA_ERR_NOMEM    (-3) /* Cannot allocate */
 
 typedef struct IOVATree IOVATree;
 typedef struct DMAMap {
@@ -119,6 +120,22 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
  */
 void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
 
+/**
+ * iova_tree_alloc:
+ *
+ * @tree: the iova tree to allocate from
+ * @map: the new map (as translated addr & size) to allocate in iova region
+ * @iova_begin: the minimum address of the allocation
+ * @iova_end: the maximum address (inclusive) of the allocation
+ *
+ * Allocates a new region of a given size, between iova_begin and iova_end.
+ *
+ * Return: Same as iova_tree_insert, plus IOVA_ERR_NOMEM if no free
+ * contiguous range is big enough. Caller can get the assigned iova in map->iova.
+ */
+int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
+                    hwaddr iova_end);
+
 /**
  * iova_tree_destroy:
  *
diff --git a/util/iova-tree.c b/util/iova-tree.c
index 5063a256dd..1439fc9fe2 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -88,7 +88,7 @@ static DMAMapInternal *iova_tree_find_internal(const IOVATree *tree,
 const DMAMap *iova_tree_find(const IOVATree *tree, const DMAMap *map)
 {
     const DMAMapInternal *ret = iova_tree_find_internal(tree, map);
-    return &ret->map;
+    return ret ? &ret->map : NULL;
 }
 
 const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova)
@@ -162,6 +162,90 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
     return IOVA_OK;
 }
 
+/**
+ * Check if there is at minimum "size" iova space between the end of "left" and
+ * the start of "right". If either of them is NULL, iova_begin and iova_last
+ * will be used instead.
+ */
+static bool iova_tree_alloc_map_in_hole(const DMAMapInternal *l,
+                                        const DMAMapInternal *r,
+                                        hwaddr iova_begin, hwaddr iova_last,
+                                        size_t size)
+{
+    const DMAMap *left = l ? &l->map : NULL;
+    const DMAMap *right = r ? &r->map : NULL;
+    uint64_t hole_start, hole_last;
+
+    if (right && right->iova + right->size < iova_begin) {
+        return false;
+    }
+
+    if (left && left->iova > iova_last) {
+        return false;
+    }
+
+    hole_start = MAX(left ? left->iova + left->size + 1 : 0, iova_begin);
+    hole_last = MIN(right ? right->iova : HWADDR_MAX, iova_last);
+
+    if (hole_last - hole_start > size) {
+        /* We found a valid hole. */
+        return true;
+    }
+
+    /* Keep iterating */
+    return false;
+}
+
+/**
+ * Allocates a new entry in the tree
+ *
+ * The caller specifies the size of the new entry with map->size. The new iova
+ * address is returned in map->iova if the allocation succeeds. The map
+ * ownership stays with the caller, as in iova_tree_insert.
+ *
+ * More constraints can be specified with iova_begin and iova_last.
+ *
+ * Returns the same as iova_tree_insert, but it can return IOVA_ERR_NOMEM if
+ * it cannot find a hole in the iova range big enough.
+ */
+int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
+                    hwaddr iova_last)
+{
+    const DMAMapInternal *last, *i;
+
+    assert(iova_begin < iova_last);
+
+    /*
+     * Find a valid hole for the mapping
+     *
+     * TODO: Replace all this with g_tree_node_first/next/last when available
+     * (from glib since 2.68). Using a separate QTAILQ complicates the code.
+     *
+     * Try to allocate first at the end of the list.
+     */
+    last = QTAILQ_LAST(&tree->list);
+    if (iova_tree_alloc_map_in_hole(last, NULL, iova_begin, iova_last,
+                                    map->size)) {
+        goto alloc;
+    }
+
+    /* Look for inner hole */
+    last = NULL;
+    for (i = QTAILQ_FIRST(&tree->list); i;
+         last = i, i = QTAILQ_NEXT(i, entry)) {
+        if (iova_tree_alloc_map_in_hole(last, i, iova_begin, iova_last,
+                                        map->size)) {
+            goto alloc;
+        }
+    }
+
+    return IOVA_ERR_NOMEM;
+
+alloc:
+    map->iova = last ? last->map.iova + last->map.size + 1 : iova_begin;
+    return iova_tree_insert(tree, map);
+}
+
 void iova_tree_destroy(IOVATree *tree)
 {
     g_tree_destroy(tree->tree);
-- 
2.27.0




* [PATCH 22/31] vhost: Add VhostIOVATree
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (20 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 21/31] util: Add iova_tree_alloc Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-30  5:21     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 23/31] vdpa: Add custom IOTLB translations to SVQ Eugenio Pérez
                   ` (9 subsequent siblings)
  31 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

This tree is able to look up a translated address from an IOVA address.

At first glance it is similar to util/iova-tree. However, SVQ working on
devices with limited IOVA space needs more capabilities, like allocating
IOVA chunks or performing reverse translations (qemu addresses to iova).

The allocation capability ("assign a free IOVA address to this chunk of
memory in qemu's address space") allows the shadow virtqueue to create a
new address space that is not restricted by the guest's addressable one,
so we can allocate the shadow vqs' vrings outside of it.

It duplicates the tree so it can search efficiently in both directions,
and it signals overlap if either the iova or the translated address is
present in either tree.
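
A brief usage sketch (illustrative only; "qemu_buf", "region_size" and
the iova limits are assumptions):

    g_autoptr(VhostIOVATree) tree = vhost_iova_tree_new(iova_first, iova_last);
    DMAMap map = {
        .translated_addr = (hwaddr)qemu_buf,
        .size = region_size - 1,   /* size counts the last byte included */
        .perm = IOMMU_RW,
    };

    if (vhost_iova_tree_map_alloc(tree, &map) == IOVA_OK) {
        /* Reverse translation: look up the entry by qemu VA */
        const DMAMap *entry = vhost_iova_tree_find_iova(tree, &map);
        /* entry->iova is the address the device must use */
    }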

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-iova-tree.h |  27 +++++++
 hw/virtio/vhost-iova-tree.c | 157 ++++++++++++++++++++++++++++++++++++
 hw/virtio/meson.build       |   2 +-
 3 files changed, 185 insertions(+), 1 deletion(-)
 create mode 100644 hw/virtio/vhost-iova-tree.h
 create mode 100644 hw/virtio/vhost-iova-tree.c

diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
new file mode 100644
index 0000000000..610394eaf1
--- /dev/null
+++ b/hw/virtio/vhost-iova-tree.h
@@ -0,0 +1,27 @@
+/*
+ * vhost software live migration ring
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_VIRTIO_VHOST_IOVA_TREE_H
+#define HW_VIRTIO_VHOST_IOVA_TREE_H
+
+#include "qemu/iova-tree.h"
+#include "exec/memory.h"
+
+typedef struct VhostIOVATree VhostIOVATree;
+
+VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
+void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
+
+const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
+                                        const DMAMap *map);
+int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
+void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map);
+
+#endif
diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
new file mode 100644
index 0000000000..0021dbaf54
--- /dev/null
+++ b/hw/virtio/vhost-iova-tree.c
@@ -0,0 +1,157 @@
+/*
+ * vhost software live migration ring
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/iova-tree.h"
+#include "vhost-iova-tree.h"
+
+#define iova_min_addr qemu_real_host_page_size
+
+/**
+ * VhostIOVATree, able to:
+ * - Translate iova address
+ * - Reverse translate iova address (from translated to iova)
+ * - Allocate IOVA regions for translated range (potentially slow operation)
+ *
+ * Note that it keeps both translation directions in sync.
+ */
+struct VhostIOVATree {
+    /* First addressable iova address in the device */
+    uint64_t iova_first;
+
+    /* Last addressable iova address in the device */
+    uint64_t iova_last;
+
+    /* IOVA address to qemu memory maps. */
+    IOVATree *iova_taddr_map;
+
+    /* QEMU virtual memory address to iova maps */
+    GTree *taddr_iova_map;
+};
+
+static gint vhost_iova_tree_cmp_taddr(gconstpointer a, gconstpointer b,
+                                      gpointer data)
+{
+    const DMAMap *m1 = a, *m2 = b;
+
+    if (m1->translated_addr > m2->translated_addr + m2->size) {
+        return 1;
+    }
+
+    if (m1->translated_addr + m1->size < m2->translated_addr) {
+        return -1;
+    }
+
+    /* Overlapped */
+    return 0;
+}
+
+/**
+ * Create a new IOVA tree
+ *
+ * Returns the new IOVA tree
+ */
+VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
+{
+    VhostIOVATree *tree = g_new(VhostIOVATree, 1);
+
+    /* Some devices do not like address 0 */
+    tree->iova_first = MAX(iova_first, iova_min_addr);
+    tree->iova_last = iova_last;
+
+    tree->iova_taddr_map = iova_tree_new();
+    tree->taddr_iova_map = g_tree_new_full(vhost_iova_tree_cmp_taddr, NULL,
+                                           NULL, g_free);
+    return tree;
+}
+
+/**
+ * Delete an iova tree
+ */
+void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
+{
+    iova_tree_destroy(iova_tree->iova_taddr_map);
+    g_tree_unref(iova_tree->taddr_iova_map);
+    g_free(iova_tree);
+}
+
+/**
+ * Find the IOVA address stored from a memory address
+ *
+ * @tree     The iova tree
+ * @map      The map with the memory address
+ *
+ * Return the stored mapping, or NULL if not found.
+ */
+const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
+                                        const DMAMap *map)
+{
+    return g_tree_lookup(tree->taddr_iova_map, map);
+}
+
+/**
+ * Allocate a new mapping
+ *
+ * @tree  The iova tree
+ * @map   The iova map
+ *
+ * Returns:
+ * - IOVA_OK if the map fits in the container
+ * - IOVA_ERR_INVALID if the map does not make sense (like size overflow)
+ * - IOVA_ERR_OVERLAP if the tree already contains that map
+ * - IOVA_ERR_NOMEM if tree cannot allocate more space.
+ *
+ * It returns the assigned iova in map->iova if the return value is IOVA_OK.
+ */
+int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map)
+{
+    /* Some vhost devices do not like addr 0. Skip first page */
+    hwaddr iova_first = tree->iova_first ?: qemu_real_host_page_size;
+    DMAMap *new;
+    int r;
+
+    if (map->translated_addr + map->size < map->translated_addr ||
+        map->perm == IOMMU_NONE) {
+        return IOVA_ERR_INVALID;
+    }
+
+    /* Check for collisions in translated addresses */
+    if (vhost_iova_tree_find_iova(tree, map)) {
+        return IOVA_ERR_OVERLAP;
+    }
+
+    /* Allocate a node in IOVA address */
+    r = iova_tree_alloc(tree->iova_taddr_map, map, iova_first,
+                        tree->iova_last);
+    if (r != IOVA_OK) {
+        return r;
+    }
+
+    /* Allocate node in qemu -> iova translations */
+    new = g_malloc(sizeof(*new));
+    memcpy(new, map, sizeof(*new));
+    g_tree_insert(tree->taddr_iova_map, new, new);
+    return IOVA_OK;
+}
+
+/**
+ * Remove existing mappings from iova tree
+ *
+ * @iova_tree  The vhost iova tree
+ * @map        The map to remove
+ */
+void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map)
+{
+    const DMAMap *overlap;
+
+    iova_tree_remove(iova_tree->iova_taddr_map, map);
+    while ((overlap = vhost_iova_tree_find_iova(iova_tree, map))) {
+        g_tree_remove(iova_tree->taddr_iova_map, overlap);
+    }
+}
diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index 2dc87613bc..6047670804 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
 
 virtio_ss = ss.source_set()
 virtio_ss.add(files('virtio.c'))
-virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
+virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c', 'vhost-iova-tree.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
 virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
-- 
2.27.0




* [PATCH 23/31] vdpa: Add custom IOTLB translations to SVQ
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (21 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 22/31] vhost: Add VhostIOVATree Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-30  5:57     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 24/31] vhost: Add vhost_svq_get_last_used_idx Eugenio Pérez
                   ` (8 subsequent siblings)
  31 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

Use translations added in VhostIOVATree in SVQ.

Only introduce its usage here, not allocation and deallocation. As with
previous patches, we use the dead code paths of shadow_vqs_enabled to
avoid committing too many changes at once. Those paths are impossible to
take at the moment.
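
The translation itself is just an offset within the matching map; a
minimal sketch of what vhost_svq_translate_addr() below computes for
each iovec entry (the helper name is hypothetical):

    static hwaddr svq_translate_one(const DMAMap *map, hwaddr vaddr)
    {
        /* vaddr is known to fall inside map's translated range */
        return map->iova + (vaddr - map->translated_addr);
    }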

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |   3 +-
 include/hw/virtio/vhost-vdpa.h     |   3 +
 hw/virtio/vhost-shadow-virtqueue.c | 111 ++++++++++++++++----
 hw/virtio/vhost-vdpa.c             | 161 +++++++++++++++++++++++++----
 4 files changed, 238 insertions(+), 40 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 19c934af49..c6f67d6f76 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -12,6 +12,7 @@
 
 #include "hw/virtio/vhost.h"
 #include "qemu/event_notifier.h"
+#include "hw/virtio/vhost-iova-tree.h"
 
 typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 
@@ -37,7 +38,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
                      VirtQueue *vq);
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
-VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
+VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize, VhostIOVATree *iova_map);
 
 void vhost_svq_free(VhostShadowVirtqueue *vq);
 
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 009a9f3b6b..cd2388b3be 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -14,6 +14,7 @@
 
 #include <gmodule.h>
 
+#include "hw/virtio/vhost-iova-tree.h"
 #include "hw/virtio/virtio.h"
 #include "standard-headers/linux/vhost_types.h"
 
@@ -30,6 +31,8 @@ typedef struct vhost_vdpa {
     MemoryListener listener;
     struct vhost_vdpa_iova_range iova_range;
     bool shadow_vqs_enabled;
+    /* IOVA mapping used by Shadow Virtqueue */
+    VhostIOVATree *iova_tree;
     GPtrArray *shadow_vqs;
     struct vhost_dev *dev;
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index a1a404f68f..c7888eb8cf 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -11,6 +11,7 @@
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/virtio-access.h"
+#include "hw/virtio/vhost-iova-tree.h"
 #include "standard-headers/linux/vhost_types.h"
 
 #include "qemu/error-report.h"
@@ -45,6 +46,9 @@ typedef struct VhostShadowVirtqueue {
     /* Virtio device */
     VirtIODevice *vdev;
 
+    /* IOVA mapping */
+    VhostIOVATree *iova_tree;
+
     /* Map for returning guest's descriptors */
     VirtQueueElement **ring_id_maps;
 
@@ -97,13 +101,7 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
             continue;
 
         case VIRTIO_F_ACCESS_PLATFORM:
-            /* SVQ does not know how to translate addresses */
-            if (*dev_features & BIT_ULL(b)) {
-                clear_bit(b, dev_features);
-                r = false;
-            }
-            break;
-
+            /* SVQ trusts the host's IOMMU to translate addresses */
         case VIRTIO_F_VERSION_1:
             /* SVQ trust that guest vring is little endian */
             if (!(*dev_features & BIT_ULL(b))) {
@@ -205,7 +203,55 @@ static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
     }
 }
 
+/**
+ * Translate addresses between qemu's virtual address and SVQ IOVA
+ *
+ * @svq    Shadow VirtQueue
+ * @addrs  Destination array for the translated SVQ IOVA addresses
+ * @iovec  Source qemu's VA addresses
+ * @num    Length of iovec and minimum length of addrs
+ */
+static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
+                                     void **addrs, const struct iovec *iovec,
+                                     size_t num)
+{
+    size_t i;
+
+    if (num == 0) {
+        return true;
+    }
+
+    for (i = 0; i < num; ++i) {
+        DMAMap needle = {
+            .translated_addr = (hwaddr)iovec[i].iov_base,
+            .size = iovec[i].iov_len,
+        };
+        size_t off;
+
+        const DMAMap *map = vhost_iova_tree_find_iova(svq->iova_tree, &needle);
+        /*
+         * Map cannot be NULL since iova map contains all guest space and
+         * qemu already has a physical address mapped
+         */
+        if (unlikely(!map)) {
+            error_report("Invalid address 0x%"HWADDR_PRIx" given by guest",
+                         needle.translated_addr);
+            return false;
+        }
+
+        /*
+         * Map->iova chunk size is ignored. What to do if descriptor
+         * (addr, size) does not fit is delegated to the device.
+         */
+        off = needle.translated_addr - map->translated_addr;
+        addrs[i] = (void *)(map->iova + off);
+    }
+
+    return true;
+}
+
 static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
+                                    void * const *vaddr_sg,
                                     const struct iovec *iovec,
                                     size_t num, bool more_descs, bool write)
 {
@@ -224,7 +270,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
         } else {
             descs[i].flags = flags;
         }
-        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
+        descs[i].addr = cpu_to_le64((hwaddr)vaddr_sg[n]);
         descs[i].len = cpu_to_le32(iovec[n].iov_len);
 
         last = i;
@@ -234,42 +280,60 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
     svq->free_head = le16_to_cpu(descs[last].next);
 }
 
-static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
-                                    VirtQueueElement *elem)
+static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
+                                VirtQueueElement *elem,
+                                unsigned *head)
 {
-    int head;
     unsigned avail_idx;
     vring_avail_t *avail = svq->vring.avail;
+    bool ok;
+    g_autofree void **sgs = g_new(void *, MAX(elem->out_num, elem->in_num));
 
-    head = svq->free_head;
+    *head = svq->free_head;
 
     /* We need some descriptors here */
     assert(elem->out_num || elem->in_num);
 
-    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
+    ok = vhost_svq_translate_addr(svq, sgs, elem->out_sg, elem->out_num);
+    if (unlikely(!ok)) {
+        return false;
+    }
+    vhost_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
                             elem->in_num > 0, false);
-    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
+
+
+    ok = vhost_svq_translate_addr(svq, sgs, elem->in_sg, elem->in_num);
+    if (unlikely(!ok)) {
+        return false;
+    }
+
+    vhost_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, true);
 
     /*
      * Put entry in available array (but don't update avail->idx until they
      * do sync).
      */
     avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
-    avail->ring[avail_idx] = cpu_to_le16(head);
+    avail->ring[avail_idx] = cpu_to_le16(*head);
     svq->avail_idx_shadow++;
 
     /* Update avail index after the descriptor is wrote */
     smp_wmb();
     avail->idx = cpu_to_le16(svq->avail_idx_shadow);
 
-    return head;
+    return true;
 }
 
-static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
+static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
 {
-    unsigned qemu_head = vhost_svq_add_split(svq, elem);
+    unsigned qemu_head;
+    bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
+    if (unlikely(!ok)) {
+        return false;
+    }
 
     svq->ring_id_maps[qemu_head] = elem;
+    return true;
 }
 
 static void vhost_svq_kick(VhostShadowVirtqueue *svq)
@@ -309,6 +373,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
 
         while (true) {
             VirtQueueElement *elem;
+            bool ok;
 
             if (svq->next_guest_avail_elem) {
                 elem = g_steal_pointer(&svq->next_guest_avail_elem);
@@ -337,7 +402,11 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
                 return;
             }
 
-            vhost_svq_add(svq, elem);
+            ok = vhost_svq_add(svq, elem);
+            if (unlikely(!ok)) {
+                /* VQ is broken, just return and ignore any other kicks */
+                return;
+            }
             vhost_svq_kick(svq);
         }
 
@@ -619,12 +688,13 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
  * methods and file descriptors.
  *
  * @qsize Shadow VirtQueue size
+ * @iova_tree Tree to perform descriptors translations
  *
  * Returns the new virtqueue or NULL.
  *
  * In case of error, reason is reported through error_report.
  */
-VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
+VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize, VhostIOVATree *iova_tree)
 {
     size_t desc_size = sizeof(vring_desc_t) * qsize;
     size_t device_size, driver_size;
@@ -656,6 +726,7 @@ VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
     memset(svq->vring.desc, 0, driver_size);
     svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
     memset(svq->vring.used, 0, device_size);
+    svq->iova_tree = iova_tree;
     svq->ring_id_maps = g_new0(VirtQueueElement *, qsize);
     event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     return g_steal_pointer(&svq);
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 0e5c00ed7e..276a559649 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -209,6 +209,18 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
                                          vaddr, section->readonly);
 
     llsize = int128_sub(llend, int128_make64(iova));
+    if (v->shadow_vqs_enabled) {
+        DMAMap mem_region = {
+            .translated_addr = (hwaddr)vaddr,
+            .size = int128_get64(llsize) - 1,
+            .perm = IOMMU_ACCESS_FLAG(true, section->readonly),
+        };
+
+        int r = vhost_iova_tree_map_alloc(v->iova_tree, &mem_region);
+        assert(r == IOVA_OK);
+
+        iova = mem_region.iova;
+    }
 
     vhost_vdpa_iotlb_batch_begin_once(v);
     ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
@@ -261,6 +273,20 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
 
     llsize = int128_sub(llend, int128_make64(iova));
 
+    if (v->shadow_vqs_enabled) {
+        const DMAMap *result;
+        const void *vaddr = memory_region_get_ram_ptr(section->mr) +
+            section->offset_within_region +
+            (iova - section->offset_within_address_space);
+        DMAMap mem_region = {
+            .translated_addr = (hwaddr)vaddr,
+            .size = int128_get64(llsize) - 1,
+        };
+
+        result = vhost_iova_tree_find_iova(v->iova_tree, &mem_region);
+        iova = result->iova;
+        vhost_iova_tree_remove(v->iova_tree, &mem_region);
+    }
     vhost_vdpa_iotlb_batch_begin_once(v);
     ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
     if (ret) {
@@ -783,33 +809,70 @@ static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
 /**
  * Unmap SVQ area in the device
  */
-static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
-                                      hwaddr size)
+static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v,
+                                      const DMAMap *needle)
 {
+    const DMAMap *result = vhost_iova_tree_find_iova(v->iova_tree, needle);
+    hwaddr size;
     int r;
 
-    size = ROUND_UP(size, qemu_real_host_page_size);
-    r = vhost_vdpa_dma_unmap(v, iova, size);
+    if (unlikely(!result)) {
+        error_report("Unable to find SVQ address to unmap");
+        return false;
+    }
+
+    size = ROUND_UP(result->size, qemu_real_host_page_size);
+    r = vhost_vdpa_dma_unmap(v, result->iova, size);
     return r == 0;
 }
 
 static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
                                        const VhostShadowVirtqueue *svq)
 {
+    DMAMap needle;
     struct vhost_vdpa *v = dev->opaque;
     struct vhost_vring_addr svq_addr;
-    size_t device_size = vhost_svq_device_area_size(svq);
-    size_t driver_size = vhost_svq_driver_area_size(svq);
     bool ok;
 
     vhost_svq_get_vring_addr(svq, &svq_addr);
 
-    ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
+    needle = (DMAMap) {
+        .translated_addr = svq_addr.desc_user_addr,
+    };
+    ok = vhost_vdpa_svq_unmap_ring(v, &needle);
     if (unlikely(!ok)) {
         return false;
     }
 
-    return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
+    needle = (DMAMap) {
+        .translated_addr = svq_addr.used_user_addr,
+    };
+    return vhost_vdpa_svq_unmap_ring(v, &needle);
+}
+
+/**
+ * Map SVQ area in the device
+ *
+ * @v          Vhost-vdpa device
+ * @needle     The area to search iova
+ * @readonly   Permissions of the area
+ */
+static bool vhost_vdpa_svq_map_ring(struct vhost_vdpa *v, const DMAMap *needle,
+                                    bool readonly)
+{
+    hwaddr off;
+    const DMAMap *result = vhost_iova_tree_find_iova(v->iova_tree, needle);
+    int r;
+
+    if (unlikely(!result)) {
+        error_report("Can't locate SVQ ring");
+        return false;
+    }
+
+    off = needle->translated_addr - result->translated_addr;
+    r = vhost_vdpa_dma_map(v, result->iova + off, needle->size,
+                           (void *)needle->translated_addr, readonly);
+    return r == 0;
 }
 
 /**
@@ -821,23 +884,29 @@ static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
 static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
                                      const VhostShadowVirtqueue *svq)
 {
+    DMAMap needle;
     struct vhost_vdpa *v = dev->opaque;
     struct vhost_vring_addr svq_addr;
     size_t device_size = vhost_svq_device_area_size(svq);
     size_t driver_size = vhost_svq_driver_area_size(svq);
-    int r;
+    bool ok;
 
     vhost_svq_get_vring_addr(svq, &svq_addr);
 
-    r = vhost_vdpa_dma_map(v, svq_addr.desc_user_addr, driver_size,
-                           (void *)svq_addr.desc_user_addr, true);
-    if (unlikely(r != 0)) {
+    needle = (DMAMap) {
+        .translated_addr = svq_addr.desc_user_addr,
+        .size = driver_size,
+    };
+    ok = vhost_vdpa_svq_map_ring(v, &needle, true);
+    if (unlikely(!ok)) {
         return false;
     }
 
-    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
-                           (void *)svq_addr.used_user_addr, false);
-    return r == 0;
+    needle = (DMAMap) {
+        .translated_addr = svq_addr.used_user_addr,
+        .size = device_size,
+    };
+    return vhost_vdpa_svq_map_ring(v, &needle, false);
 }
 
 static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
@@ -1006,6 +1075,23 @@ static int vhost_vdpa_set_owner(struct vhost_dev *dev)
     return vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
 }
 
+static bool vhost_vdpa_svq_get_vq_region(struct vhost_vdpa *v,
+                                         unsigned long long addr,
+                                         uint64_t *iova_addr)
+{
+    const DMAMap needle = {
+        .translated_addr = addr,
+    };
+    const DMAMap *translation = vhost_iova_tree_find_iova(v->iova_tree,
+                                                          &needle);
+    if (!translation) {
+        return false;
+    }
+
+    *iova_addr = translation->iova + (addr - translation->translated_addr);
+    return true;
+}
+
 static void vhost_vdpa_vq_get_guest_addr(struct vhost_vring_addr *addr,
                                          struct vhost_virtqueue *vq)
 {
@@ -1023,10 +1109,23 @@ static int vhost_vdpa_vq_get_addr(struct vhost_dev *dev,
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
 
     if (v->shadow_vqs_enabled) {
+        struct vhost_vring_addr svq_addr;
         int idx = vhost_vdpa_get_vq_index(dev, addr->index);
         VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, idx);
 
-        vhost_svq_get_vring_addr(svq, addr);
+        vhost_svq_get_vring_addr(svq, &svq_addr);
+        if (!vhost_vdpa_svq_get_vq_region(v, svq_addr.desc_user_addr,
+                                          &addr->desc_user_addr)) {
+            return -1;
+        }
+        if (!vhost_vdpa_svq_get_vq_region(v, svq_addr.avail_user_addr,
+                                          &addr->avail_user_addr)) {
+            return -1;
+        }
+        if (!vhost_vdpa_svq_get_vq_region(v, svq_addr.used_user_addr,
+                                          &addr->used_user_addr)) {
+            return -1;
+        }
     } else {
         vhost_vdpa_vq_get_guest_addr(addr, vq);
     }
@@ -1095,13 +1194,37 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
 
     shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_psvq_free);
     for (unsigned n = 0; n < hdev->nvqs; ++n) {
-        VhostShadowVirtqueue *svq = vhost_svq_new(qsize);
-
+        DMAMap device_region, driver_region;
+        struct vhost_vring_addr addr;
+        VhostShadowVirtqueue *svq = vhost_svq_new(qsize, v->iova_tree);
         if (unlikely(!svq)) {
             error_setg(errp, "Cannot create svq %u", n);
             return -1;
         }
-        g_ptr_array_add(v->shadow_vqs, svq);
+
+        vhost_svq_get_vring_addr(svq, &addr);
+        driver_region = (DMAMap) {
+            .translated_addr = (hwaddr)addr.desc_user_addr,
+
+            /*
+             * DMAMap.size includes the last byte of the range, while
+             * sizeof marks one past it. Subtract one byte to make them match.
+             */
+            .size = vhost_svq_driver_area_size(svq) - 1,
+            .perm = VHOST_ACCESS_RO,
+        };
+        device_region = (DMAMap) {
+            .translated_addr = (hwaddr)addr.used_user_addr,
+            .size = vhost_svq_device_area_size(svq) - 1,
+            .perm = VHOST_ACCESS_RW,
+        };
+
+        r = vhost_iova_tree_map_alloc(v->iova_tree, &driver_region);
+        assert(r == IOVA_OK);
+        r = vhost_iova_tree_map_alloc(v->iova_tree, &device_region);
+        assert(r == IOVA_OK);
+
+        g_ptr_array_add(shadow_vqs, svq);
     }
 
 out:
-- 
2.27.0




* [PATCH 24/31] vhost: Add vhost_svq_get_last_used_idx
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (22 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 23/31] vdpa: Add custom IOTLB translations to SVQ Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-21 20:27 ` [PATCH 25/31] vdpa: Adapt vhost_vdpa_get_vring_base to SVQ Eugenio Pérez
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

This way SVQ queues can be migrated.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h | 1 +
 hw/virtio/vhost-shadow-virtqueue.c | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index c6f67d6f76..a2b0c6434d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -30,6 +30,7 @@ const EventNotifier *vhost_svq_get_svq_call_notifier(
                                               const VhostShadowVirtqueue *svq);
 void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
                               struct vhost_vring_addr *addr);
+uint16_t vhost_svq_get_last_used_idx(const VhostShadowVirtqueue *svq);
 uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq);
 size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
 size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index c7888eb8cf..eb0a3fcb80 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -574,6 +574,14 @@ void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
     addr->used_user_addr = (uint64_t)svq->vring.used;
 }
 
+/**
+ * Get the next index that SVQ is going to consume from SVQ used ring.
+ */
+uint16_t vhost_svq_get_last_used_idx(const VhostShadowVirtqueue *svq)
+{
+    return svq->last_used_idx;
+}
+
 uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq)
 {
     return svq->vring.num;
-- 
2.27.0




* [PATCH 25/31] vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (23 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 24/31] vhost: Add vhost_svq_get_last_used_idx Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-21 20:27 ` [PATCH 26/31] vdpa: Clear VHOST_VRING_F_LOG at vhost_vdpa_set_vring_addr in SVQ Eugenio Pérez
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

This is needed to achieve migration, so the destination can restore its
index.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 276a559649..887857c177 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -716,8 +716,17 @@ static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
 static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
                                        struct vhost_vring_state *ring)
 {
+    struct vhost_vdpa *v = dev->opaque;
     int ret;
 
+    if (v->shadow_vqs_enabled) {
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
+                                                      ring->index);
+
+        ring->num = vhost_svq_get_last_used_idx(svq);
+        return 0;
+    }
+
     ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
     trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
     return ret;
-- 
2.27.0




* [PATCH 26/31] vdpa: Clear VHOST_VRING_F_LOG at vhost_vdpa_set_vring_addr in SVQ
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (24 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 25/31] vdpa: Adapt vhost_vdpa_get_vring_base to SVQ Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-21 20:27 ` [PATCH 27/31] vdpa: Never set log_base addr if SVQ is enabled Eugenio Pérez
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

Doing so would cause the device to report writes to SVQ addresses, which
are not part of the guest's IOVA space.

Like the previous patch, this is currently not possible since SVQ does
not run if the device exports VHOST_VRING_F_LOG. But it's needed to
enable migration with SVQ.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 887857c177..ab729b3371 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -660,10 +660,16 @@ static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
 static int vhost_vdpa_set_vring_addr(struct vhost_dev *dev,
                                        struct vhost_vring_addr *addr)
 {
+    struct vhost_vdpa *v = dev->opaque;
+
     trace_vhost_vdpa_set_vring_addr(dev, addr->index, addr->flags,
                                     addr->desc_user_addr, addr->used_user_addr,
                                     addr->avail_user_addr,
                                     addr->log_guest_addr);
+
+    if (v->shadow_vqs_enabled) {
+        addr->flags &= ~BIT_ULL(VHOST_VRING_F_LOG);
+    }
     return vhost_vdpa_call(dev, VHOST_SET_VRING_ADDR, addr);
 }
 
-- 
2.27.0




* [PATCH 27/31] vdpa: Never set log_base addr if SVQ is enabled
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (25 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 26/31] vdpa: Clear VHOST_VRING_F_LOG at vhost_vdpa_set_vring_addr in SVQ Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-21 20:27 ` [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ Eugenio Pérez
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

Setting the log address would make the device start reporting invalid
dirty memory because the SVQ vrings are located in qemu's memory.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index ab729b3371..fb0a338baa 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -648,7 +648,8 @@ static int vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
 static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
                                      struct vhost_log *log)
 {
-    if (vhost_vdpa_one_time_request(dev)) {
+    struct vhost_vdpa *v = dev->opaque;
+    if (v->shadow_vqs_enabled || vhost_vdpa_one_time_request(dev)) {
         return 0;
     }
 
-- 
2.27.0




* [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (26 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 27/31] vdpa: Never set log_base addr if SVQ is enabled Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-30  6:50     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 29/31] vdpa: Make ncs autofree Eugenio Pérez
                   ` (3 subsequent siblings)
  31 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

SVQ is able to log the dirty bits by itself, so let's use it to avoid
blocking migration.

Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
enabled. Even if the device supports it, the reports would be nonsense
because SVQ memory is in qemu's address space.

The log region is still allocated. Future changes might skip that, but
this series is already long enough.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index fb0a338baa..75090d65e8 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
     if (ret == 0 && v->shadow_vqs_enabled) {
         /* Filter only features that SVQ can offer to guest */
         vhost_svq_valid_guest_features(features);
+
+        /* Add SVQ logging capabilities */
+        *features |= BIT_ULL(VHOST_F_LOG_ALL);
     }
 
     return ret;
@@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
 
     if (v->shadow_vqs_enabled) {
         uint64_t dev_features, svq_features, acked_features;
+        uint8_t status = 0;
         bool ok;
 
+        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
+        if (unlikely(ret)) {
+            return ret;
+        }
+
+        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+            /*
+             * vhost is trying to enable or disable _F_LOG, and the device
+             * would report wrong dirty pages. SVQ handles it.
+             */
+            return 0;
+        }
+
+        /* We must not ack _F_LOG if SVQ is enabled */
+        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
+
         ret = vhost_vdpa_get_dev_features(dev, &dev_features);
         if (ret != 0) {
             error_report("Can't get vdpa device features, got (%d)", ret);
-- 
2.27.0




* [PATCH 29/31] vdpa: Make ncs autofree
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (27 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-30  6:51     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 30/31] vdpa: Move vhost_vdpa_get_iova_range to net/vhost-vdpa.c Eugenio Pérez
                   ` (2 subsequent siblings)
  31 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

This simplifies memory management.
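
For context: g_autofree attaches a cleanup attribute so that GLib frees
the pointer automatically when it goes out of scope, e.g. (illustrative
snippet, not part of the patch):

    g_autofree char *name = g_strdup("vhost-vdpa0");
    /* no explicit g_free(name); it runs automatically at scope exit */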

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 net/vhost-vdpa.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 4125d13118..4befba5cc7 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -264,7 +264,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 {
     const NetdevVhostVDPAOptions *opts;
     int vdpa_device_fd;
-    NetClientState **ncs, *nc;
+    g_autofree NetClientState **ncs = NULL;
+    NetClientState *nc;
     int queue_pairs, i, has_cvq = 0;
 
     assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
@@ -302,7 +303,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
             goto err;
     }
 
-    g_free(ncs);
     return 0;
 
 err:
@@ -310,7 +310,6 @@ err:
         qemu_del_net_client(ncs[0]);
     }
     qemu_close(vdpa_device_fd);
-    g_free(ncs);
 
     return -1;
 }
-- 
2.27.0




* [PATCH 30/31] vdpa: Move vhost_vdpa_get_iova_range to net/vhost-vdpa.c
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (28 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 29/31] vdpa: Make ncs autofree Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-30  6:53     ` Jason Wang
  2022-01-21 20:27 ` [PATCH 31/31] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
  2022-01-28  6:02   ` Jason Wang
  31 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

Since it's a device property, it can be done in net/. This helps SVQ to
allocate the rings at vdpa device initialization, rather than delaying
it.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 15 ---------------
 net/vhost-vdpa.c       | 32 ++++++++++++++++++++++++--------
 2 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 75090d65e8..2491c05d29 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -350,19 +350,6 @@ static int vhost_vdpa_add_status(struct vhost_dev *dev, uint8_t status)
     return 0;
 }
 
-static void vhost_vdpa_get_iova_range(struct vhost_vdpa *v)
-{
-    int ret = vhost_vdpa_call(v->dev, VHOST_VDPA_GET_IOVA_RANGE,
-                              &v->iova_range);
-    if (ret != 0) {
-        v->iova_range.first = 0;
-        v->iova_range.last = UINT64_MAX;
-    }
-
-    trace_vhost_vdpa_get_iova_range(v->dev, v->iova_range.first,
-                                    v->iova_range.last);
-}
-
 static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
 {
     struct vhost_vdpa *v = dev->opaque;
@@ -1295,8 +1282,6 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
         goto err;
     }
 
-    vhost_vdpa_get_iova_range(v);
-
     if (vhost_vdpa_one_time_request(dev)) {
         return 0;
     }
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 4befba5cc7..cc9cecf8d1 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -22,6 +22,7 @@
 #include <sys/ioctl.h>
 #include <err.h>
 #include "standard-headers/linux/virtio_net.h"
+#include "standard-headers/linux/vhost_types.h"
 #include "monitor/monitor.h"
 #include "hw/virtio/vhost.h"
 
@@ -187,13 +188,25 @@ static NetClientInfo net_vhost_vdpa_info = {
         .check_peer_type = vhost_vdpa_check_peer_type,
 };
 
+static void vhost_vdpa_get_iova_range(int fd,
+                                      struct vhost_vdpa_iova_range *iova_range)
+{
+    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
+
+    if (ret < 0) {
+        iova_range->first = 0;
+        iova_range->last = UINT64_MAX;
+    }
+}
+
 static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
-                                           const char *device,
-                                           const char *name,
-                                           int vdpa_device_fd,
-                                           int queue_pair_index,
-                                           int nvqs,
-                                           bool is_datapath)
+                                       const char *device,
+                                       const char *name,
+                                       int vdpa_device_fd,
+                                       int queue_pair_index,
+                                       int nvqs,
+                                       bool is_datapath,
+                                       struct vhost_vdpa_iova_range iova_range)
 {
     NetClientState *nc = NULL;
     VhostVDPAState *s;
@@ -211,6 +224,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
     s->vhost_vdpa.device_fd = vdpa_device_fd;
     s->vhost_vdpa.index = queue_pair_index;
+    s->vhost_vdpa.iova_range = iova_range;
     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
     if (ret) {
         qemu_del_net_client(nc);
@@ -267,6 +281,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     g_autofree NetClientState **ncs = NULL;
     NetClientState *nc;
     int queue_pairs, i, has_cvq = 0;
+    struct vhost_vdpa_iova_range iova_range;
 
     assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
     opts = &netdev->u.vhost_vdpa;
@@ -286,19 +301,20 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
         qemu_close(vdpa_device_fd);
         return queue_pairs;
     }
+    vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
 
     ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
 
     for (i = 0; i < queue_pairs; i++) {
         ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                     vdpa_device_fd, i, 2, true);
+                                     vdpa_device_fd, i, 2, true, iova_range);
         if (!ncs[i])
             goto err;
     }
 
     if (has_cvq) {
         nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                 vdpa_device_fd, i, 1, false);
+                                 vdpa_device_fd, i, 1, false, iova_range);
         if (!nc)
             goto err;
     }
-- 
2.27.0




* [PATCH 31/31] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
                   ` (29 preceding siblings ...)
  2022-01-21 20:27 ` [PATCH 30/31] vdpa: Move vhost_vdpa_get_iova_range to net/vhost-vdpa.c Eugenio Pérez
@ 2022-01-21 20:27 ` Eugenio Pérez
  2022-01-28  6:02   ` Jason Wang
  31 siblings, 0 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-01-21 20:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella

This finally offers the possibility to enable SVQ from the command line.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 qapi/net.json    |  5 ++++-
 net/vhost-vdpa.c | 27 ++++++++++++++++++++++++---
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/qapi/net.json b/qapi/net.json
index 7fab2e7cd8..d243701527 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -445,12 +445,15 @@
 # @queues: number of queues to be created for multiqueue vhost-vdpa
 #          (default: 1)
 #
+# @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.0)
+#
 # Since: 5.1
 ##
 { 'struct': 'NetdevVhostVDPAOptions',
   'data': {
     '*vhostdev':     'str',
-    '*queues':       'int' } }
+    '*queues':       'int',
+    '*x-svq':        'bool' } }
 
 ##
 # @NetClientDriver:
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index cc9cecf8d1..9e443fa715 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -128,7 +128,11 @@ err_init:
 static void vhost_vdpa_cleanup(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    struct vhost_dev *dev = s->vhost_vdpa.dev;
 
+    if (dev && dev->vq_index + dev->nvqs == dev->vq_index_end) {
+        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
+    }
     if (s->vhost_net) {
         vhost_net_cleanup(s->vhost_net);
         g_free(s->vhost_net);
@@ -206,7 +210,9 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
                                        int queue_pair_index,
                                        int nvqs,
                                        bool is_datapath,
-                                       struct vhost_vdpa_iova_range iova_range)
+                                       bool svq,
+                                       struct vhost_vdpa_iova_range iova_range,
+                                       VhostIOVATree *iova_tree)
 {
     NetClientState *nc = NULL;
     VhostVDPAState *s;
@@ -225,6 +231,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     s->vhost_vdpa.device_fd = vdpa_device_fd;
     s->vhost_vdpa.index = queue_pair_index;
     s->vhost_vdpa.iova_range = iova_range;
+    s->vhost_vdpa.shadow_vqs_enabled = svq;
+    s->vhost_vdpa.iova_tree = iova_tree;
     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
     if (ret) {
         qemu_del_net_client(nc);
@@ -281,6 +289,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     g_autofree NetClientState **ncs = NULL;
     NetClientState *nc;
     int queue_pairs, i, has_cvq = 0;
+    g_autoptr(VhostIOVATree) iova_tree = NULL;
     struct vhost_vdpa_iova_range iova_range;
 
     assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
@@ -302,29 +311,41 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
         return queue_pairs;
     }
     vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
+    if (opts->x_svq) {
+        if (has_cvq) {
+            error_setg(errp, "vdpa svq does not work with cvq");
+            goto err_svq;
+        }
+        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
+    }
 
     ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
 
     for (i = 0; i < queue_pairs; i++) {
         ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                     vdpa_device_fd, i, 2, true, iova_range);
+                                     vdpa_device_fd, i, 2, true, opts->x_svq,
+                                     iova_range, iova_tree);
         if (!ncs[i])
             goto err;
     }
 
     if (has_cvq) {
         nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                 vdpa_device_fd, i, 1, false, iova_range);
+                                 vdpa_device_fd, i, 1, false, opts->x_svq,
+                                 iova_range, iova_tree);
         if (!nc)
             goto err;
     }
 
+    iova_tree = NULL;
     return 0;
 
 err:
     if (i) {
         qemu_del_net_client(ncs[0]);
     }
+
+err_svq:
     qemu_close(vdpa_device_fd);
 
     return -1;
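
For context, the "iova_tree = NULL;" before the successful return above
is the usual g_autoptr ownership-transfer idiom: once the tree is stored
in the net clients, clearing the local auto pointer stops the scope-exit
cleanup from freeing it. A minimal sketch of the pattern (names are
illustrative, not the patch's exact code):

    g_autoptr(VhostIOVATree) tree = vhost_iova_tree_new(first, last);

    if (setup_failed) {
        return -1;               /* tree is freed on scope exit */
    }

    consumer->iova_tree = tree;  /* ownership moves to the consumer */
    tree = NULL;                 /* prevent the automatic free */
    return 0;
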
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 182+ messages in thread

* Re: [PATCH 21/31] util: Add iova_tree_alloc
@ 2022-01-24  4:32     ` Peter Xu
  0 siblings, 0 replies; 182+ messages in thread
From: Peter Xu @ 2022-01-24  4:32 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, qemu-devel,
	Gautam Dawar, Markus Armbruster, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Stefan Hajnoczi, Eli Cohen,
	Paolo Bonzini, Zhu Lingshan, virtualization, Eric Blake,
	Stefano Garzarella

On Fri, Jan 21, 2022 at 09:27:23PM +0100, Eugenio Pérez wrote:
> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_last)
> +{
> +    const DMAMapInternal *last, *i;
> +
> +    assert(iova_begin < iova_last);
> +
> +    /*
> +     * Find a valid hole for the mapping
> +     *
> +     * TODO: Replace all this with g_tree_node_first/next/last when available
> +     * (from glib since 2.68). Using a separated QTAILQ complicates code.
> +     *
> +     * Try to allocate first at the end of the list.
> +     */
> +    last = QTAILQ_LAST(&tree->list);
> +    if (iova_tree_alloc_map_in_hole(last, NULL, iova_begin, iova_last,
> +                                    map->size)) {
> +        goto alloc;
> +    }
> +
> +    /* Look for inner hole */
> +    last = NULL;
> +    for (i = QTAILQ_FIRST(&tree->list); i;
> +         last = i, i = QTAILQ_NEXT(i, entry)) {
> +        if (iova_tree_alloc_map_in_hole(last, i, iova_begin, iova_last,
> +                                        map->size)) {
> +            goto alloc;
> +        }
> +    }
> +
> +    return IOVA_ERR_NOMEM;
> +
> +alloc:
> +    map->iova = last ? last->map.iova + last->map.size + 1 : iova_begin;
> +    return iova_tree_insert(tree, map);
> +}

Hi, Eugenio,

Have you tried with what Jason suggested previously?

  https://lore.kernel.org/qemu-devel/CACGkMEtZAPd9xQTP_R4w296N_Qz7VuV1FLnb544fEVoYO0of+g@mail.gmail.com/

That solution still sounds very sensible to me even without the newly
introduced list in the previous two patches.

IMHO we could move "DMAMap *previous, *this" into the IOVATreeAllocArgs*
stucture that was passed into the traverse func though, so it'll naturally work
with threading.

Or is there any blocker for it?
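
For illustration, such an args structure could look roughly like this
(a sketch only; field names are illustrative, not the actual qemu API):

    typedef struct IOVATreeAllocArgs {
        /* Inputs: requested size and allowed [iova_begin, iova_last] range */
        size_t iova_size;
        hwaddr iova_begin;
        hwaddr iova_last;
        /* Traversal state: the two mappings delimiting the hole under test */
        const DMAMap *prev, *this;
        /* Output: the chosen iova, valid only when result_ok is set */
        hwaddr iova_result;
        bool result_ok;
    } IOVATreeAllocArgs;
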

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 21/31] util: Add iova_tree_alloc
  2022-01-24  4:32     ` Peter Xu
  (?)
@ 2022-01-24  9:20     ` Eugenio Perez Martin
  2022-01-24 11:07         ` Peter Xu
  2022-01-30  5:06         ` Jason Wang
  -1 siblings, 2 replies; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-24  9:20 UTC (permalink / raw)
  To: Peter Xu
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, qemu-level,
	Gautam Dawar, Markus Armbruster, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Stefan Hajnoczi, Eli Cohen,
	Paolo Bonzini, Zhu Lingshan, virtualization, Eric Blake,
	Stefano Garzarella

On Mon, Jan 24, 2022 at 5:33 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Jan 21, 2022 at 09:27:23PM +0100, Eugenio Pérez wrote:
> > +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,

I forgot to s/iova_tree_alloc/iova_tree_alloc_map/ here.

> > +                    hwaddr iova_last)
> > +{
> > +    const DMAMapInternal *last, *i;
> > +
> > +    assert(iova_begin < iova_last);
> > +
> > +    /*
> > +     * Find a valid hole for the mapping
> > +     *
> > +     * TODO: Replace all this with g_tree_node_first/next/last when available
> > +     * (from glib since 2.68). Using a separated QTAILQ complicates code.
> > +     *
> > +     * Try to allocate first at the end of the list.
> > +     */
> > +    last = QTAILQ_LAST(&tree->list);
> > +    if (iova_tree_alloc_map_in_hole(last, NULL, iova_begin, iova_last,
> > +                                    map->size)) {
> > +        goto alloc;
> > +    }
> > +
> > +    /* Look for inner hole */
> > +    last = NULL;
> > +    for (i = QTAILQ_FIRST(&tree->list); i;
> > +         last = i, i = QTAILQ_NEXT(i, entry)) {
> > +        if (iova_tree_alloc_map_in_hole(last, i, iova_begin, iova_last,
> > +                                        map->size)) {
> > +            goto alloc;
> > +        }
> > +    }
> > +
> > +    return IOVA_ERR_NOMEM;
> > +
> > +alloc:
> > +    map->iova = last ? last->map.iova + last->map.size + 1 : iova_begin;
> > +    return iova_tree_insert(tree, map);
> > +}
>
> Hi, Eugenio,
>
> Have you tried with what Jason suggested previously?
>
>   https://lore.kernel.org/qemu-devel/CACGkMEtZAPd9xQTP_R4w296N_Qz7VuV1FLnb544fEVoYO0of+g@mail.gmail.com/
>
> That solution still sounds very sensible to me even without the newly
> introduced list in previous two patches.
>
> IMHO we could move "DMAMap *previous, *this" into the IOVATreeAllocArgs*
> structure that was passed into the traverse func though, so it'll naturally work
> with threading.
>
> Or is there any blocker for it?
>

Hi Peter,

I can try that solution again, but the main problem was the special
cases at the beginning and the end of the range.

For the function that locates a hole, the sentinel DMAMap first =
{.iova = 0, .size = 0} means that address 0 cannot be counted as part
of a hole: since DMAMap sizes are inclusive, that sentinel already
covers address 0, so any hole after it starts at 1 at the earliest.

In other words, with that algorithm, if the only valid hole is [0, N)
and we try to allocate a block of size N, it would fail.

The same happens with iova_end, although in practice it seems that the
IOMMU hardware's iova upper limit is never UINT64_MAX.

Maybe we could treat .size = 0 as a special case? I find it cleaner
either to build the list (but then insert needs to take the list into
account) or to state explicitly that prev == NULL means to use
iova_first.

Another solution that comes to my mind: handle both exceptions outside
of the traverse function, and skip the first iteration with something
like:

if (prev == NULL) {
    prev = this;
    return false; /* continue */
}

So the traverse callback has far fewer code paths. Would it work for
you if I send a separate RFC, independent of SVQ, just to validate this?
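
For illustration, a minimal sketch of such a traverse callback, assuming
an IOVATreeAllocArgs-style state structure as discussed above
(hole_fits() is a hypothetical helper that checks whether the gap
between two mappings can hold args->iova_size within the allowed range):

    static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
                                             gpointer data)
    {
        IOVATreeAllocArgs *args = data;
        const DMAMap *this = value;

        if (args->prev == NULL) {
            /* First node: there is no hole to its left to check yet */
            args->prev = this;
            return FALSE; /* continue traversal */
        }

        if (hole_fits(args->prev, this, args)) {
            /* hypothetical helper; it would also set args->iova_result */
            args->result_ok = true;
            return TRUE; /* stop traversal */
        }

        args->prev = this;
        return FALSE;
    }
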

Thanks!

> Thanks,

>
> --
> Peter Xu
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 21/31] util: Add iova_tree_alloc
@ 2022-01-24 11:07         ` Peter Xu
  0 siblings, 0 replies; 182+ messages in thread
From: Peter Xu @ 2022-01-24 11:07 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, qemu-level,
	Gautam Dawar, Markus Armbruster, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Stefan Hajnoczi, Eli Cohen,
	Paolo Bonzini, Zhu Lingshan, virtualization, Eric Blake,
	Stefano Garzarella

On Mon, Jan 24, 2022 at 10:20:55AM +0100, Eugenio Perez Martin wrote:
> On Mon, Jan 24, 2022 at 5:33 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Fri, Jan 21, 2022 at 09:27:23PM +0100, Eugenio Pérez wrote:
> > > +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> 
> I forgot to s/iova_tree_alloc/iova_tree_alloc_map/ here.
> 
> > > +                    hwaddr iova_last)
> > > +{
> > > +    const DMAMapInternal *last, *i;
> > > +
> > > +    assert(iova_begin < iova_last);
> > > +
> > > +    /*
> > > +     * Find a valid hole for the mapping
> > > +     *
> > > +     * TODO: Replace all this with g_tree_node_first/next/last when available
> > > +     * (from glib since 2.68). Using a separated QTAILQ complicates code.
> > > +     *
> > > +     * Try to allocate first at the end of the list.
> > > +     */
> > > +    last = QTAILQ_LAST(&tree->list);
> > > +    if (iova_tree_alloc_map_in_hole(last, NULL, iova_begin, iova_last,
> > > +                                    map->size)) {
> > > +        goto alloc;
> > > +    }
> > > +
> > > +    /* Look for inner hole */
> > > +    last = NULL;
> > > +    for (i = QTAILQ_FIRST(&tree->list); i;
> > > +         last = i, i = QTAILQ_NEXT(i, entry)) {
> > > +        if (iova_tree_alloc_map_in_hole(last, i, iova_begin, iova_last,
> > > +                                        map->size)) {
> > > +            goto alloc;
> > > +        }
> > > +    }
> > > +
> > > +    return IOVA_ERR_NOMEM;
> > > +
> > > +alloc:
> > > +    map->iova = last ? last->map.iova + last->map.size + 1 : iova_begin;
> > > +    return iova_tree_insert(tree, map);
> > > +}
> >
> > Hi, Eugenio,
> >
> > Have you tried with what Jason suggested previously?
> >
> >   https://lore.kernel.org/qemu-devel/CACGkMEtZAPd9xQTP_R4w296N_Qz7VuV1FLnb544fEVoYO0of+g@mail.gmail.com/
> >
> > That solution still sounds very sensible to me even without the newly
> > introduced list in previous two patches.
> >
> > IMHO we could move "DMAMap *previous, *this" into the IOVATreeAllocArgs*
> > structure that was passed into the traverse func though, so it'll naturally work
> > with threading.
> >
> > Or is there any blocker for it?
> >
> 
> Hi Peter,
> 
> I can try that solution again, but the main problem was the special
> cases of the beginning and ending.
> 
> For the function to locate a hole, DMAMap first = {.iova = 0, .size =
> 0} means that it cannot account 0 for the hole.
> 
> In other words, with that algorithm, if the only valid hole is [0, N)
> and we try to allocate a block of size N, it would fail.
> 
> Same happens with iova_end, although in practice it seems that IOMMU
> hardware iova upper limit is never UINT64_MAX.
> 
> Maybe we could treat .size = 0 as a special case? I see cleaner either
> to build the list (but insert needs to take the list into account) or
> to explicitly tell that prev == NULL means to use iova_first.

Sounds good to me.  I didn't mean to copy-paste Jason's code, but IMHO what
Jason wanted to show is the general concept - IOW, the fundamental idea (to me)
is that the tree will be traversed in order, hence maintaining another list
structure is redundant.

> 
> Another solution that comes to my mind: handle both exceptions outside
> of the traverse function, and skip the first iteration with something
> like:
> 
> if (prev == NULL) {
>     prev = this;
>     return false; /* continue */
> }
> 
> So the traverse callback has far fewer code paths. Would it work for
> you if I send a separate RFC, independent of SVQ, just to validate this?

Sure. :-)

If you want, imho you can also attach the patch when you reply, so the
discussion context won't be lost either.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 21/31] util: Add iova_tree_alloc
  2022-01-24 11:07         ` Peter Xu
  (?)
@ 2022-01-25  9:40         ` Eugenio Perez Martin
  2022-01-27  8:06             ` Peter Xu
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-25  9:40 UTC (permalink / raw)
  To: Peter Xu
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, qemu-level,
	Gautam Dawar, Markus Armbruster, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Stefan Hajnoczi, Eli Cohen,
	Paolo Bonzini, Zhu Lingshan, virtualization, Eric Blake,
	Stefano Garzarella

On Mon, Jan 24, 2022 at 12:08 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Jan 24, 2022 at 10:20:55AM +0100, Eugenio Perez Martin wrote:
> > On Mon, Jan 24, 2022 at 5:33 AM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Fri, Jan 21, 2022 at 09:27:23PM +0100, Eugenio Pérez wrote:
> > > > +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> >
> > I forgot to s/iova_tree_alloc/iova_tree_alloc_map/ here.
> >
> > > > +                    hwaddr iova_last)
> > > > +{
> > > > +    const DMAMapInternal *last, *i;
> > > > +
> > > > +    assert(iova_begin < iova_last);
> > > > +
> > > > +    /*
> > > > +     * Find a valid hole for the mapping
> > > > +     *
> > > > +     * TODO: Replace all this with g_tree_node_first/next/last when available
> > > > +     * (from glib since 2.68). Using a separated QTAILQ complicates code.
> > > > +     *
> > > > +     * Try to allocate first at the end of the list.
> > > > +     */
> > > > +    last = QTAILQ_LAST(&tree->list);
> > > > +    if (iova_tree_alloc_map_in_hole(last, NULL, iova_begin, iova_last,
> > > > +                                    map->size)) {
> > > > +        goto alloc;
> > > > +    }
> > > > +
> > > > +    /* Look for inner hole */
> > > > +    last = NULL;
> > > > +    for (i = QTAILQ_FIRST(&tree->list); i;
> > > > +         last = i, i = QTAILQ_NEXT(i, entry)) {
> > > > +        if (iova_tree_alloc_map_in_hole(last, i, iova_begin, iova_last,
> > > > +                                        map->size)) {
> > > > +            goto alloc;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    return IOVA_ERR_NOMEM;
> > > > +
> > > > +alloc:
> > > > +    map->iova = last ? last->map.iova + last->map.size + 1 : iova_begin;
> > > > +    return iova_tree_insert(tree, map);
> > > > +}
> > >
> > > Hi, Eugenio,
> > >
> > > Have you tried with what Jason suggested previously?
> > >
> > >   https://lore.kernel.org/qemu-devel/CACGkMEtZAPd9xQTP_R4w296N_Qz7VuV1FLnb544fEVoYO0of+g@mail.gmail.com/
> > >
> > > That solution still sounds very sensible to me even without the newly
> > > introduced list in previous two patches.
> > >
> > > IMHO we could move "DMAMap *previous, *this" into the IOVATreeAllocArgs*
> > > structure that was passed into the traverse func though, so it'll naturally work
> > > with threading.
> > >
> > > Or is there any blocker for it?
> > >
> >
> > Hi Peter,
> >
> > I can try that solution again, but the main problem was the special
> > cases of the beginning and ending.
> >
> > For the function to locate a hole, DMAMap first = {.iova = 0, .size =
> > 0} means that it cannot account 0 for the hole.
> >
> > In other words, with that algorithm, if the only valid hole is [0, N)
> > and we try to allocate a block of size N, it would fail.
> >
> > Same happens with iova_end, although in practice it seems that IOMMU
> > hardware iova upper limit is never UINT64_MAX.
> >
> > Maybe we could treat .size = 0 as a special case? I see cleaner either
> > to build the list (but insert needs to take the list into account) or
> > to explicitly tell that prev == NULL means to use iova_first.
>
> Sounds good to me.  I didn't mean to copy-paste Jason's code, but IMHO what
> Jason wanted to show is the general concept - IOW, the fundamental idea (to me)
> is that the tree will be traversed in order, hence maintaining another list
> structure is redundant.
>

I agree.

My idea with this version was to be able to easily delete all the
custom code once we have a GTree with proper first/next/last (or
_node) functions. That's why it simply reimplements those GTree
functions on top of the current Glib version. I find the old code way
too complicated, and this one easier to handle although way more
verbose; but let's see if we can improve the old one.
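
For reference, a minimal sketch of what the in-order walk could look
like once glib >= 2.68 is available, using the g_tree_node_* API that
the QTAILQ emulates here:

    GTreeNode *node;

    /* g_tree_node_first() returns the leftmost (lowest key) entry */
    for (node = g_tree_node_first(tree); node;
         node = g_tree_node_next(node)) {
        const DMAMap *map = g_tree_node_value(node);
        /* inspect the hole between the previous mapping and this one */
    }
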

> >
> > Another solution that comes to my mind: handle both exceptions outside
> > of the traverse function, and skip the first iteration with something
> > like:
> >
> > if (prev == NULL) {
> >     prev = this;
> >     return false; /* continue */
> > }
> >
> > So the traverse callback has far fewer code paths. Would it work for
> > you if I send a separate RFC, independent of SVQ, just to validate this?
>
> Sure. :-)
>
> If you want, imho you can also attach the patch when reply, then the discussion
> context won't be lost too.
>

Sure,

So I think that the first step to remove complexity from the old one
is to remove iova_begin and iova_end.

As Jason points out, removing iova_end is easier. It has the drawback
of having to traverse the whole list beyond iova_end, but a well-formed
iova tree should contain no nodes there. If the guest can manipulate
it, it is only hurting itself by adding nodes to it.

It's possible to extract the check for hole_right (or "this" in Jason's
proposal) as a special case too.

But removing the iova_begin parameter is more complicated. We cannot
know whether a hole is valid without knowing iova_begin, and we cannot
resume traversal. Could we assume iova_begin will always be 0? I
think not; the vdpa device can return anything through the syscall.

Thanks!



> --
> Peter Xu
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 02/31] vhost: Add VhostShadowVirtqueue
  2022-01-21 20:27 ` [PATCH 02/31] vhost: Add VhostShadowVirtqueue Eugenio Pérez
@ 2022-01-26  8:53   ` Eugenio Perez Martin
  2022-01-28  6:00     ` Jason Wang
  1 sibling, 0 replies; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-26  8:53 UTC (permalink / raw)
  To: qemu-level
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Juan Quintela,
	Jason Wang, Michael S. Tsirkin, Richard Henderson,
	Markus Armbruster, Gautam Dawar, virtualization,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Stefano Garzarella,
	Eric Blake, Eduardo Habkost

On Fri, Jan 21, 2022 at 9:32 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> Vhost shadow virtqueue (SVQ) is an intermediate jump for virtqueue
> notifications and buffers, allowing qemu to track them. While qemu is
> forwarding the buffers and virtqueue changes, it is able to track the
> memory that is being dirtied, the same way regular qemu's VirtIO devices
> do.
>
> This commit only exposes basic SVQ allocation and free. Next patches of
> the series add functionality like notifications and buffers forwarding.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  hw/virtio/vhost-shadow-virtqueue.h | 21 ++++++++++
>  hw/virtio/vhost-shadow-virtqueue.c | 64 ++++++++++++++++++++++++++++++
>  hw/virtio/meson.build              |  2 +-
>  3 files changed, 86 insertions(+), 1 deletion(-)
>  create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
>  create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> new file mode 100644
> index 0000000000..61ea112002
> --- /dev/null
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -0,0 +1,21 @@
> +/*
> + * vhost shadow virtqueue
> + *
> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef VHOST_SHADOW_VIRTQUEUE_H
> +#define VHOST_SHADOW_VIRTQUEUE_H
> +
> +#include "hw/virtio/vhost.h"
> +
> +typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> +
> +VhostShadowVirtqueue *vhost_svq_new(void);
> +
> +void vhost_svq_free(VhostShadowVirtqueue *vq);
> +
> +#endif
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> new file mode 100644
> index 0000000000..5ee7b401cb
> --- /dev/null
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -0,0 +1,64 @@
> +/*
> + * vhost shadow virtqueue
> + *
> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/virtio/vhost-shadow-virtqueue.h"
> +
> +#include "qemu/error-report.h"
> +#include "qemu/event_notifier.h"
> +
> +/* Shadow virtqueue to relay notifications */
> +typedef struct VhostShadowVirtqueue {

This is already typedef'd as VhostShadowVirtqueue in the header, so I
will remove the duplicate typedef here in the next version.

> +    /* Shadow kick notifier, sent to vhost */
> +    EventNotifier hdev_kick;
> +    /* Shadow call notifier, sent to vhost */
> +    EventNotifier hdev_call;
> +} VhostShadowVirtqueue;
> +
> +/**
> + * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> + * methods and file descriptors.
> + */
> +VhostShadowVirtqueue *vhost_svq_new(void)
> +{
> +    g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> +    int r;
> +
> +    r = event_notifier_init(&svq->hdev_kick, 0);
> +    if (r != 0) {
> +        error_report("Couldn't create kick event notifier: %s",
> +                     strerror(errno));
> +        goto err_init_hdev_kick;
> +    }
> +
> +    r = event_notifier_init(&svq->hdev_call, 0);
> +    if (r != 0) {
> +        error_report("Couldn't create call event notifier: %s",
> +                     strerror(errno));
> +        goto err_init_hdev_call;
> +    }
> +
> +    return g_steal_pointer(&svq);
> +
> +err_init_hdev_call:
> +    event_notifier_cleanup(&svq->hdev_kick);
> +
> +err_init_hdev_kick:
> +    return NULL;
> +}
> +
> +/**
> + * Free the resources of the shadow virtqueue.
> + */
> +void vhost_svq_free(VhostShadowVirtqueue *vq)
> +{
> +    event_notifier_cleanup(&vq->hdev_kick);
> +    event_notifier_cleanup(&vq->hdev_call);
> +    g_free(vq);
> +}
> diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
> index 521f7d64a8..2dc87613bc 100644
> --- a/hw/virtio/meson.build
> +++ b/hw/virtio/meson.build
> @@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
>
>  virtio_ss = ss.source_set()
>  virtio_ss.add(files('virtio.c'))
> -virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c'))
> +virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
>  virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
>  virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
>  virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
> --
> 2.27.0
>
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 21/31] util: Add iova_tree_alloc
@ 2022-01-27  8:06             ` Peter Xu
  0 siblings, 0 replies; 182+ messages in thread
From: Peter Xu @ 2022-01-27  8:06 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, qemu-level,
	Gautam Dawar, Markus Armbruster, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Stefan Hajnoczi, Eli Cohen,
	Paolo Bonzini, Zhu Lingshan, virtualization, Eric Blake,
	Stefano Garzarella

On Tue, Jan 25, 2022 at 10:40:01AM +0100, Eugenio Perez Martin wrote:
> So I think that the first step to remove complexity from the old one
> is to remove iova_begin and iova_end.
> 
> As Jason points out, removing iova_end is easier. It has the drawback
> of having to traverse all the list beyond iova_end, but a well formed
> iova tree should contain none. If the guest can manipulate it, it's
> only hurting itself adding nodes to it.
> 
> It's possible to extract the check for hole_right (or this in Jason's
> proposal) as a special case too.
> 
> But removing the iova_begin parameter is more complicated. We cannot
> know if it's a valid hole without knowing iova_begin, and we cannot
> resume traversing. Could we assume iova_begin will always be 0? I
> think not, the vdpa device can return anything through syscall.

Frankly I don't know what's the syscall you're talking about, but after
a second thought, and after I went back and re-read your previous
version more carefully (the one without the list), I think it works in
general. I should have tried harder when reviewing the first time!

I mean this one:

https://lore.kernel.org/qemu-devel/20211029183525.1776416-24-eperezma@redhat.com/

Though this time I have some comments on the details.

Personally I like that one (probably with some amendment upon the old version)
more than the current list-based approach.  But I'd like to know your thoughts
too (including Jason).  I'll further comment in that thread soon.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 21/31] util: Add iova_tree_alloc
  2022-01-27  8:06             ` Peter Xu
  (?)
@ 2022-01-27  9:24             ` Eugenio Perez Martin
  2022-01-28  3:57                 ` Peter Xu
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-27  9:24 UTC (permalink / raw)
  To: Peter Xu
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, qemu-level,
	Gautam Dawar, Markus Armbruster, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Stefan Hajnoczi, Eli Cohen,
	Paolo Bonzini, Zhu Lingshan, virtualization, Eric Blake,
	Stefano Garzarella

On Thu, Jan 27, 2022 at 9:06 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Jan 25, 2022 at 10:40:01AM +0100, Eugenio Perez Martin wrote:
> > So I think that the first step to remove complexity from the old one
> > is to remove iova_begin and iova_end.
> >
> > As Jason points out, removing iova_end is easier. It has the drawback
> > of having to traverse all the list beyond iova_end, but a well formed
> > iova tree should contain none. If the guest can manipulate it, it's
> > only hurting itself adding nodes to it.
> >
> > It's possible to extract the check for hole_right (or this in Jason's
> > proposal) as a special case too.
> >
> > But removing the iova_begin parameter is more complicated. We cannot
> > know if it's a valid hole without knowing iova_begin, and we cannot
> > resume traversing. Could we assume iova_begin will always be 0? I
> > think not, the vdpa device can return anything through syscall.
>
> Frankly I don't know what's the syscall you're talking about,

I meant VHOST_VDPA_GET_IOVA_RANGE, which allows qemu to know the valid
range of iova addresses. We get a pair of uint64_t values from it that
indicate the minimum and maximum iova address the device (or iommu)
supports.

We must allocate iova ranges within that address range, which
complicates this algorithm a little bit. Since the SVQ iova addresses
are not GPA, qemu needs extra code to be able to allocate and free
them, creating a new custom iova address space.
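
For context, a minimal sketch of that query from userspace, using the
VHOST_VDPA_GET_IOVA_RANGE ioctl and struct vhost_vdpa_iova_range from
linux/vhost.h:

    #include <sys/ioctl.h>
    #include <linux/vhost.h>

    struct vhost_vdpa_iova_range range;

    if (ioctl(vdpa_device_fd, VHOST_VDPA_GET_IOVA_RANGE, &range) < 0) {
        /* older kernel or backend without the ioctl */
    }
    /* usable iova addresses fall in [range.first, range.last], inclusive */
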

Please let me know if you want more details or if you prefer me to
give more context in the patch message.

> but after a 2nd
> thought and after I went back and re-read your previous version more carefully
> (the one without the list) I think it seems working to me in general.  I should
> have tried harder when reviewing the first time!
>

I guess I should have added more context so this particular change can
be better understood in isolation.

> I mean this one:
>
> https://lore.kernel.org/qemu-devel/20211029183525.1776416-24-eperezma@redhat.com/
>
> Though this time I have some comments on the details.
>
> Personally I like that one (probably with some amendment upon the old version)
> more than the current list-based approach.  But I'd like to know your thoughts
> too (including Jason).  I'll further comment in that thread soon.
>

Sure, I'm fine with whatever solution we choose, but I'm just running
out of ideas to simplify it. Reading your suggestions on the old RFC
now.

Overall I feel the list-based one is both more convenient and easier
to delete once qemu raises the minimal glib version, but it adds a lot
more code.

It could add less code with these less elegant changes:
* Put the list entry in the DMAMap itself, although that exposes
unneeded implementation details (see the sketch below).
* Force the iova tree to be either allocation-based or
insertion-based, but not both. In other words, you can only use either
iova_tree_alloc or iova_tree_insert on a given tree.
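
For illustration, the first option would embed the linkage in the
public struct, roughly like this (a sketch based on the current DMAMap
fields; the QTAILQ_ENTRY is the exposed implementation detail):

    typedef struct DMAMap {
        hwaddr iova;
        hwaddr translated_addr;
        hwaddr size;                /* Inclusive */
        IOMMUAccessFlags perm;
        QTAILQ_ENTRY(DMAMap) entry; /* now visible to every iova-tree user */
    } DMAMap;
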

I have a few tests to check the algorithms, but they are not in the
qemu test format. I will post them so we can all understand better
what is expected from this.

Thanks!

> Thanks,
>
> --
> Peter Xu
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 21/31] util: Add iova_tree_alloc
@ 2022-01-28  3:57                 ` Peter Xu
  0 siblings, 0 replies; 182+ messages in thread
From: Peter Xu @ 2022-01-28  3:57 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Juan Quintela, Richard Henderson, qemu-level,
	Gautam Dawar, Markus Armbruster, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Stefan Hajnoczi, Eli Cohen,
	Paolo Bonzini, Zhu Lingshan, virtualization, Eric Blake,
	Stefano Garzarella

On Thu, Jan 27, 2022 at 10:24:27AM +0100, Eugenio Perez Martin wrote:
> On Thu, Jan 27, 2022 at 9:06 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Tue, Jan 25, 2022 at 10:40:01AM +0100, Eugenio Perez Martin wrote:
> > > So I think that the first step to remove complexity from the old one
> > > is to remove iova_begin and iova_end.
> > >
> > > As Jason points out, removing iova_end is easier. It has the drawback
> > > of having to traverse all the list beyond iova_end, but a well formed
> > > iova tree should contain none. If the guest can manipulate it, it's
> > > only hurting itself adding nodes to it.
> > >
> > > It's possible to extract the check for hole_right (or this in Jason's
> > > proposal) as a special case too.
> > >
> > > But removing the iova_begin parameter is more complicated. We cannot
> > > know if it's a valid hole without knowing iova_begin, and we cannot
> > > resume traversing. Could we assume iova_begin will always be 0? I
> > > think not, the vdpa device can return anything through syscall.
> >
> > Frankly I don't know what's the syscall you're talking about,
> 
> I meant VHOST_VDPA_GET_IOVA_RANGE, which allows qemu to know the valid
> range of iova addresses. We get a pair of uint64_t from it, that
> indicates the minimum and maximum iova address the device (or iommu)
> supports.
> 
> We must allocate iova ranges within that address range, which
> complicates this algorithm a little bit. Since the SVQ iova addresses
> are not GPA, qemu needs extra code to be able to allocate and free
> them, creating a new custom iova as.
> 
> Please let me know if you want more details or if you prefer me to
> give more context in the patch message.

That's good enough, thanks.

> 
> > I mean this one:
> >
> > https://lore.kernel.org/qemu-devel/20211029183525.1776416-24-eperezma@redhat.com/
> >
> > Though this time I have some comments on the details.
> >
> > Personally I like that one (probably with some amendment upon the old version)
> > more than the current list-based approach.  But I'd like to know your thoughts
> > too (including Jason).  I'll further comment in that thread soon.
> >
> 
> Sure, I'm fine with whatever solution we choose, but I'm just running
> out of ideas to simplify it. Reading your suggestions on old RFC now.
> 
> Overall I feel list-based one is both more convenient and easy to
> delete when qemu raises the minimal glib version, but it adds a lot
> more code.
> 
> It could add less code with this less elegant changes:
> * If we just put the list entry in the DMAMap itself, although it
> exposes unneeded implementation details.
> * We force the iova tree either to be an allocation-based or an
> insertion-based, but not both. In other words, you can only either use
> iova_tree_alloc or iova_tree_insert on the same tree.

Yeah, I just noticed yesterday that there's no easy choice here.  Let's go
with either way; it shouldn't block the rest of the code.  It'll be good if
Jason or Michael share their preferences too.

> 
> I have a few tests to check the algorithms, but they are not in the
> qemu test format. I will post them so we all can understand better
> what is expected from this.

Sure.  Thanks.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 21/31] util: Add iova_tree_alloc
@ 2022-01-28  5:55                   ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-28  5:55 UTC (permalink / raw)
  To: Peter Xu, Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella


On 2022/1/28 11:57 AM, Peter Xu wrote:
> On Thu, Jan 27, 2022 at 10:24:27AM +0100, Eugenio Perez Martin wrote:
>> On Thu, Jan 27, 2022 at 9:06 AM Peter Xu <peterx@redhat.com> wrote:
>>> On Tue, Jan 25, 2022 at 10:40:01AM +0100, Eugenio Perez Martin wrote:
>>>> So I think that the first step to remove complexity from the old one
>>>> is to remove iova_begin and iova_end.
>>>>
>>>> As Jason points out, removing iova_end is easier. It has the drawback
>>>> of having to traverse all the list beyond iova_end, but a well formed
>>>> iova tree should contain none. If the guest can manipulate it, it's
>>>> only hurting itself adding nodes to it.
>>>>
>>>> It's possible to extract the check for hole_right (or this in Jason's
>>>> proposal) as a special case too.
>>>>
>>>> But removing the iova_begin parameter is more complicated. We cannot
>>>> know if it's a valid hole without knowing iova_begin, and we cannot
>>>> resume traversing. Could we assume iova_begin will always be 0? I
>>>> think not, the vdpa device can return anything through syscall.
>>> Frankly I don't know what's the syscall you're talking about,
>> I meant VHOST_VDPA_GET_IOVA_RANGE, which allows qemu to know the valid
>> range of iova addresses. We get a pair of uint64_t from it, that
>> indicates the minimum and maximum iova address the device (or iommu)
>> supports.
>>
>> We must allocate iova ranges within that address range, which
>> complicates this algorithm a little bit. Since the SVQ iova addresses
>> are not GPA, qemu needs extra code to be able to allocate and free
>> them, creating a new custom iova address space.
>>
>> Please let me know if you want more details or if you prefer me to
>> give more context in the patch message.
> That's good enough, thanks.
>
>>> I mean this one:
>>>
>>> https://lore.kernel.org/qemu-devel/20211029183525.1776416-24-eperezma@redhat.com/
>>>
>>> Though this time I have some comments on the details.
>>>
>>> Personally I like that one (probably with some amendment upon the old version)
>>> more than the current list-based approach.  But I'd like to know your thoughts
>>> too (including Jason).  I'll further comment in that thread soon.
>>>
>> Sure, I'm fine with whatever solution we choose, but I'm just running
>> out of ideas to simplify it. Reading your suggestions on old RFC now.
>>
>> Overall I feel list-based one is both more convenient and easy to
>> delete when qemu raises the minimal glib version, but it adds a lot
>> more code.
>>
>> It could add less code with this less elegant changes:
>> * If we just put the list entry in the DMAMap itself, although it
>> exposes unneeded implementation details.
>> * We force the iova tree either to be an allocation-based or an
>> insertion-based, but not both. In other words, you can only either use
>> iova_tree_alloc or iova_tree_insert on the same tree.


This seems an odd API I must say :(


> Yeah, I just noticed it yesterday that there's no easy choice on it.  Let's go
> with either way; it shouldn't block the rest of the code.  It'll be good if
> Jason or Michael share their preferences too.


(Haven't gone through the code deeply.)

I wonder, how about just copy-pasting gtree_node_first|last()? A quick
google search told me it's not complicated.

Thanks


>
>> I have a few tests to check the algorithms, but they are not in the
>> qemu test format. I will post them so we all can understand better
>> what is expected from this.
> Sure.  Thanks.
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 01/31] vdpa: Reorder virtio/vhost-vdpa.c functions
  2022-01-21 20:27 ` [PATCH 01/31] vdpa: Reorder virtio/vhost-vdpa.c functions Eugenio Pérez
@ 2022-01-28  5:59     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-28  5:59 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> vhost_vdpa_set_features and vhost_vdpa_init need to use
> vhost_vdpa_get_features in svq mode.
>
> vhost_vdpa_dev_start needs to use almost all _set_ functions:
> vhost_vdpa_set_vring_dev_kick, vhost_vdpa_set_vring_dev_call,
> vhost_vdpa_set_dev_vring_base and vhost_vdpa_set_dev_vring_num.
>
> No functional change intended.


Is it related to (a must for) the SVQ code?

Thanks


>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-vdpa.c | 164 ++++++++++++++++++++---------------------
>   1 file changed, 82 insertions(+), 82 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 04ea43704f..6c10a7f05f 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -342,41 +342,6 @@ static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
>       return v->index != 0;
>   }
>   
> -static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> -{
> -    struct vhost_vdpa *v;
> -    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> -    trace_vhost_vdpa_init(dev, opaque);
> -    int ret;
> -
> -    /*
> -     * Similar to VFIO, we end up pinning all guest memory and have to
> -     * disable discarding of RAM.
> -     */
> -    ret = ram_block_discard_disable(true);
> -    if (ret) {
> -        error_report("Cannot set discarding of RAM broken");
> -        return ret;
> -    }
> -
> -    v = opaque;
> -    v->dev = dev;
> -    dev->opaque =  opaque ;
> -    v->listener = vhost_vdpa_memory_listener;
> -    v->msg_type = VHOST_IOTLB_MSG_V2;
> -
> -    vhost_vdpa_get_iova_range(v);
> -
> -    if (vhost_vdpa_one_time_request(dev)) {
> -        return 0;
> -    }
> -
> -    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> -                               VIRTIO_CONFIG_S_DRIVER);
> -
> -    return 0;
> -}
> -
>   static void vhost_vdpa_host_notifier_uninit(struct vhost_dev *dev,
>                                               int queue_index)
>   {
> @@ -506,24 +471,6 @@ static int vhost_vdpa_set_mem_table(struct vhost_dev *dev,
>       return 0;
>   }
>   
> -static int vhost_vdpa_set_features(struct vhost_dev *dev,
> -                                   uint64_t features)
> -{
> -    int ret;
> -
> -    if (vhost_vdpa_one_time_request(dev)) {
> -        return 0;
> -    }
> -
> -    trace_vhost_vdpa_set_features(dev, features);
> -    ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
> -    if (ret) {
> -        return ret;
> -    }
> -
> -    return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> -}
> -
>   static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
>   {
>       uint64_t features;
> @@ -646,35 +593,6 @@ static int vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
>       return ret;
>    }
>   
> -static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> -{
> -    struct vhost_vdpa *v = dev->opaque;
> -    trace_vhost_vdpa_dev_start(dev, started);
> -
> -    if (started) {
> -        vhost_vdpa_host_notifiers_init(dev);
> -        vhost_vdpa_set_vring_ready(dev);
> -    } else {
> -        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> -    }
> -
> -    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> -        return 0;
> -    }
> -
> -    if (started) {
> -        memory_listener_register(&v->listener, &address_space_memory);
> -        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> -    } else {
> -        vhost_vdpa_reset_device(dev);
> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> -                                   VIRTIO_CONFIG_S_DRIVER);
> -        memory_listener_unregister(&v->listener);
> -
> -        return 0;
> -    }
> -}
> -
>   static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
>                                        struct vhost_log *log)
>   {
> @@ -735,6 +653,35 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>       return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>   }
>   
> +static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    trace_vhost_vdpa_dev_start(dev, started);
> +
> +    if (started) {
> +        vhost_vdpa_host_notifiers_init(dev);
> +        vhost_vdpa_set_vring_ready(dev);
> +    } else {
> +        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> +    }
> +
> +    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> +        return 0;
> +    }
> +
> +    if (started) {
> +        memory_listener_register(&v->listener, &address_space_memory);
> +        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +    } else {
> +        vhost_vdpa_reset_device(dev);
> +        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> +                                   VIRTIO_CONFIG_S_DRIVER);
> +        memory_listener_unregister(&v->listener);
> +
> +        return 0;
> +    }
> +}
> +
>   static int vhost_vdpa_get_features(struct vhost_dev *dev,
>                                        uint64_t *features)
>   {
> @@ -745,6 +692,24 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev,
>       return ret;
>   }
>   
> +static int vhost_vdpa_set_features(struct vhost_dev *dev,
> +                                   uint64_t features)
> +{
> +    int ret;
> +
> +    if (vhost_vdpa_one_time_request(dev)) {
> +        return 0;
> +    }
> +
> +    trace_vhost_vdpa_set_features(dev, features);
> +    ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> +}
> +
>   static int vhost_vdpa_set_owner(struct vhost_dev *dev)
>   {
>       if (vhost_vdpa_one_time_request(dev)) {
> @@ -772,6 +737,41 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
>       return true;
>   }
>   
> +static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> +{
> +    struct vhost_vdpa *v;
> +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> +    trace_vhost_vdpa_init(dev, opaque);
> +    int ret;
> +
> +    /*
> +     * Similar to VFIO, we end up pinning all guest memory and have to
> +     * disable discarding of RAM.
> +     */
> +    ret = ram_block_discard_disable(true);
> +    if (ret) {
> +        error_report("Cannot set discarding of RAM broken");
> +        return ret;
> +    }
> +
> +    v = opaque;
> +    v->dev = dev;
> +    dev->opaque =  opaque ;
> +    v->listener = vhost_vdpa_memory_listener;
> +    v->msg_type = VHOST_IOTLB_MSG_V2;
> +
> +    vhost_vdpa_get_iova_range(v);
> +
> +    if (vhost_vdpa_one_time_request(dev)) {
> +        return 0;
> +    }
> +
> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> +                               VIRTIO_CONFIG_S_DRIVER);
> +
> +    return 0;
> +}
> +
>   const VhostOps vdpa_ops = {
>           .backend_type = VHOST_BACKEND_TYPE_VDPA,
>           .vhost_backend_init = vhost_vdpa_init,


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 02/31] vhost: Add VhostShadowVirtqueue
  2022-01-21 20:27 ` [PATCH 02/31] vhost: Add VhostShadowVirtqueue Eugenio Pérez
@ 2022-01-28  6:00     ` Jason Wang
  2022-01-28  6:00     ` Jason Wang
  1 sibling, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-28  6:00 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> Vhost shadow virtqueue (SVQ) is an intermediate hop for virtqueue
> notifications and buffers, allowing qemu to track them. While qemu is
> forwarding the buffers and virtqueue changes, it is able to track the
> memory that is being dirtied, the same way qemu's regular emulated
> VirtIO devices do.
>
> This commit only exposes basic SVQ allocation and freeing. Later patches
> in the series add functionality like notification and buffer forwarding.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h | 21 ++++++++++
>   hw/virtio/vhost-shadow-virtqueue.c | 64 ++++++++++++++++++++++++++++++
>   hw/virtio/meson.build              |  2 +-
>   3 files changed, 86 insertions(+), 1 deletion(-)
>   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
>   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> new file mode 100644
> index 0000000000..61ea112002
> --- /dev/null
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -0,0 +1,21 @@
> +/*
> + * vhost shadow virtqueue
> + *
> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef VHOST_SHADOW_VIRTQUEUE_H
> +#define VHOST_SHADOW_VIRTQUEUE_H
> +
> +#include "hw/virtio/vhost.h"
> +
> +typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> +
> +VhostShadowVirtqueue *vhost_svq_new(void);
> +
> +void vhost_svq_free(VhostShadowVirtqueue *vq);
> +
> +#endif
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> new file mode 100644
> index 0000000000..5ee7b401cb
> --- /dev/null
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -0,0 +1,64 @@
> +/*
> + * vhost shadow virtqueue
> + *
> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/virtio/vhost-shadow-virtqueue.h"
> +
> +#include "qemu/error-report.h"
> +#include "qemu/event_notifier.h"
> +
> +/* Shadow virtqueue to relay notifications */
> +typedef struct VhostShadowVirtqueue {
> +    /* Shadow kick notifier, sent to vhost */
> +    EventNotifier hdev_kick;
> +    /* Shadow call notifier, sent to vhost */
> +    EventNotifier hdev_call;
> +} VhostShadowVirtqueue;
> +
> +/**
> + * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> + * methods and file descriptors.
> + */
> +VhostShadowVirtqueue *vhost_svq_new(void)
> +{
> +    g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> +    int r;
> +
> +    r = event_notifier_init(&svq->hdev_kick, 0);
> +    if (r != 0) {
> +        error_report("Couldn't create kick event notifier: %s",
> +                     strerror(errno));
> +        goto err_init_hdev_kick;
> +    }
> +
> +    r = event_notifier_init(&svq->hdev_call, 0);
> +    if (r != 0) {
> +        error_report("Couldn't create call event notifier: %s",
> +                     strerror(errno));
> +        goto err_init_hdev_call;
> +    }
> +
> +    return g_steal_pointer(&svq);
> +
> +err_init_hdev_call:
> +    event_notifier_cleanup(&svq->hdev_kick);
> +
> +err_init_hdev_kick:
> +    return NULL;
> +}
> +
> +/**
> + * Free the resources of the shadow virtqueue.
> + */
> +void vhost_svq_free(VhostShadowVirtqueue *vq)
> +{
> +    event_notifier_cleanup(&vq->hdev_kick);
> +    event_notifier_cleanup(&vq->hdev_call);
> +    g_free(vq);
> +}
> diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
> index 521f7d64a8..2dc87613bc 100644
> --- a/hw/virtio/meson.build
> +++ b/hw/virtio/meson.build
> @@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
>   
>   virtio_ss = ss.source_set()
>   virtio_ss.add(files('virtio.c'))
> -virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c'))
> +virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))


I wonder if we need a dedicated config option for shadow virtqueue.

Thanks


>   virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
>   virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
>   virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 00/31] vDPA shadow virtqueue
  2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
@ 2022-01-28  6:02   ` Jason Wang
  2022-01-21 20:27 ` [PATCH 02/31] vhost: Add VhostShadowVirtqueue Eugenio Pérez
                     ` (30 subsequent siblings)
  31 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-28  6:02 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> This series enables shadow virtqueue (SVQ) for vhost-vdpa devices. This
> is intended as a new method of tracking the memory the devices touch
> during a migration process: Instead of relay on vhost device's dirty
> logging capability, SVQ intercepts the VQ dataplane forwarding the
> descriptors between VM and device. This way qemu is the effective
> writer of guests memory, like in qemu's emulated virtio device
> operation.
>
> When SVQ is enabled qemu offers a new virtual address space to the
> device to read and write into, and it maps new vrings and the guest
> memory in it. SVQ also intercepts kicks and calls between the device
> and the guest. Used buffers relay would cause dirty memory being
> tracked, but at this RFC SVQ is not enabled on migration automatically.
>
> Thanks of being a buffers relay system, SVQ can be used also to
> communicate devices and drivers with different capabilities, like
> devices that only support packed vring and not split and old guests with
> no driver packed support.
>
> It is based on the ideas of DPDK SW assisted LM, in the series of
> DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> not map the shadow vq in guest's VA, but in qemu's.
>
> This version of SVQ is limited in the amount of features it can use with
> guest and device, because this series is already very big otherwise.
> Features like indirect or event_idx will be addressed in future series.
>
> SVQ needs to be enabled with cmdline parameter x-svq, like:
>
> -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-svq=true
>
> In this version it cannot be enabled or disabled in runtime. Further
> series will remove this limitation and will enable it only for migration
> time.
>
> Some patches are intentionally very small to ease review, but they can
> be squashed if preferred.
>
> Patches 1-10 prepares the SVQ and QEMU to support both guest to device
> and device to guest notifications forwarding, with the extra qemu hop.
> That part can be tested in isolation if cmdline change is reproduced.
>
> Patches from 11 to 18 implement the actual buffer forwarding, but with
> no IOMMU support. It requires a vdpa device capable of addressing all
> qemu vaddr.
>
> Patches 19 to 23 adds the iommu support, so the device with address
> range limitations can access SVQ through this new virtual address space
> created.
>
> The rest of the series add the last pieces needed for migration.
>
> Comments are welcome.


I wonder about the performance impact, so performance numbers are more
than welcome.

Thanks


>
> TODO:
> * Event, indirect, packed, and other features of virtio.
> * To separate buffers forwarding in its own AIO context, so we can
>    throw more threads to that task and we don't need to stop the main
>    event loop.
> * Support virtio-net control vq.
> * Proper documentation.
>
> Changes from v5 RFC:
> * Remove dynamic enablement of SVQ, making less dependent of the device.
> * Enable live migration if SVQ is enabled.
> * Fix SVQ when driver reset.
> * Comments addressed, specially in the iova area.
> * Rebase on latest master, adding multiqueue support (but no networking
>    control vq processing).
> v5 link:
> https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg07250.html
>
> Changes from v4 RFC:
> * Support of allocating / freeing iova ranges in IOVA tree. Extending
>    already present iova-tree for that.
> * Proper validation of guest features. Now SVQ can negotiate a
>    different set of features with the device when enabled.
> * Support of host notifiers memory regions
> * Handling of SVQ full queue in case guest's descriptors span to
>    different memory regions (qemu's VA chunks).
> * Flush pending used buffers at end of SVQ operation.
> * QMP command now looks by NetClientState name. Other devices will need
>    to implement it's way to enable vdpa.
> * Rename QMP command to set, so it looks more like a way of working
> * Better use of qemu error system
> * Make a few assertions proper error-handling paths.
> * Add more documentation
> * Less coupling of virtio / vhost, that could cause friction on changes
> * Addressed many other small comments and small fixes.
>
> Changes from v3 RFC:
>    * Move everything to vhost-vdpa backend. A big change, this allowed
>      some cleanup but more code has been added in other places.
>    * More use of glib utilities, especially to manage memory.
> v3 link:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
>
> Changes from v2 RFC:
>    * Adding vhost-vdpa devices support
>    * Fixed some memory leaks pointed by different comments
> v2 link:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
>
> Changes from v1 RFC:
>    * Use QMP instead of migration to start SVQ mode.
>    * Only accepting IOMMU devices, closer behavior with target devices
>      (vDPA)
>    * Fix invalid masking/unmasking of vhost call fd.
>    * Use of proper methods for synchronization.
>    * No need to modify VirtIO device code, all of the changes are
>      contained in vhost code.
>    * Delete superfluous code.
>    * An intermediate RFC was sent with only the notifications forwarding
>      changes. It can be seen in
>      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> v1 link:
> https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
>
> Eugenio Pérez (20):
>        virtio: Add VIRTIO_F_QUEUE_STATE
>        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
>        virtio: Add virtio_queue_is_host_notifier_enabled
>        vhost: Make vhost_virtqueue_{start,stop} public
>        vhost: Add x-vhost-enable-shadow-vq qmp
>        vhost: Add VhostShadowVirtqueue
>        vdpa: Register vdpa devices in a list
>        vhost: Route guest->host notification through shadow virtqueue
>        Add vhost_svq_get_svq_call_notifier
>        Add vhost_svq_set_guest_call_notifier
>        vdpa: Save call_fd in vhost-vdpa
>        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
>        vhost: Route host->guest notification through shadow virtqueue
>        virtio: Add vhost_shadow_vq_get_vring_addr
>        vdpa: Save host and guest features
>        vhost: Add vhost_svq_valid_device_features to shadow vq
>        vhost: Shadow virtqueue buffers forwarding
>        vhost: Add VhostIOVATree
>        vhost: Use a tree to store memory mappings
>        vdpa: Add custom IOTLB translations to SVQ
>
> Eugenio Pérez (31):
>    vdpa: Reorder virtio/vhost-vdpa.c functions
>    vhost: Add VhostShadowVirtqueue
>    vdpa: Add vhost_svq_get_dev_kick_notifier
>    vdpa: Add vhost_svq_set_svq_kick_fd
>    vhost: Add Shadow VirtQueue kick forwarding capabilities
>    vhost: Route guest->host notification through shadow virtqueue
>    vhost: dd vhost_svq_get_svq_call_notifier
>    vhost: Add vhost_svq_set_guest_call_notifier
>    vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
>    vhost: Route host->guest notification through shadow virtqueue
>    vhost: Add vhost_svq_valid_device_features to shadow vq
>    vhost: Add vhost_svq_valid_guest_features to shadow vq
>    vhost: Add vhost_svq_ack_guest_features to shadow vq
>    virtio: Add vhost_shadow_vq_get_vring_addr
>    vdpa: Add vhost_svq_get_num
>    vhost: pass queue index to vhost_vq_get_addr
>    vdpa: adapt vhost_ops callbacks to svq
>    vhost: Shadow virtqueue buffers forwarding
>    utils: Add internal DMAMap to iova-tree
>    util: Store DMA entries in a list
>    util: Add iova_tree_alloc
>    vhost: Add VhostIOVATree
>    vdpa: Add custom IOTLB translations to SVQ
>    vhost: Add vhost_svq_get_last_used_idx
>    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
>    vdpa: Clear VHOST_VRING_F_LOG at vhost_vdpa_set_vring_addr in SVQ
>    vdpa: Never set log_base addr if SVQ is enabled
>    vdpa: Expose VHOST_F_LOG_ALL on SVQ
>    vdpa: Make ncs autofree
>    vdpa: Move vhost_vdpa_get_iova_range to net/vhost-vdpa.c
>    vdpa: Add x-svq to NetdevVhostVDPAOptions
>
>   qapi/net.json                      |   5 +-
>   hw/virtio/vhost-iova-tree.h        |  27 +
>   hw/virtio/vhost-shadow-virtqueue.h |  46 ++
>   include/hw/virtio/vhost-vdpa.h     |   7 +
>   include/qemu/iova-tree.h           |  17 +
>   hw/virtio/vhost-iova-tree.c        | 157 ++++++
>   hw/virtio/vhost-shadow-virtqueue.c | 761 +++++++++++++++++++++++++++++
>   hw/virtio/vhost-vdpa.c             | 740 ++++++++++++++++++++++++----
>   hw/virtio/vhost.c                  |   6 +-
>   net/vhost-vdpa.c                   |  58 ++-
>   util/iova-tree.c                   | 161 +++++-
>   hw/virtio/meson.build              |   2 +-
>   12 files changed, 1852 insertions(+), 135 deletions(-)
>   create mode 100644 hw/virtio/vhost-iova-tree.h
>   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
>   create mode 100644 hw/virtio/vhost-iova-tree.c
>   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
>


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 03/31] vdpa: Add vhost_svq_get_dev_kick_notifier
  2022-01-21 20:27 ` [PATCH 03/31] vdpa: Add vhost_svq_get_dev_kick_notifier Eugenio Pérez
@ 2022-01-28  6:03     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-28  6:03 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> It is needed so vhost-vdpa knows the device's kick event fd.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  4 ++++
>   hw/virtio/vhost-shadow-virtqueue.c | 10 +++++++++-
>   2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 61ea112002..400effd9f2 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -11,9 +11,13 @@
>   #define VHOST_SHADOW_VIRTQUEUE_H
>   
>   #include "hw/virtio/vhost.h"
> +#include "qemu/event_notifier.h"


Let's move this part to patch 2.

Thanks


>   
>   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>   
> +const EventNotifier *vhost_svq_get_dev_kick_notifier(
> +                                              const VhostShadowVirtqueue *svq);
> +
>   VhostShadowVirtqueue *vhost_svq_new(void);
>   
>   void vhost_svq_free(VhostShadowVirtqueue *vq);
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 5ee7b401cb..bd87110073 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -11,7 +11,6 @@
>   #include "hw/virtio/vhost-shadow-virtqueue.h"
>   
>   #include "qemu/error-report.h"
> -#include "qemu/event_notifier.h"
>   
>   /* Shadow virtqueue to relay notifications */
>   typedef struct VhostShadowVirtqueue {
> @@ -21,6 +20,15 @@ typedef struct VhostShadowVirtqueue {
>       EventNotifier hdev_call;
>   } VhostShadowVirtqueue;
>   
> +/**
> + * The notifier that SVQ will use to notify the device.
> + */
> +const EventNotifier *vhost_svq_get_dev_kick_notifier(
> +                                               const VhostShadowVirtqueue *svq)
> +{
> +    return &svq->hdev_kick;
> +}
> +
>   /**
>    * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
>    * methods and file descriptors.


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 04/31] vdpa: Add vhost_svq_set_svq_kick_fd
  2022-01-21 20:27 ` [PATCH 04/31] vdpa: Add vhost_svq_set_svq_kick_fd Eugenio Pérez
@ 2022-01-28  6:29     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-28  6:29 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> This function allows the vhost-vdpa backend to override kick_fd.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  1 +
>   hw/virtio/vhost-shadow-virtqueue.c | 45 ++++++++++++++++++++++++++++++
>   2 files changed, 46 insertions(+)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 400effd9f2..a56ecfc09d 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -15,6 +15,7 @@
>   
>   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>   
> +void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>   const EventNotifier *vhost_svq_get_dev_kick_notifier(
>                                                 const VhostShadowVirtqueue *svq);
>   
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index bd87110073..21534bc94d 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -11,6 +11,7 @@
>   #include "hw/virtio/vhost-shadow-virtqueue.h"
>   
>   #include "qemu/error-report.h"
> +#include "qemu/main-loop.h"
>   
>   /* Shadow virtqueue to relay notifications */
>   typedef struct VhostShadowVirtqueue {
> @@ -18,8 +19,20 @@ typedef struct VhostShadowVirtqueue {
>       EventNotifier hdev_kick;
>       /* Shadow call notifier, sent to vhost */
>       EventNotifier hdev_call;
> +
> +    /*
> +     * Borrowed virtqueue's guest to host notifier.
> +     * To borrow it in this event notifier allows to register on the event
> +     * loop and access the associated shadow virtqueue easily. If we use the
> +     * VirtQueue, we don't have an easy way to retrieve it.
> +     *
> +     * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
> +     */
> +    EventNotifier svq_kick;
>   } VhostShadowVirtqueue;
>   
> +#define INVALID_SVQ_KICK_FD -1
> +
>   /**
>    * The notifier that SVQ will use to notify the device.
>    */
> @@ -29,6 +42,35 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
>       return &svq->hdev_kick;
>   }
>   
> +/**
> + * Set a new file descriptor for the guest to kick SVQ and notify for avail
> + *
> + * @svq          The svq
> + * @svq_kick_fd  The new svq kick fd
> + */
> +void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> +{
> +    EventNotifier tmp;
> +    bool check_old = INVALID_SVQ_KICK_FD !=
> +                     event_notifier_get_fd(&svq->svq_kick);
> +
> +    if (check_old) {
> +        event_notifier_set_handler(&svq->svq_kick, NULL);
> +        event_notifier_init_fd(&tmp, event_notifier_get_fd(&svq->svq_kick));
> +    }


It looks to me like we don't do anything similar in vhost-net. Any reason
for caring about the old svq_kick?


> +
> +    /*
> +     * event_notifier_set_handler already checks for guest notifications
> +     * if they arrived at the new file descriptor during the switch, so
> +     * there is no need to check for them explicitly.
> +     */
> +    event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> +
> +    if (!check_old || event_notifier_test_and_clear(&tmp)) {
> +        event_notifier_set(&svq->hdev_kick);


Any reason we need to kick the device directly here?

Thanks


> +    }
> +}
> +
>   /**
>    * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
>    * methods and file descriptors.
> @@ -52,6 +94,9 @@ VhostShadowVirtqueue *vhost_svq_new(void)
>           goto err_init_hdev_call;
>       }
>   
> +    /* Placeholder descriptor, it should be deleted at set_kick_fd */
> +    event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
> +
>       return g_steal_pointer(&svq);
>   
>   err_init_hdev_call:


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 05/31] vhost: Add Shadow VirtQueue kick forwarding capabilities
  2022-01-21 20:27 ` [PATCH 05/31] vhost: Add Shadow VirtQueue kick forwarding capabilities Eugenio Pérez
@ 2022-01-28  6:32     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-28  6:32 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> In this mode no buffer forwarding will be performed: QEMU will just
> forward the guest's kicks to the device.
>
> Also, host notifiers must be disabled at SVQ start, and they will not
> start if SVQ has been enabled when the device is stopped. This will be
> addressed in the next patches.


We need to disable host_notifier_mr as well; otherwise the guest may touch
the hardware doorbell directly without going through the eventfd.
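
[Editor's note: a sketch of the shape of guard being requested here:
refuse to map the doorbell page while SVQ is enabled, so guest kicks
keep trapping into the eventfd path. Patch 06 later in this thread adds
a check like this at the vhost_vdpa_host_notifiers_init() level; the
per-queue version below is illustrative only:

    static int vhost_vdpa_host_notifier_init(struct vhost_dev *dev,
                                             int queue_index)
    {
        struct vhost_vdpa *v = dev->opaque;

        if (v->shadow_vqs_enabled) {
            /* With SVQ, the guest must kick the SVQ eventfd, never the
             * hardware doorbell, so do not expose the device page. */
            return -1;
        }
        /* ... regular mmap + host notifier MR setup ... */
        return 0;
    }
]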


>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
>   hw/virtio/vhost-shadow-virtqueue.c | 27 ++++++++++++++++++++++++++-
>   2 files changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index a56ecfc09d..4c583a9171 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -19,6 +19,8 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>   const EventNotifier *vhost_svq_get_dev_kick_notifier(
>                                                 const VhostShadowVirtqueue *svq);
>   
> +void vhost_svq_stop(VhostShadowVirtqueue *svq);
> +
>   VhostShadowVirtqueue *vhost_svq_new(void);
>   
>   void vhost_svq_free(VhostShadowVirtqueue *vq);
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 21534bc94d..8991f0b3c3 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -42,11 +42,26 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
>       return &svq->hdev_kick;
>   }
>   
> +/* Forward guest notifications */
> +static void vhost_handle_guest_kick(EventNotifier *n)
> +{
> +    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> +                                             svq_kick);
> +
> +    if (unlikely(!event_notifier_test_and_clear(n))) {
> +        return;
> +    }
> +
> +    event_notifier_set(&svq->hdev_kick);
> +}
> +
>   /**
>    * Set a new file descriptor for the guest to kick SVQ and notify for avail
>    *
>    * @svq          The svq
> - * @svq_kick_fd  The new svq kick fd
> + * @svq_kick_fd  The svq kick fd
> + *
> + * Note that SVQ will never close the old file descriptor.
>    */
>   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>   {
> @@ -65,12 +80,22 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>        * need to explicitly check for them.
>        */
>       event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> +    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
>   
>       if (!check_old || event_notifier_test_and_clear(&tmp)) {
>           event_notifier_set(&svq->hdev_kick);
>       }
>   }
>   
> +/**
> + * Stop shadow virtqueue operation.
> + * @svq Shadow Virtqueue
> + */
> +void vhost_svq_stop(VhostShadowVirtqueue *svq)
> +{
> +    event_notifier_set_handler(&svq->svq_kick, NULL);
> +}


This function is not used in the patch.

Thanks


> +
>   /**
>    * Creates the vhost shadow virtqueue, and instructs the vhost device to use
>    * the shadow methods and file descriptors.


* Re: [PATCH 06/31] vhost: Route guest->host notification through shadow virtqueue
  2022-01-21 20:27 ` [PATCH 06/31] vhost: Route guest->host notification through shadow virtqueue Eugenio Pérez
@ 2022-01-28  6:56     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-28  6:56 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> At this moment no buffer forwarding will be performed in SVQ mode: QEMU
> just forwards the guest's kicks to the device. This commit also sets up
> SVQs in the vhost device.
>
> Host memory notifier regions are left out for simplicity, and they will
> not be addressed in this series.


I wonder if it's better to squash this into patch 5, since it gives us the
full guest->host forwarding path.


>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   include/hw/virtio/vhost-vdpa.h |   4 ++
>   hw/virtio/vhost-vdpa.c         | 122 ++++++++++++++++++++++++++++++++-
>   2 files changed, 124 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 3ce79a646d..009a9f3b6b 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -12,6 +12,8 @@
>   #ifndef HW_VIRTIO_VHOST_VDPA_H
>   #define HW_VIRTIO_VHOST_VDPA_H
>   
> +#include <gmodule.h>
> +
>   #include "hw/virtio/virtio.h"
>   #include "standard-headers/linux/vhost_types.h"
>   
> @@ -27,6 +29,8 @@ typedef struct vhost_vdpa {
>       bool iotlb_batch_begin_sent;
>       MemoryListener listener;
>       struct vhost_vdpa_iova_range iova_range;
> +    bool shadow_vqs_enabled;
> +    GPtrArray *shadow_vqs;
>       struct vhost_dev *dev;
>       VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
>   } VhostVDPA;
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 6c10a7f05f..18de14f0fb 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -17,12 +17,14 @@
>   #include "hw/virtio/vhost.h"
>   #include "hw/virtio/vhost-backend.h"
>   #include "hw/virtio/virtio-net.h"
> +#include "hw/virtio/vhost-shadow-virtqueue.h"
>   #include "hw/virtio/vhost-vdpa.h"
>   #include "exec/address-spaces.h"
>   #include "qemu/main-loop.h"
>   #include "cpu.h"
>   #include "trace.h"
>   #include "qemu-common.h"
> +#include "qapi/error.h"
>   
>   /*
>    * Return one past the end of the end of section. Be careful with uint64_t
> @@ -409,8 +411,14 @@ err:
>   
>   static void vhost_vdpa_host_notifiers_init(struct vhost_dev *dev)
>   {
> +    struct vhost_vdpa *v = dev->opaque;
>       int i;
>   
> +    if (v->shadow_vqs_enabled) {
> +        /* SVQ is not compatible with host notifiers mr */


I guess there should be a TODO or FIXME here.


> +        return;
> +    }
> +
>       for (i = dev->vq_index; i < dev->vq_index + dev->nvqs; i++) {
>           if (vhost_vdpa_host_notifier_init(dev, i)) {
>               goto err;
> @@ -424,6 +432,17 @@ err:
>       return;
>   }
>   
> +static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    size_t idx;
> +
> +    for (idx = 0; idx < v->shadow_vqs->len; ++idx) {
> +        vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, idx));
> +    }
> +    g_ptr_array_free(v->shadow_vqs, true);
> +}
> +
>   static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>   {
>       struct vhost_vdpa *v;
> @@ -432,6 +451,7 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>       trace_vhost_vdpa_cleanup(dev, v);
>       vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>       memory_listener_unregister(&v->listener);
> +    vhost_vdpa_svq_cleanup(dev);
>   
>       dev->opaque = NULL;
>       ram_block_discard_disable(false);
> @@ -507,9 +527,15 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
>   
>   static int vhost_vdpa_reset_device(struct vhost_dev *dev)
>   {
> +    struct vhost_vdpa *v = dev->opaque;
>       int ret;
>       uint8_t status = 0;
>   
> +    for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
> +        vhost_svq_stop(svq);
> +    }
> +
>       ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>       trace_vhost_vdpa_reset_device(dev, status);
>       return ret;
> @@ -639,13 +665,28 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>       return ret;
>   }
>   
> -static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
> -                                       struct vhost_vring_file *file)
> +static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
> +                                         struct vhost_vring_file *file)
>   {
>       trace_vhost_vdpa_set_vring_kick(dev, file->index, file->fd);
>       return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
>   }
>   
> +static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
> +                                       struct vhost_vring_file *file)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
> +
> +    if (v->shadow_vqs_enabled) {
> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
> +        vhost_svq_set_svq_kick_fd(svq, file->fd);
> +        return 0;
> +    } else {
> +        return vhost_vdpa_set_vring_dev_kick(dev, file);
> +    }
> +}
> +
>   static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>                                          struct vhost_vring_file *file)
>   {
> @@ -653,6 +694,33 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>       return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>   }
>   
> +/**
> + * Set shadow virtqueue descriptors to the device
> + *
> + * @dev   The vhost device model
> + * @svq   The shadow virtqueue
> + * @idx   The index of the virtqueue in the vhost device
> + */
> +static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> +                                VhostShadowVirtqueue *svq,
> +                                unsigned idx)
> +{
> +    struct vhost_vring_file file = {
> +        .index = dev->vq_index + idx,
> +    };
> +    const EventNotifier *event_notifier;
> +    int r;
> +
> +    event_notifier = vhost_svq_get_dev_kick_notifier(svq);


A question: any reason for making VhostShadowVirtqueue private? If we
export it in the .h we don't need helpers like
vhost_svq_get_dev_kick_notifier() to access its members.

Note that vhost_dev is a public structure.


> +    file.fd = event_notifier_get_fd(event_notifier);
> +    r = vhost_vdpa_set_vring_dev_kick(dev, &file);
> +    if (unlikely(r != 0)) {
> +        error_report("Can't set device kick fd (%d)", -r);
> +    }


I wonder whether or not we can generalize the logic here and in
vhost_vdpa_set_vring_kick(). There's nothing vDPA-specific except the
vhost_ops->set_vring_kick() call.
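
[Editor's note: a rough sketch of the generalization suggested here,
assuming the SVQ array and enable flag were moved to a backend-agnostic
place (the shadow_* fields on vhost_dev are hypothetical; only
vhost_ops->vhost_set_vring_kick is existing API):

    static int vhost_set_vring_kick_generic(struct vhost_dev *dev,
                                            struct vhost_vring_file *file)
    {
        if (dev->shadow_vqs_enabled) {             /* hypothetical field */
            int idx = file->index - dev->vq_index;
            VhostShadowVirtqueue *svq =
                g_ptr_array_index(dev->shadow_vqs, idx);  /* hypothetical */
            vhost_svq_set_svq_kick_fd(svq, file->fd);
            return 0;
        }
        return dev->vhost_ops->vhost_set_vring_kick(dev, file);
    }
]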


> +
> +    return r == 0;
> +}
> +
>   static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>   {
>       struct vhost_vdpa *v = dev->opaque;
> @@ -660,6 +728,13 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>   
>       if (started) {
>           vhost_vdpa_host_notifiers_init(dev);
> +        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> +            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
> +            bool ok = vhost_vdpa_svq_setup(dev, svq, i);
> +            if (unlikely(!ok)) {
> +                return -1;
> +            }
> +        }
>           vhost_vdpa_set_vring_ready(dev);
>       } else {
>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> @@ -737,6 +812,41 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
>       return true;
>   }
>   
> +/**
> + * Adaptor function to free shadow virtqueue through gpointer
> + *
> + * @svq   The Shadow Virtqueue
> + */
> +static void vhost_psvq_free(gpointer svq)
> +{
> +    vhost_svq_free(svq);
> +}


Any reason for such indirection? Can we simply use vhost_svq_free()?

Thanks
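
[Editor's note: the indirection is presumably only there to match
GDestroyNotify's gpointer-taking signature. A sketch of the direct
alternative, assuming a function-pointer cast is acceptable (glib code
commonly relies on such casts):

    g_autoptr(GPtrArray) shadow_vqs =
        g_ptr_array_new_full(n_svqs, (GDestroyNotify)vhost_svq_free);
]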


> +
> +static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
> +                               Error **errp)
> +{
> +    size_t n_svqs = v->shadow_vqs_enabled ? hdev->nvqs : 0;
> +    g_autoptr(GPtrArray) shadow_vqs = g_ptr_array_new_full(n_svqs,
> +                                                           vhost_psvq_free);
> +    if (!v->shadow_vqs_enabled) {
> +        goto out;
> +    }
> +
> +    for (unsigned n = 0; n < hdev->nvqs; ++n) {
> +        VhostShadowVirtqueue *svq = vhost_svq_new();
> +
> +        if (unlikely(!svq)) {
> +            error_setg(errp, "Cannot create svq %u", n);
> +            return -1;
> +        }
> +        g_ptr_array_add(v->shadow_vqs, svq);
> +    }
> +
> +out:
> +    v->shadow_vqs = g_steal_pointer(&shadow_vqs);
> +    return 0;
> +}
> +
>   static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>   {
>       struct vhost_vdpa *v;
> @@ -759,6 +869,10 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>       dev->opaque =  opaque ;
>       v->listener = vhost_vdpa_memory_listener;
>       v->msg_type = VHOST_IOTLB_MSG_V2;
> +    ret = vhost_vdpa_init_svq(dev, v, errp);
> +    if (ret) {
> +        goto err;
> +    }
>   
>       vhost_vdpa_get_iova_range(v);
>   
> @@ -770,6 +884,10 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>                                  VIRTIO_CONFIG_S_DRIVER);
>   
>       return 0;
> +
> +err:
> +    ram_block_discard_disable(false);
> +    return ret;
>   }
>   
>   const VhostOps vdpa_ops = {


* Re: [PATCH 21/31] util: Add iova_tree_alloc
  2022-01-28  5:55                   ` Jason Wang
  (?)
@ 2022-01-28  7:48                   ` Eugenio Perez Martin
  -1 siblings, 0 replies; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-28  7:48 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Fri, Jan 28, 2022 at 6:56 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2022/1/28 上午11:57, Peter Xu 写道:
> > On Thu, Jan 27, 2022 at 10:24:27AM +0100, Eugenio Perez Martin wrote:
> >> On Thu, Jan 27, 2022 at 9:06 AM Peter Xu <peterx@redhat.com> wrote:
> >>> On Tue, Jan 25, 2022 at 10:40:01AM +0100, Eugenio Perez Martin wrote:
> >>>> So I think that the first step to remove complexity from the old one
> >>>> is to remove iova_begin and iova_end.
> >>>>
> >>>> As Jason points out, removing iova_end is easier. It has the drawback
> >>>> of having to traverse all the list beyond iova_end, but a well formed
> >>>> iova tree should contain none. If the guest can manipulate it, it's
> >>>> only hurting itself adding nodes to it.
> >>>>
> >>>> It's possible to extract the check for hole_right (or this in Jason's
> >>>> proposal) as a special case too.
> >>>>
> >>>> But removing the iova_begin parameter is more complicated. We cannot
> >>>> know if it's a valid hole without knowing iova_begin, and we cannot
> >>>> resume traversing. Could we assume iova_begin will always be 0? I
> >>>> think not, the vdpa device can return anything through syscall.
> >>> Frankly I don't know what's the syscall you're talking about,
> >> I meant VHOST_VDPA_GET_IOVA_RANGE, which allows qemu to know the valid
> >> range of iova addresses. We get a pair of uint64_t values from it that
> >> indicate the minimum and maximum iova address the device (or iommu)
> >> supports.
> >>
> >> We must allocate iova ranges within that address range, which
> >> complicates this algorithm a little bit. Since the SVQ iova addresses
> >> are not GPA, qemu needs extra code to be able to allocate and free
> >> them, creating a new custom iova address space.
> >>
> >> Please let me know if you want more details or if you prefer me to
> >> give more context in the patch message.
> > That's good enough, thanks.
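
[Editor's note: for reference, a minimal userspace sketch of the
VHOST_VDPA_GET_IOVA_RANGE query described above (error handling elided;
struct vhost_vdpa_iova_range comes from the vhost UAPI headers):

    #include <sys/ioctl.h>
    #include <linux/vhost.h>

    struct vhost_vdpa_iova_range range;  /* { __u64 first; __u64 last; } */

    if (ioctl(vdpa_fd, VHOST_VDPA_GET_IOVA_RANGE, &range) == 0) {
        /* SVQ vrings and guest maps must all be allocated inside
         * [range.first, range.last]; hence the iova_begin/iova_end
         * parameters discussed above. */
    }
]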
> >
> >>> I mean this one:
> >>>
> >>> https://lore.kernel.org/qemu-devel/20211029183525.1776416-24-eperezma@redhat.com/
> >>>
> >>> Though this time I have some comments on the details.
> >>>
> >>> Personally I like that one (probably with some amendment upon the old version)
> >>> more than the current list-based approach.  But I'd like to know your thoughts
> >>> too (including Jason).  I'll further comment in that thread soon.
> >>>
> >> Sure, I'm fine with whatever solution we choose, but I'm just running
> >> out of ideas to simplify it. Reading your suggestions on old RFC now.
> >>
> >> Overall I feel the list-based one is both more convenient and easier to
> >> delete when qemu raises the minimal glib version, but it adds a lot
> >> more code.
> >>
> >> It could add less code with these less elegant changes:
> >> * If we just put the list entry in the DMAMap itself, although it
> >> exposes unneeded implementation details.
> >> * We force the iova tree to be either allocation-based or
> >> insertion-based, but not both. In other words, you can only either use
> >> iova_tree_alloc or iova_tree_insert on the same tree.
>
>
> This seems an odd API, I must say :(
>
>
> Yeah, I just noticed yesterday that there's no easy choice on it.  Let's go
> > with either way; it shouldn't block the rest of the code.  It'll be good if
> > Jason or Michael share their preferences too.
>
>
> (Havne't gone through the code deeply)
>
> I wonder about just copy-pasting gtree_node_first|last()? A quick
> Google search told me it's not complicated.
>

Neither the GTree nor the GTreeNode definition is part of glib's ABI.
I think the upstream code has not changed their layout for a long time,
but it is still allowed to do so in the future.

Having said that, they use a list internally to traverse the nodes,
with node->left and node->right instead of TAILQ entries. These
patches duplicate that intrusive list in DMAMap entries, and make them
private so other parts of qemu are not affected.

Thanks!
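
[Editor's note: a sketch of the private intrusive list just described.
DMAMap's existing fields are kept as-is; the list entry is the addition
the patches make, private so nothing outside the iova tree depends on
glib's GTree/GTreeNode layout (field placement illustrative):

    typedef struct DMAMap {
        hwaddr iova;
        hwaddr translated_addr;
        hwaddr size;                /* Inclusive */
        IOMMUAccessFlags perm;
        /* Private: ordered traversal without touching GTreeNode */
        QTAILQ_ENTRY(DMAMap) entry;
    } DMAMap;
]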

> Thanks
>
>
> >
> >> I have a few tests to check the algorithms, but they are not in the
> >> qemu test format. I will post them so we all can understand better
> >> what is expected from this.
> > Sure.  Thanks.
> >
>




* Re: [PATCH 01/31] vdpa: Reorder virtio/vhost-vdpa.c functions
  2022-01-28  5:59     ` Jason Wang
  (?)
@ 2022-01-28  7:57     ` Eugenio Perez Martin
  2022-02-21  7:31         ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-28  7:57 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Fri, Jan 28, 2022 at 6:59 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > vhost_vdpa_set_features and vhost_vdpa_init need to use
> > vhost_vdpa_get_features in svq mode.
> >
> > vhost_vdpa_dev_start needs to use almost all _set_ functions:
> > vhost_vdpa_set_vring_dev_kick, vhost_vdpa_set_vring_dev_call,
> > vhost_vdpa_set_dev_vring_base and vhost_vdpa_set_dev_vring_num.
> >
> > No functional change intended.
>
>
> Is it related (a must) to the SVQ code?
>

Yes, SVQ needs to access the device variants to configure the device,
while exposing the SVQ ones.

For example for set_features, SVQ needs to set device features in the
start code, but expose SVQ ones to the guest.

Another possibility is to forward-declare them but I feel it pollutes
the code more, doesn't it? Is there any reason to avoid the reordering
beyond reducing the number of changes/patches?

Thanks!
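
[Editor's note: the forward-declaration alternative mentioned above
would look roughly like this near the top of hw/virtio/vhost-vdpa.c,
instead of moving the function bodies (the _dev_ variants are
introduced by later patches in the series):

    static int vhost_vdpa_get_features(struct vhost_dev *dev,
                                       uint64_t *features);
    static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
                                             struct vhost_vring_file *file);
    static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
                                             struct vhost_vring_file *file);
]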


> Thanks
>
>
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-vdpa.c | 164 ++++++++++++++++++++---------------------
> >   1 file changed, 82 insertions(+), 82 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 04ea43704f..6c10a7f05f 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -342,41 +342,6 @@ static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
> >       return v->index != 0;
> >   }
> >
> > -static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> > -{
> > -    struct vhost_vdpa *v;
> > -    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> > -    trace_vhost_vdpa_init(dev, opaque);
> > -    int ret;
> > -
> > -    /*
> > -     * Similar to VFIO, we end up pinning all guest memory and have to
> > -     * disable discarding of RAM.
> > -     */
> > -    ret = ram_block_discard_disable(true);
> > -    if (ret) {
> > -        error_report("Cannot set discarding of RAM broken");
> > -        return ret;
> > -    }
> > -
> > -    v = opaque;
> > -    v->dev = dev;
> > -    dev->opaque =  opaque ;
> > -    v->listener = vhost_vdpa_memory_listener;
> > -    v->msg_type = VHOST_IOTLB_MSG_V2;
> > -
> > -    vhost_vdpa_get_iova_range(v);
> > -
> > -    if (vhost_vdpa_one_time_request(dev)) {
> > -        return 0;
> > -    }
> > -
> > -    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> > -                               VIRTIO_CONFIG_S_DRIVER);
> > -
> > -    return 0;
> > -}
> > -
> >   static void vhost_vdpa_host_notifier_uninit(struct vhost_dev *dev,
> >                                               int queue_index)
> >   {
> > @@ -506,24 +471,6 @@ static int vhost_vdpa_set_mem_table(struct vhost_dev *dev,
> >       return 0;
> >   }
> >
> > -static int vhost_vdpa_set_features(struct vhost_dev *dev,
> > -                                   uint64_t features)
> > -{
> > -    int ret;
> > -
> > -    if (vhost_vdpa_one_time_request(dev)) {
> > -        return 0;
> > -    }
> > -
> > -    trace_vhost_vdpa_set_features(dev, features);
> > -    ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
> > -    if (ret) {
> > -        return ret;
> > -    }
> > -
> > -    return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> > -}
> > -
> >   static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
> >   {
> >       uint64_t features;
> > @@ -646,35 +593,6 @@ static int vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
> >       return ret;
> >    }
> >
> > -static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > -{
> > -    struct vhost_vdpa *v = dev->opaque;
> > -    trace_vhost_vdpa_dev_start(dev, started);
> > -
> > -    if (started) {
> > -        vhost_vdpa_host_notifiers_init(dev);
> > -        vhost_vdpa_set_vring_ready(dev);
> > -    } else {
> > -        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> > -    }
> > -
> > -    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> > -        return 0;
> > -    }
> > -
> > -    if (started) {
> > -        memory_listener_register(&v->listener, &address_space_memory);
> > -        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > -    } else {
> > -        vhost_vdpa_reset_device(dev);
> > -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> > -                                   VIRTIO_CONFIG_S_DRIVER);
> > -        memory_listener_unregister(&v->listener);
> > -
> > -        return 0;
> > -    }
> > -}
> > -
> >   static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> >                                        struct vhost_log *log)
> >   {
> > @@ -735,6 +653,35 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> >       return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
> >   }
> >
> > +static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > +{
> > +    struct vhost_vdpa *v = dev->opaque;
> > +    trace_vhost_vdpa_dev_start(dev, started);
> > +
> > +    if (started) {
> > +        vhost_vdpa_host_notifiers_init(dev);
> > +        vhost_vdpa_set_vring_ready(dev);
> > +    } else {
> > +        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> > +    }
> > +
> > +    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> > +        return 0;
> > +    }
> > +
> > +    if (started) {
> > +        memory_listener_register(&v->listener, &address_space_memory);
> > +        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > +    } else {
> > +        vhost_vdpa_reset_device(dev);
> > +        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> > +                                   VIRTIO_CONFIG_S_DRIVER);
> > +        memory_listener_unregister(&v->listener);
> > +
> > +        return 0;
> > +    }
> > +}
> > +
> >   static int vhost_vdpa_get_features(struct vhost_dev *dev,
> >                                        uint64_t *features)
> >   {
> > @@ -745,6 +692,24 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev,
> >       return ret;
> >   }
> >
> > +static int vhost_vdpa_set_features(struct vhost_dev *dev,
> > +                                   uint64_t features)
> > +{
> > +    int ret;
> > +
> > +    if (vhost_vdpa_one_time_request(dev)) {
> > +        return 0;
> > +    }
> > +
> > +    trace_vhost_vdpa_set_features(dev, features);
> > +    ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
> > +    if (ret) {
> > +        return ret;
> > +    }
> > +
> > +    return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> > +}
> > +
> >   static int vhost_vdpa_set_owner(struct vhost_dev *dev)
> >   {
> >       if (vhost_vdpa_one_time_request(dev)) {
> > @@ -772,6 +737,41 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
> >       return true;
> >   }
> >
> > +static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> > +{
> > +    struct vhost_vdpa *v;
> > +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> > +    trace_vhost_vdpa_init(dev, opaque);
> > +    int ret;
> > +
> > +    /*
> > +     * Similar to VFIO, we end up pinning all guest memory and have to
> > +     * disable discarding of RAM.
> > +     */
> > +    ret = ram_block_discard_disable(true);
> > +    if (ret) {
> > +        error_report("Cannot set discarding of RAM broken");
> > +        return ret;
> > +    }
> > +
> > +    v = opaque;
> > +    v->dev = dev;
> > +    dev->opaque =  opaque ;
> > +    v->listener = vhost_vdpa_memory_listener;
> > +    v->msg_type = VHOST_IOTLB_MSG_V2;
> > +
> > +    vhost_vdpa_get_iova_range(v);
> > +
> > +    if (vhost_vdpa_one_time_request(dev)) {
> > +        return 0;
> > +    }
> > +
> > +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> > +                               VIRTIO_CONFIG_S_DRIVER);
> > +
> > +    return 0;
> > +}
> > +
> >   const VhostOps vdpa_ops = {
> >           .backend_type = VHOST_BACKEND_TYPE_VDPA,
> >           .vhost_backend_init = vhost_vdpa_init,
>




* Re: [PATCH 02/31] vhost: Add VhostShadowVirtqueue
  2022-01-28  6:00     ` Jason Wang
  (?)
@ 2022-01-28  8:10     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-28  8:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Fri, Jan 28, 2022 at 7:00 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > Vhost shadow virtqueue (SVQ) is an intermediate jump for virtqueue
> > notifications and buffers, allowing qemu to track them. While qemu is
> > forwarding the buffers and virtqueue changes, it is able to log the
> > memory that is being dirtied, the same way regular qemu emulated
> > VirtIO devices do.
> >
> > This commit only exposes basic SVQ allocation and free. Later patches
> > in the series add functionality like notification and buffer forwarding.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h | 21 ++++++++++
> >   hw/virtio/vhost-shadow-virtqueue.c | 64 ++++++++++++++++++++++++++++++
> >   hw/virtio/meson.build              |  2 +-
> >   3 files changed, 86 insertions(+), 1 deletion(-)
> >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > new file mode 100644
> > index 0000000000..61ea112002
> > --- /dev/null
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -0,0 +1,21 @@
> > +/*
> > + * vhost shadow virtqueue
> > + *
> > + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> > + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> > + *
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + */
> > +
> > +#ifndef VHOST_SHADOW_VIRTQUEUE_H
> > +#define VHOST_SHADOW_VIRTQUEUE_H
> > +
> > +#include "hw/virtio/vhost.h"
> > +
> > +typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > +
> > +VhostShadowVirtqueue *vhost_svq_new(void);
> > +
> > +void vhost_svq_free(VhostShadowVirtqueue *vq);
> > +
> > +#endif
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > new file mode 100644
> > index 0000000000..5ee7b401cb
> > --- /dev/null
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -0,0 +1,64 @@
> > +/*
> > + * vhost shadow virtqueue
> > + *
> > + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> > + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> > + *
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "hw/virtio/vhost-shadow-virtqueue.h"
> > +
> > +#include "qemu/error-report.h"
> > +#include "qemu/event_notifier.h"
> > +
> > +/* Shadow virtqueue to relay notifications */
> > +typedef struct VhostShadowVirtqueue {
> > +    /* Shadow kick notifier, sent to vhost */
> > +    EventNotifier hdev_kick;
> > +    /* Shadow call notifier, sent to vhost */
> > +    EventNotifier hdev_call;
> > +} VhostShadowVirtqueue;
> > +
> > +/**
> > + * Creates the vhost shadow virtqueue, and instructs the vhost device to use
> > + * the shadow methods and file descriptors.
> > + */
> > +VhostShadowVirtqueue *vhost_svq_new(void)
> > +{
> > +    g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> > +    int r;
> > +
> > +    r = event_notifier_init(&svq->hdev_kick, 0);
> > +    if (r != 0) {
> > +        error_report("Couldn't create kick event notifier: %s",
> > +                     strerror(errno));
> > +        goto err_init_hdev_kick;
> > +    }
> > +
> > +    r = event_notifier_init(&svq->hdev_call, 0);
> > +    if (r != 0) {
> > +        error_report("Couldn't create call event notifier: %s",
> > +                     strerror(errno));
> > +        goto err_init_hdev_call;
> > +    }
> > +
> > +    return g_steal_pointer(&svq);
> > +
> > +err_init_hdev_call:
> > +    event_notifier_cleanup(&svq->hdev_kick);
> > +
> > +err_init_hdev_kick:
> > +    return NULL;
> > +}
> > +
> > +/**
> > + * Free the resources of the shadow virtqueue.
> > + */
> > +void vhost_svq_free(VhostShadowVirtqueue *vq)
> > +{
> > +    event_notifier_cleanup(&vq->hdev_kick);
> > +    event_notifier_cleanup(&vq->hdev_call);
> > +    g_free(vq);
> > +}
> > diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
> > index 521f7d64a8..2dc87613bc 100644
> > --- a/hw/virtio/meson.build
> > +++ b/hw/virtio/meson.build
> > @@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
> >
> >   virtio_ss = ss.source_set()
> >   virtio_ss.add(files('virtio.c'))
> > -virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c'))
> > +virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
>
>
> I wonder if we need a dedicated config option for shadow virtqueue.
>

I'd say that the changes are isolated enough and require no external
library dependencies, so the gain is little. But it can be done with an
explicit enable/disable for sure.

Thanks!
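
[Editor's note: for readers skimming the series, the intended usage of
this minimal API, as patch 06 later consumes it (error path
illustrative):

    VhostShadowVirtqueue *svq = vhost_svq_new();
    if (unlikely(!svq)) {
        /* Event notifier creation failed; error already reported. */
        return -1;
    }
    /* ... forward kicks and calls through svq while SVQ is enabled ... */
    vhost_svq_free(svq);
]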

> Thanks
>
>
> >   virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
> >   virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
> >   virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
>




* Re: [PATCH 07/31] vhost: dd vhost_svq_get_svq_call_notifier
  2022-01-21 20:27 ` [PATCH 07/31] vhost: dd vhost_svq_get_svq_call_notifier Eugenio Pérez
@ 2022-01-29  7:57     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-29  7:57 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> This allows the vhost-vdpa device to retrieve the device -> svq call eventfd.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>


What did 'dd' mean in the title?

Thanks


> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
>   hw/virtio/vhost-shadow-virtqueue.c | 12 ++++++++++++
>   2 files changed, 14 insertions(+)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 4c583a9171..a78234b52b 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -18,6 +18,8 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>   const EventNotifier *vhost_svq_get_dev_kick_notifier(
>                                                 const VhostShadowVirtqueue *svq);
> +const EventNotifier *vhost_svq_get_svq_call_notifier(
> +                                              const VhostShadowVirtqueue *svq);
>   
>   void vhost_svq_stop(VhostShadowVirtqueue *svq);
>   
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 8991f0b3c3..25fcdf16ec 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -55,6 +55,18 @@ static void vhost_handle_guest_kick(EventNotifier *n)
>       event_notifier_set(&svq->hdev_kick);
>   }
>   
> +/**
> + * Obtain the SVQ call notifier, where the vhost device notifies SVQ that
> + * there are pending used buffers.
> + *
> + * @svq Shadow Virtqueue
> + */
> +const EventNotifier *vhost_svq_get_svq_call_notifier(
> +                                               const VhostShadowVirtqueue *svq)
> +{
> +    return &svq->hdev_call;
> +}
> +
>   /**
>    * Set a new file descriptor for the guest to kick SVQ and notify for avail
>    *


* Re: [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
  2022-01-21 20:27 ` [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call Eugenio Pérez
@ 2022-01-29  8:05     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-29  8:05 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++--
>   1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 18de14f0fb..029f98feee 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -687,13 +687,29 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
>       }
>   }
>   
> -static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> -                                       struct vhost_vring_file *file)
> +static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
> +                                         struct vhost_vring_file *file)
>   {
>       trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
>       return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>   }
>   
> +static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> +                                     struct vhost_vring_file *file)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +
> +    if (v->shadow_vqs_enabled) {
> +        int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
> +
> +        vhost_svq_set_guest_call_notifier(svq, file->fd);


Two questions here (I had similar questions for the vring kick):

1) Any reason that we set up the eventfd for vhost-vdpa in
vhost_vdpa_svq_setup() and not here?

2) The call could be disabled by using -1 as the fd; I don't see any
code that deals with that.
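
For illustration, the kind of check I'd expect on the SVQ side (a
hypothetical sketch; vhost_svq_call_guest() is a made-up name, not a
function in this series):

    /* Hypothetical: skip notifying the guest while calls are disabled */
    static void vhost_svq_call_guest(VhostShadowVirtqueue *svq)
    {
        if (event_notifier_get_fd(&svq->svq_call) < 0) {
            /* The guest passed fd == -1: it does not want calls */
            return;
        }
        event_notifier_set(&svq->svq_call);
    }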

Thanks


> +        return 0;
> +    } else {
> +        return vhost_vdpa_set_vring_dev_call(dev, file);
> +    }
> +}
> +
>   /**
>    * Set shadow virtqueue descriptors to the device
>    *


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 11/31] vhost: Add vhost_svq_valid_device_features to shadow vq
  2022-01-21 20:27 ` [PATCH 11/31] vhost: Add vhost_svq_valid_device_features to shadow vq Eugenio Pérez
@ 2022-01-29  8:11     ` Jason Wang
  2022-02-26  9:11   ` Liuxiangdong via
  1 sibling, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-29  8:11 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> This allows SVQ to negotiate features with the device. For the device,
> SVQ is a driver. While this function needs to bypass all non-transport
> features, it needs to disable the features that SVQ does not support
> when forwarding buffers. This includes packed vq layout, indirect
> descriptors or event idx.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
>   hw/virtio/vhost-shadow-virtqueue.c | 44 ++++++++++++++++++++++++++++++
>   hw/virtio/vhost-vdpa.c             | 21 ++++++++++++++
>   3 files changed, 67 insertions(+)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index c9ffa11fce..d963867a04 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -15,6 +15,8 @@
>   
>   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>   
> +bool vhost_svq_valid_device_features(uint64_t *features);
> +
>   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>   void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
>   const EventNotifier *vhost_svq_get_dev_kick_notifier(
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 9619c8082c..51442b3dbf 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -45,6 +45,50 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
>       return &svq->hdev_kick;
>   }
>   
> +/**
> + * Validate the transport device features that SVQ can use with the device
> + *
> + * @dev_features  The device features. On success, the acknowledged features.
> + *
> + * Returns true if SVQ can go with a subset of these, false otherwise.
> + */
> +bool vhost_svq_valid_device_features(uint64_t *dev_features)
> +{
> +    bool r = true;
> +
> +    for (uint64_t b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END;
> +         ++b) {
> +        switch (b) {
> +        case VIRTIO_F_NOTIFY_ON_EMPTY:
> +        case VIRTIO_F_ANY_LAYOUT:
> +            continue;
> +
> +        case VIRTIO_F_ACCESS_PLATFORM:
> +            /* SVQ does not know how to translate addresses */


I may have missed something, but is there any reason we need to disable
ACCESS_PLATFORM? I'd expect the vring helpers we use for the shadow
virtqueue to deal with the vIOMMU perfectly.


> +            if (*dev_features & BIT_ULL(b)) {
> +                clear_bit(b, dev_features);
> +                r = false;
> +            }
> +            break;
> +
> +        case VIRTIO_F_VERSION_1:


I had the same question here.

Thanks


> +            /* SVQ trusts that the guest vring is little endian */
> +            if (!(*dev_features & BIT_ULL(b))) {
> +                set_bit(b, dev_features);
> +                r = false;
> +            }
> +            continue;
> +
> +        default:
> +            if (*dev_features & BIT_ULL(b)) {
> +                clear_bit(b, dev_features);
> +            }
> +        }
> +    }
> +
> +    return r;
> +}
> +
>   /* Forward guest notifications */
>   static void vhost_handle_guest_kick(EventNotifier *n)
>   {
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index bdb45c8808..9d801cf907 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -855,10 +855,31 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>       size_t n_svqs = v->shadow_vqs_enabled ? hdev->nvqs : 0;
>       g_autoptr(GPtrArray) shadow_vqs = g_ptr_array_new_full(n_svqs,
>                                                              vhost_psvq_free);
> +    uint64_t dev_features;
> +    uint64_t svq_features;
> +    int r;
> +    bool ok;
> +
>       if (!v->shadow_vqs_enabled) {
>           goto out;
>       }
>   
> +    r = vhost_vdpa_get_features(hdev, &dev_features);
> +    if (r != 0) {
> +        error_setg(errp, "Can't get vdpa device features, got (%d)", r);
> +        return r;
> +    }
> +
> +    svq_features = dev_features;
> +    ok = vhost_svq_valid_device_features(&svq_features);
> +    if (unlikely(!ok)) {
> +        error_setg(errp,
> +            "SVQ Invalid device feature flags, offer: 0x%"PRIx64", ok: 0x%"PRIx64,
> +            hdev->features, svq_features);
> +        return -1;
> +    }
> +
> +    shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_psvq_free);
>       for (unsigned n = 0; n < hdev->nvqs; ++n) {
>           VhostShadowVirtqueue *svq = vhost_svq_new();
>   


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 15/31] vdpa: Add vhost_svq_get_num
  2022-01-21 20:27 ` [PATCH 15/31] vdpa: Add vhost_svq_get_num Eugenio Pérez
@ 2022-01-29  8:14     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-29  8:14 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> This reports the SVQ effective length as visible to the guest, not the
> device's.


I think we need to explain whether there could be a case where the SVQ
size is not equal to the device queue size.

Thanks


>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h | 1 +
>   hw/virtio/vhost-shadow-virtqueue.c | 5 +++++
>   2 files changed, 6 insertions(+)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 3521e8094d..035207a469 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -29,6 +29,7 @@ const EventNotifier *vhost_svq_get_svq_call_notifier(
>                                                 const VhostShadowVirtqueue *svq);
>   void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
>                                 struct vhost_vring_addr *addr);
> +uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq);
>   size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
>   size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
>   
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 0f2c2403ff..f129ec8395 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -212,6 +212,11 @@ void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
>       addr->used_user_addr = (uint64_t)svq->vring.used;
>   }
>   
> +uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq)
> +{
> +    return svq->vring.num;
> +}
> +
>   size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq)
>   {
>       size_t desc_size = sizeof(vring_desc_t) * svq->vring.num;


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 16/31] vhost: pass queue index to vhost_vq_get_addr
  2022-01-21 20:27 ` [PATCH 16/31] vhost: pass queue index to vhost_vq_get_addr Eugenio Pérez
@ 2022-01-29  8:20     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-29  8:20 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> Doing it this way allows the vhost backend to know which address to return.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 7b03efccec..64b955ba0c 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -798,9 +798,10 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
>                                       struct vhost_virtqueue *vq,
>                                       unsigned idx, bool enable_log)
>   {
> -    struct vhost_vring_addr addr;
> +    struct vhost_vring_addr addr = {
> +        .index = idx,
> +    };
>       int r;
> -    memset(&addr, 0, sizeof(struct vhost_vring_addr));
>   
>       if (dev->vhost_ops->vhost_vq_get_addr) {
>           r = dev->vhost_ops->vhost_vq_get_addr(dev, &addr, vq);
> @@ -813,7 +814,6 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
>           addr.avail_user_addr = (uint64_t)(unsigned long)vq->avail;
>           addr.used_user_addr = (uint64_t)(unsigned long)vq->used;
>       }


I'm a bit lost in the logic above; is there any reason we need to call
vhost_vq_get_addr() :) ?

Thanks


> -    addr.index = idx;
>       addr.log_guest_addr = vq->used_phys;
>       addr.flags = enable_log ? (1 << VHOST_VRING_F_LOG) : 0;
>       r = dev->vhost_ops->vhost_set_vring_addr(dev, &addr);


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 07/31] vhost: dd vhost_svq_get_svq_call_notifier
  2022-01-29  7:57     ` Jason Wang
  (?)
@ 2022-01-29 17:49     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-29 17:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Sat, Jan 29, 2022 at 8:57 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > This allows the vhost-vdpa device to retrieve the device -> SVQ call eventfd.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>
>
> What did 'dd' mean in the title?
>

It was intended to be "add" but I missed the first letter. I will fix it
for the next version.

Thanks!

> Thanks
>
>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
> >   hw/virtio/vhost-shadow-virtqueue.c | 12 ++++++++++++
> >   2 files changed, 14 insertions(+)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index 4c583a9171..a78234b52b 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -18,6 +18,8 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> >   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> >   const EventNotifier *vhost_svq_get_dev_kick_notifier(
> >                                                 const VhostShadowVirtqueue *svq);
> > +const EventNotifier *vhost_svq_get_svq_call_notifier(
> > +                                              const VhostShadowVirtqueue *svq);
> >
> >   void vhost_svq_stop(VhostShadowVirtqueue *svq);
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index 8991f0b3c3..25fcdf16ec 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -55,6 +55,18 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> >       event_notifier_set(&svq->hdev_kick);
> >   }
> >
> > +/**
> > + * Obtain the SVQ call notifier, where the vhost device notifies SVQ that
> > + * there are pending used buffers.
> > + *
> > + * @svq Shadow Virtqueue
> > + */
> > +const EventNotifier *vhost_svq_get_svq_call_notifier(
> > +                                               const VhostShadowVirtqueue *svq)
> > +{
> > +    return &svq->hdev_call;
> > +}
> > +
> >   /**
> >    * Set a new file descriptor for the guest to kick SVQ and notify for avail
> >    *
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq
  2022-01-21 20:27 ` [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq Eugenio Pérez
@ 2022-01-30  4:03     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-30  4:03 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> First half of the buffer forwarding part, preparing the vhost-vdpa
> callbacks so they can offer SVQ. QEMU cannot enable it at this moment,
> so this is effectively dead code for now, but splitting it out helps to
> reduce patch size.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |   2 +-
>   hw/virtio/vhost-shadow-virtqueue.c |  21 ++++-
>   hw/virtio/vhost-vdpa.c             | 133 ++++++++++++++++++++++++++---
>   3 files changed, 143 insertions(+), 13 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 035207a469..39aef5ffdf 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
>   
>   void vhost_svq_stop(VhostShadowVirtqueue *svq);
>   
> -VhostShadowVirtqueue *vhost_svq_new(void);
> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
>   
>   void vhost_svq_free(VhostShadowVirtqueue *vq);
>   
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index f129ec8395..7c168075d7 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>   /**
>    * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
>    * methods and file descriptors.
> + *
> + * @qsize Shadow VirtQueue size
> + *
> + * Returns the new virtqueue or NULL.
> + *
> + * In case of error, the reason is reported through error_report.
>    */
> -VhostShadowVirtqueue *vhost_svq_new(void)
> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
>   {
> +    size_t desc_size = sizeof(vring_desc_t) * qsize;
> +    size_t device_size, driver_size;
>       g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
>       int r;
>   
> @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
>       /* Placeholder descriptor, it should be deleted at set_kick_fd */
>       event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
>   
> +    svq->vring.num = qsize;


I wonder if this is the best approach. E.g. some hardware can support up
to a 32K queue size, so this will probably end up with:

1) SVQ uses a 32K queue size
2) the hardware queue uses 256

Or SVQ can stick to 256, but will this cause trouble if we want to add
event index support?
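
For example, clamping could look like this (a sketch only; dev_max_qsize
stands for whatever VHOST_VDPA_GET_VRING_NUM returns, and 256 is an
arbitrary cap, not something this series mandates):

    /* Hypothetical: bound the SVQ ring by the device maximum and a cap */
    uint16_t svq_qsize = MIN(dev_max_qsize, 256);
    VhostShadowVirtqueue *svq = vhost_svq_new(svq_qsize);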


> +    driver_size = vhost_svq_driver_area_size(svq);
> +    device_size = vhost_svq_device_area_size(svq);
> +    svq->vring.desc = qemu_memalign(qemu_real_host_page_size, driver_size);
> +    svq->vring.avail = (void *)((char *)svq->vring.desc + desc_size);
> +    memset(svq->vring.desc, 0, driver_size);
> +    svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> +    memset(svq->vring.used, 0, device_size);
> +
>       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
>       return g_steal_pointer(&svq);
>   
> @@ -318,5 +335,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
>       event_notifier_cleanup(&vq->hdev_kick);
>       event_notifier_set_handler(&vq->hdev_call, NULL);
>       event_notifier_cleanup(&vq->hdev_call);
> +    qemu_vfree(vq->vring.desc);
> +    qemu_vfree(vq->vring.used);
>       g_free(vq);
>   }
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 9d801cf907..53e14bafa0 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -641,20 +641,52 @@ static int vhost_vdpa_set_vring_addr(struct vhost_dev *dev,
>       return vhost_vdpa_call(dev, VHOST_SET_VRING_ADDR, addr);
>   }
>   
> -static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
> -                                      struct vhost_vring_state *ring)
> +static int vhost_vdpa_set_dev_vring_num(struct vhost_dev *dev,
> +                                        struct vhost_vring_state *ring)
>   {
>       trace_vhost_vdpa_set_vring_num(dev, ring->index, ring->num);
>       return vhost_vdpa_call(dev, VHOST_SET_VRING_NUM, ring);
>   }
>   
> -static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
> -                                       struct vhost_vring_state *ring)
> +static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
> +                                    struct vhost_vring_state *ring)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +
> +    if (v->shadow_vqs_enabled) {
> +        /*
> +         * Vring num was set at device start. SVQ num is handled by VirtQueue
> +         * code
> +         */
> +        return 0;
> +    }
> +
> +    return vhost_vdpa_set_dev_vring_num(dev, ring);
> +}
> +
> +static int vhost_vdpa_set_dev_vring_base(struct vhost_dev *dev,
> +                                         struct vhost_vring_state *ring)
>   {
>       trace_vhost_vdpa_set_vring_base(dev, ring->index, ring->num);
>       return vhost_vdpa_call(dev, VHOST_SET_VRING_BASE, ring);
>   }
>   
> +static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
> +                                     struct vhost_vring_state *ring)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +
> +    if (v->shadow_vqs_enabled) {
> +        /*
> +         * Vring base was set at device start. SVQ base is handled by VirtQueue
> +         * code
> +         */
> +        return 0;
> +    }
> +
> +    return vhost_vdpa_set_dev_vring_base(dev, ring);
> +}
> +
>   static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>                                          struct vhost_vring_state *ring)
>   {
> @@ -784,8 +816,8 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>       }
>   }
>   
> -static int vhost_vdpa_get_features(struct vhost_dev *dev,
> -                                     uint64_t *features)
> +static int vhost_vdpa_get_dev_features(struct vhost_dev *dev,
> +                                       uint64_t *features)
>   {
>       int ret;
>   
> @@ -794,15 +826,64 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev,
>       return ret;
>   }
>   
> +static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    int ret = vhost_vdpa_get_dev_features(dev, features);
> +
> +    if (ret == 0 && v->shadow_vqs_enabled) {
> +        /* Filter only features that SVQ can offer to guest */
> +        vhost_svq_valid_guest_features(features);
> +    }


Sorry if I've asked this before, but I think it's sufficient to filter
out the device features that we don't support during vhost
initialization, and to fail there. Any reason we need to do it again here?


> +
> +    return ret;
> +}
> +
>   static int vhost_vdpa_set_features(struct vhost_dev *dev,
>                                      uint64_t features)
>   {
> +    struct vhost_vdpa *v = dev->opaque;
>       int ret;
>   
>       if (vhost_vdpa_one_time_request(dev)) {
>           return 0;
>       }
>   
> +    if (v->shadow_vqs_enabled) {
> +        uint64_t dev_features, svq_features, acked_features;
> +        bool ok;
> +
> +        ret = vhost_vdpa_get_dev_features(dev, &dev_features);
> +        if (ret != 0) {
> +            error_report("Can't get vdpa device features, got (%d)", ret);
> +            return ret;
> +        }
> +
> +        svq_features = dev_features;
> +        ok = vhost_svq_valid_device_features(&svq_features);
> +        if (unlikely(!ok)) {
> +            error_report("SVQ Invalid device feature flags, offer: 0x%"
> +                         PRIx64", ok: 0x%"PRIx64, dev->features, svq_features);
> +            return -1;
> +        }
> +
> +        ok = vhost_svq_valid_guest_features(&features);
> +        if (unlikely(!ok)) {
> +            error_report(
> +                "Invalid guest acked feature flag, acked: 0x%"
> +                PRIx64", ok: 0x%"PRIx64, dev->acked_features, features);
> +            return -1;
> +        }
> +
> +        ok = vhost_svq_ack_guest_features(svq_features, features,
> +                                          &acked_features);
> +        if (unlikely(!ok)) {
> +            return -1;
> +        }
> +
> +        features = acked_features;
> +    }
> +
>       trace_vhost_vdpa_set_features(dev, features);
>       ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
>       if (ret) {
> @@ -822,13 +903,31 @@ static int vhost_vdpa_set_owner(struct vhost_dev *dev)
>       return vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
>   }
>   
> -static int vhost_vdpa_vq_get_addr(struct vhost_dev *dev,
> -                    struct vhost_vring_addr *addr, struct vhost_virtqueue *vq)
> +static void vhost_vdpa_vq_get_guest_addr(struct vhost_vring_addr *addr,
> +                                         struct vhost_virtqueue *vq)
>   {
> -    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
>       addr->desc_user_addr = (uint64_t)(unsigned long)vq->desc_phys;
>       addr->avail_user_addr = (uint64_t)(unsigned long)vq->avail_phys;
>       addr->used_user_addr = (uint64_t)(unsigned long)vq->used_phys;
> +}
> +
> +static int vhost_vdpa_vq_get_addr(struct vhost_dev *dev,
> +                                  struct vhost_vring_addr *addr,
> +                                  struct vhost_virtqueue *vq)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +
> +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> +
> +    if (v->shadow_vqs_enabled) {
> +        int idx = vhost_vdpa_get_vq_index(dev, addr->index);
> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, idx);
> +
> +        vhost_svq_get_vring_addr(svq, addr);
> +    } else {
> +        vhost_vdpa_vq_get_guest_addr(addr, vq);
> +    }
> +
>       trace_vhost_vdpa_vq_get_addr(dev, vq, addr->desc_user_addr,
>                                    addr->avail_user_addr, addr->used_user_addr);
>       return 0;
> @@ -849,6 +948,12 @@ static void vhost_psvq_free(gpointer svq)
>       vhost_svq_free(svq);
>   }
>   
> +static int vhost_vdpa_get_max_queue_size(struct vhost_dev *dev,
> +                                         uint16_t *qsize)
> +{
> +    return vhost_vdpa_call(dev, VHOST_VDPA_GET_VRING_NUM, qsize);
> +}
> +
>   static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>                                  Error **errp)
>   {
> @@ -857,6 +962,7 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>                                                              vhost_psvq_free);
>       uint64_t dev_features;
>       uint64_t svq_features;
> +    uint16_t qsize;
>       int r;
>       bool ok;
>   
> @@ -864,7 +970,7 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>           goto out;
>       }
>   
> -    r = vhost_vdpa_get_features(hdev, &dev_features);
> +    r = vhost_vdpa_get_dev_features(hdev, &dev_features);
>       if (r != 0) {
>           error_setg(errp, "Can't get vdpa device features, got (%d)", r);
>           return r;
> @@ -879,9 +985,14 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>           return -1;
>       }
>   
> +    r = vhost_vdpa_get_max_queue_size(hdev, &qsize);
> +    if (unlikely(r)) {
> +        qsize = 256;
> +    }


Should we fail instead of having a "default" value here?
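
The failing alternative would be something like this (a sketch that just
follows the error style of the surrounding code):

    r = vhost_vdpa_get_max_queue_size(hdev, &qsize);
    if (unlikely(r)) {
        error_setg(errp, "Can't get vdpa vring size, got (%d)", r);
        return r;
    }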

Thanks


> +
>       shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_psvq_free);
>       for (unsigned n = 0; n < hdev->nvqs; ++n) {
> -        VhostShadowVirtqueue *svq = vhost_svq_new();
> +        VhostShadowVirtqueue *svq = vhost_svq_new(qsize);
>   
>           if (unlikely(!svq)) {
>               error_setg(errp, "Cannot create svq %u", n);


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
  2022-01-21 20:27 ` [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding Eugenio Pérez
@ 2022-01-30  4:42     ` Jason Wang
  2022-01-30  6:46     ` Jason Wang
  1 sibling, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-30  4:42 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> Initial version of the shadow virtqueue that actually forwards buffers.
> There is no IOMMU support at the moment; that will be addressed in future
> patches of this series. Since all vhost-vdpa devices use a forced IOMMU,
> this means that SVQ is not usable on any device at this point of the
> series.
>
> For simplicity it only supports modern devices, that expects vring
> in little endian, with split ring and no event idx or indirect
> descriptors. Support for them will not be added in this series.
>
> It reuses the VirtQueue code for the device part. The driver part is
> based on Linux's virtio_ring driver, but with stripped functionality
> and optimizations so it's easier to review.
>
> However, forwarding buffers has some peculiarities: one of the most
> unexpected is that a guest's buffer can expand across more than one
> descriptor in SVQ. While this is handled gracefully by qemu's emulated
> virtio devices, it may cause an unexpected SVQ queue full condition. This
> patch also solves that by checking for the condition at both the guest's
> kicks and the device's calls. The code may be more elegant in the future
> if the SVQ code runs in its own iocontext.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |   2 +
>   hw/virtio/vhost-shadow-virtqueue.c | 365 ++++++++++++++++++++++++++++-
>   hw/virtio/vhost-vdpa.c             | 111 ++++++++-
>   3 files changed, 462 insertions(+), 16 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 39aef5ffdf..19c934af49 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -33,6 +33,8 @@ uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq);
>   size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
>   size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
>   
> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> +                     VirtQueue *vq);
>   void vhost_svq_stop(VhostShadowVirtqueue *svq);
>   
>   VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 7c168075d7..a1a404f68f 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -9,6 +9,8 @@
>   
>   #include "qemu/osdep.h"
>   #include "hw/virtio/vhost-shadow-virtqueue.h"
> +#include "hw/virtio/vhost.h"
> +#include "hw/virtio/virtio-access.h"
>   #include "standard-headers/linux/vhost_types.h"
>   
>   #include "qemu/error-report.h"
> @@ -36,6 +38,33 @@ typedef struct VhostShadowVirtqueue {
>   
>       /* Guest's call notifier, where SVQ calls guest. */
>       EventNotifier svq_call;
> +
> +    /* Virtio queue shadowing */
> +    VirtQueue *vq;
> +
> +    /* Virtio device */
> +    VirtIODevice *vdev;
> +
> +    /* Map for returning guest's descriptors */
> +    VirtQueueElement **ring_id_maps;
> +
> +    /* Next VirtQueue element that guest made available */
> +    VirtQueueElement *next_guest_avail_elem;
> +
> +    /* Next head to expose to device */
> +    uint16_t avail_idx_shadow;
> +
> +    /* Next free descriptor */
> +    uint16_t free_head;
> +
> +    /* Last seen used idx */
> +    uint16_t shadow_used_idx;
> +
> +    /* Next head to consume from device */
> +    uint16_t last_used_idx;
> +
> +    /* Cache for the exposed notification flag */
> +    bool notification;
>   } VhostShadowVirtqueue;
>   
>   #define INVALID_SVQ_KICK_FD -1
> @@ -148,30 +177,294 @@ bool vhost_svq_ack_guest_features(uint64_t dev_features,
>       return true;
>   }
>   
> -/* Forward guest notifications */
> -static void vhost_handle_guest_kick(EventNotifier *n)
> +/**
> + * Number of descriptors that SVQ can make available from the guest.
> + *
> + * @svq   The svq
> + */
> +static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
>   {
> -    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> -                                             svq_kick);
> +    return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
> +}
> +
> +static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> +{
> +    uint16_t notification_flag;
>   
> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> +    if (svq->notification == enable) {
> +        return;
> +    }
> +
> +    notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
> +
> +    svq->notification = enable;
> +    if (enable) {
> +        svq->vring.avail->flags &= ~notification_flag;
> +    } else {
> +        svq->vring.avail->flags |= notification_flag;
> +    }
> +}
> +
> +static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> +                                    const struct iovec *iovec,
> +                                    size_t num, bool more_descs, bool write)
> +{
> +    uint16_t i = svq->free_head, last = svq->free_head;
> +    unsigned n;
> +    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
> +    vring_desc_t *descs = svq->vring.desc;
> +
> +    if (num == 0) {
> +        return;
> +    }
> +
> +    for (n = 0; n < num; n++) {
> +        if (more_descs || (n + 1 < num)) {
> +            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> +        } else {
> +            descs[i].flags = flags;
> +        }
> +        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
> +        descs[i].len = cpu_to_le32(iovec[n].iov_len);
> +
> +        last = i;
> +        i = cpu_to_le16(descs[i].next);
> +    }
> +
> +    svq->free_head = le16_to_cpu(descs[last].next);
> +}
> +
> +static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> +                                    VirtQueueElement *elem)
> +{
> +    int head;
> +    unsigned avail_idx;
> +    vring_avail_t *avail = svq->vring.avail;
> +
> +    head = svq->free_head;
> +
> +    /* We need some descriptors here */
> +    assert(elem->out_num || elem->in_num);


Looks like this could be triggered by the guest, so we need to fail instead of 
assert here.
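
Something like this, perhaps (an untested sketch; it assumes
vhost_svq_add_split() and its caller are reworked to propagate the failure):

    /* We need some descriptors here */
    if (unlikely(!elem->out_num && !elem->in_num)) {
        virtio_error(svq->vdev,
                     "Guest provided element with no descriptors");
        return -EINVAL;
    }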


> +
> +    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> +                            elem->in_num > 0, false);
> +    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> +
> +    /*
> +     * Put entry in available array (but don't update avail->idx until they
> +     * do sync).
> +     */
> +    avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
> +    avail->ring[avail_idx] = cpu_to_le16(head);
> +    svq->avail_idx_shadow++;
> +
> +    /* Update avail index after the descriptor is written */
> +    smp_wmb();
> +    avail->idx = cpu_to_le16(svq->avail_idx_shadow);
> +
> +    return head;
> +}
> +
> +static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> +{
> +    unsigned qemu_head = vhost_svq_add_split(svq, elem);
> +
> +    svq->ring_id_maps[qemu_head] = elem;
> +}
> +
> +static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> +{
> +    /* We need to expose available array entries before checking used flags */
> +    smp_mb();
> +    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
>           return;
>       }
>   
>       event_notifier_set(&svq->hdev_kick);
>   }
>   
> -/* Forward vhost notifications */
> +/**
> + * Forward available buffers.
> + *
> + * @svq Shadow VirtQueue
> + *
> + * Note that this function does not guarantee that all guest's available
> + * buffers are available to the device in SVQ avail ring. The guest may have
> + * exposed a contiguous GPA / GIOVA buffer, but it may not be contiguous in qemu
> + * vaddr.
> + *
> + * If that happens, guest's kick notifications will be disabled until device
> + * makes some buffers used.
> + */
> +static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
> +{
> +    /* Clear event notifier */
> +    event_notifier_test_and_clear(&svq->svq_kick);
> +
> +    /* Make available as many buffers as possible */
> +    do {
> +        if (virtio_queue_get_notification(svq->vq)) {
> +            virtio_queue_set_notification(svq->vq, false);


This looks like an optimization that should belong in 
virtio_queue_set_notification() itself.
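
I.e. something like this inside virtio_queue_set_notification() itself (an
untested sketch; it assumes the current value can be read back with
virtio_queue_get_notification()):

    void virtio_queue_set_notification(VirtQueue *vq, int enable)
    {
        /* Nothing to do if the flag already has the requested value */
        if (virtio_queue_get_notification(vq) == !!enable) {
            return;
        }

        /* ... existing body unchanged ... */
    }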


> +        }
> +
> +        while (true) {
> +            VirtQueueElement *elem;
> +
> +            if (svq->next_guest_avail_elem) {
> +                elem = g_steal_pointer(&svq->next_guest_avail_elem);
> +            } else {
> +                elem = virtqueue_pop(svq->vq, sizeof(*elem));
> +            }
> +
> +            if (!elem) {
> +                break;
> +            }
> +
> +            if (elem->out_num + elem->in_num >
> +                vhost_svq_available_slots(svq)) {
> +                /*
> +                 * This condition is possible since a contiguous buffer in GPA
> +                 * does not imply a contiguous buffer in qemu's VA
> +                 * scatter-gather segments. If that happens, the buffer exposed
> +                 * to the device needs to be a chain of descriptors at this
> +                 * moment.
> +                 *
> +                 * SVQ cannot hold more available buffers if we are here:
> +                 * queue the current guest descriptor and ignore further kicks
> +                 * until some elements are used.
> +                 */
> +                svq->next_guest_avail_elem = elem;
> +                return;
> +            }
> +
> +            vhost_svq_add(svq, elem);
> +            vhost_svq_kick(svq);
> +        }
> +
> +        virtio_queue_set_notification(svq->vq, true);
> +    } while (!virtio_queue_empty(svq->vq));
> +}
> +
> +/**
> + * Handle guest's kick.
> + *
> + * @n guest kick event notifier, the one that guest set to notify svq.
> + */
> +static void vhost_handle_guest_kick_notifier(EventNotifier *n)
> +{
> +    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> +                                             svq_kick);
> +    vhost_handle_guest_kick(svq);
> +}
> +
> +static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> +{
> +    if (svq->last_used_idx != svq->shadow_used_idx) {
> +        return true;
> +    }
> +
> +    svq->shadow_used_idx = le16_to_cpu(svq->vring.used->idx);
> +
> +    return svq->last_used_idx != svq->shadow_used_idx;
> +}
> +
> +static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> +{
> +    vring_desc_t *descs = svq->vring.desc;
> +    const vring_used_t *used = svq->vring.used;
> +    vring_used_elem_t used_elem;
> +    uint16_t last_used;
> +
> +    if (!vhost_svq_more_used(svq)) {
> +        return NULL;
> +    }
> +
> +    /* Only get used array entries after they have been exposed by dev */
> +    smp_rmb();
> +    last_used = svq->last_used_idx & (svq->vring.num - 1);
> +    used_elem.id = le32_to_cpu(used->ring[last_used].id);
> +    used_elem.len = le32_to_cpu(used->ring[last_used].len);
> +
> +    svq->last_used_idx++;
> +    if (unlikely(used_elem.id >= svq->vring.num)) {
> +        error_report("Device %s says index %u is used", svq->vdev->name,
> +                     used_elem.id);
> +        return NULL;
> +    }
> +
> +    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
> +        error_report(
> +            "Device %s says index %u is used, but it was not available",
> +            svq->vdev->name, used_elem.id);
> +        return NULL;
> +    }
> +
> +    descs[used_elem.id].next = svq->free_head;
> +    svq->free_head = used_elem.id;
> +
> +    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> +    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> +}
> +
> +static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> +                            bool check_for_avail_queue)
> +{
> +    VirtQueue *vq = svq->vq;
> +
> +    /* Make as many buffers as possible used. */
> +    do {
> +        unsigned i = 0;
> +
> +        vhost_svq_set_notification(svq, false);
> +        while (true) {
> +            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> +            if (!elem) {
> +                break;
> +            }
> +
> +            if (unlikely(i >= svq->vring.num)) {
> +                virtio_error(svq->vdev,
> +                         "More than %u used buffers obtained in a %u size SVQ",
> +                         i, svq->vring.num);
> +                virtqueue_fill(vq, elem, elem->len, i);
> +                virtqueue_flush(vq, i);


Let's simply use virtqueue_push() here?


> +                i = 0;


Do we need to bail out here?
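
E.g. something like this (an untested sketch combining both points):

            if (unlikely(i >= svq->vring.num)) {
                virtio_error(svq->vdev,
                         "More than %u used buffers obtained in a %u size SVQ",
                         i, svq->vring.num);
                /* Return this element and stop processing the broken ring */
                virtqueue_push(vq, elem, elem->len);
                return;
            }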


> +            }
> +            virtqueue_fill(vq, elem, elem->len, i++);
> +        }
> +
> +        virtqueue_flush(vq, i);
> +        event_notifier_set(&svq->svq_call);
> +
> +        if (check_for_avail_queue && svq->next_guest_avail_elem) {
> +            /*
> +             * Avail ring was full when vhost_svq_flush was called, so it's a
> +             * good moment to make more descriptors available if possible
> +             */
> +            vhost_handle_guest_kick(svq);


Would it be better to have a check similar to what vhost_handle_guest_kick() does?

             if (elem->out_num + elem->in_num >
                 vhost_svq_available_slots(svq)) {


> +        }
> +
> +        vhost_svq_set_notification(svq, true);


Is a mb() needed here? Otherwise we may lose a call (if 
vhost_svq_more_used() is run before vhost_svq_set_notification()).
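
I.e. (sketch):

        vhost_svq_set_notification(svq, true);
        /*
         * Make the re-enabled flag visible before re-checking for used
         * buffers, so a call racing with the flag update is not lost.
         */
        smp_mb();
    } while (vhost_svq_more_used(svq));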


> +    } while (vhost_svq_more_used(svq));
> +}
> +
> +/**
> + * Forward used buffers.
> + *
> + * @n hdev call event notifier, the one that device set to notify svq.
> + *
> + * Note that we are not making any buffers available in the loop, so there is
> + * no way that it runs more than virtqueue size times.
> + */
>   static void vhost_svq_handle_call(EventNotifier *n)
>   {
>       VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
>                                                hdev_call);
>   
> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> -        return;
> -    }
> +    /* Clear event notifier */
> +    event_notifier_test_and_clear(n);


Any reason why we removed the above check?


>   
> -    event_notifier_set(&svq->svq_call);
> +    vhost_svq_flush(svq, true);
>   }
>   
>   /**
> @@ -258,13 +551,38 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>        * need to explicitly check for them.
>        */
>       event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> -    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
> +    event_notifier_set_handler(&svq->svq_kick,
> +                               vhost_handle_guest_kick_notifier);
>   
>       if (!check_old || event_notifier_test_and_clear(&tmp)) {
>           event_notifier_set(&svq->hdev_kick);
>       }
>   }
>   
> +/**
> + * Start shadow virtqueue operation.
> + *
> + * @svq Shadow Virtqueue
> + * @vdev        VirtIO device
> + * @vq          Virtqueue to shadow
> + */
> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> +                     VirtQueue *vq)
> +{
> +    svq->next_guest_avail_elem = NULL;
> +    svq->avail_idx_shadow = 0;
> +    svq->shadow_used_idx = 0;
> +    svq->last_used_idx = 0;
> +    svq->vdev = vdev;
> +    svq->vq = vq;
> +
> +    memset(svq->vring.avail, 0, sizeof(*svq->vring.avail));
> +    memset(svq->vring.used, 0, sizeof(*svq->vring.used));
> +    for (unsigned i = 0; i < svq->vring.num - 1; i++) {
> +        svq->vring.desc[i].next = cpu_to_le16(i + 1);
> +    }
> +}
> +
>   /**
>    * Stop shadow virtqueue operation.
>    * @svq Shadow Virtqueue
> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>   void vhost_svq_stop(VhostShadowVirtqueue *svq)
>   {
>       event_notifier_set_handler(&svq->svq_kick, NULL);
> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> +
> +    if (!svq->vq) {
> +        return;
> +    }
> +
> +    /* Send all pending used descriptors to guest */
> +    vhost_svq_flush(svq, false);
> +
> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> +        g_autofree VirtQueueElement *elem = NULL;
> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> +        if (elem) {
> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> +        }
> +    }
> +
> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> +    if (next_avail_elem) {
> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> +                                 next_avail_elem->len);
> +    }
>   }
>   
>   /**
> @@ -316,7 +656,7 @@ VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
>       memset(svq->vring.desc, 0, driver_size);
>       svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
>       memset(svq->vring.used, 0, device_size);
> -
> +    svq->ring_id_maps = g_new0(VirtQueueElement *, qsize);
>       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
>       return g_steal_pointer(&svq);
>   
> @@ -335,6 +675,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
>       event_notifier_cleanup(&vq->hdev_kick);
>       event_notifier_set_handler(&vq->hdev_call, NULL);
>       event_notifier_cleanup(&vq->hdev_call);
> +    g_free(vq->ring_id_maps);
>       qemu_vfree(vq->vring.desc);
>       qemu_vfree(vq->vring.used);
>       g_free(vq);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 53e14bafa0..0e5c00ed7e 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -752,9 +752,9 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>    * Note that this function does not rewind the kick file descriptor if it
>    * cannot set the call one.
>    */
> -static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> -                                VhostShadowVirtqueue *svq,
> -                                unsigned idx)
> +static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
> +                                  VhostShadowVirtqueue *svq,
> +                                  unsigned idx)
>   {
>       struct vhost_vring_file file = {
>           .index = dev->vq_index + idx,
> @@ -767,7 +767,7 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
>       r = vhost_vdpa_set_vring_dev_kick(dev, &file);
>       if (unlikely(r != 0)) {
>           error_report("Can't set device kick fd (%d)", -r);
> -        return false;
> +        return r;
>       }
>   
>       event_notifier = vhost_svq_get_svq_call_notifier(svq);
> @@ -777,6 +777,99 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
>           error_report("Can't set device call fd (%d)", -r);
>       }
>   
> +    return r;
> +}
> +
> +/**
> + * Unmap SVQ area in the device
> + */
> +static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
> +                                      hwaddr size)
> +{
> +    int r;
> +
> +    size = ROUND_UP(size, qemu_real_host_page_size);
> +    r = vhost_vdpa_dma_unmap(v, iova, size);
> +    return r == 0;
> +}
> +
> +static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
> +                                       const VhostShadowVirtqueue *svq)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    struct vhost_vring_addr svq_addr;
> +    size_t device_size = vhost_svq_device_area_size(svq);
> +    size_t driver_size = vhost_svq_driver_area_size(svq);
> +    bool ok;
> +
> +    vhost_svq_get_vring_addr(svq, &svq_addr);
> +
> +    ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
> +    if (unlikely(!ok)) {
> +        return false;
> +    }
> +
> +    return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
> +}
> +
> +/**
> + * Map shadow virtqueue rings in device
> + *
> + * @dev   The vhost device
> + * @svq   The shadow virtqueue
> + */
> +static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
> +                                     const VhostShadowVirtqueue *svq)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    struct vhost_vring_addr svq_addr;
> +    size_t device_size = vhost_svq_device_area_size(svq);
> +    size_t driver_size = vhost_svq_driver_area_size(svq);
> +    int r;
> +
> +    vhost_svq_get_vring_addr(svq, &svq_addr);
> +
> +    r = vhost_vdpa_dma_map(v, svq_addr.desc_user_addr, driver_size,
> +                           (void *)svq_addr.desc_user_addr, true);
> +    if (unlikely(r != 0)) {
> +        return false;
> +    }
> +
> +    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
> +                           (void *)svq_addr.used_user_addr, false);


Do we need to unmap the driver area if we fail here?
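
Something like this, reusing vhost_vdpa_svq_unmap_ring() from above
(untested sketch):

    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
                           (void *)svq_addr.used_user_addr, false);
    if (unlikely(r != 0)) {
        /* Undo the driver (desc/avail) area mapping on failure */
        vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
        return false;
    }

    return true;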

Thanks


> +    return r == 0;
> +}
> +
> +static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> +                                VhostShadowVirtqueue *svq,
> +                                unsigned idx)
> +{
> +    uint16_t vq_index = dev->vq_index + idx;
> +    struct vhost_vring_state s = {
> +        .index = vq_index,
> +    };
> +    int r;
> +    bool ok;
> +
> +    r = vhost_vdpa_set_dev_vring_base(dev, &s);
> +    if (unlikely(r)) {
> +        error_report("Can't set vring base (%d)", r);
> +        return false;
> +    }
> +
> +    s.num = vhost_svq_get_num(svq);
> +    r = vhost_vdpa_set_dev_vring_num(dev, &s);
> +    if (unlikely(r)) {
> +        error_report("Can't set vring num (%d)", r);
> +        return false;
> +    }
> +
> +    ok = vhost_vdpa_svq_map_rings(dev, svq);
> +    if (unlikely(!ok)) {
> +        return false;
> +    }
> +
> +    r = vhost_vdpa_svq_set_fds(dev, svq, idx);
>       return r == 0;
>   }
>   
> @@ -788,14 +881,24 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>       if (started) {
>           vhost_vdpa_host_notifiers_init(dev);
>           for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> +            VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
>               VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
>               bool ok = vhost_vdpa_svq_setup(dev, svq, i);
>               if (unlikely(!ok)) {
>                   return -1;
>               }
> +            vhost_svq_start(svq, dev->vdev, vq);
>           }
>           vhost_vdpa_set_vring_ready(dev);
>       } else {
> +        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> +            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
> +                                                          i);
> +            bool ok = vhost_vdpa_svq_unmap_rings(dev, svq);
> +            if (unlikely(!ok)) {
> +                return -1;
> +            }
> +        }
>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>       }
>   


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 21/31] util: Add iova_tree_alloc
  2022-01-24  9:20     ` Eugenio Perez Martin
@ 2022-01-30  5:06         ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-30  5:06 UTC (permalink / raw)
  To: Eugenio Perez Martin, Peter Xu
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/1/24 5:20 PM, Eugenio Perez Martin wrote:
> On Mon, Jan 24, 2022 at 5:33 AM Peter Xu <peterx@redhat.com> wrote:
>> On Fri, Jan 21, 2022 at 09:27:23PM +0100, Eugenio Pérez wrote:
>>> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> I forgot to s/iova_tree_alloc/iova_tree_alloc_map/ here.
>
>>> +                    hwaddr iova_last)
>>> +{
>>> +    const DMAMapInternal *last, *i;
>>> +
>>> +    assert(iova_begin < iova_last);
>>> +
>>> +    /*
>>> +     * Find a valid hole for the mapping
>>> +     *
>>> +     * TODO: Replace all this with g_tree_node_first/next/last when available
>>> +     * (from glib since 2.68). Using a separate QTAILQ complicates the code.
>>> +     *
>>> +     * Try to allocate first at the end of the list.
>>> +     */
>>> +    last = QTAILQ_LAST(&tree->list);
>>> +    if (iova_tree_alloc_map_in_hole(last, NULL, iova_begin, iova_last,
>>> +                                    map->size)) {
>>> +        goto alloc;
>>> +    }
>>> +
>>> +    /* Look for inner hole */
>>> +    last = NULL;
>>> +    for (i = QTAILQ_FIRST(&tree->list); i;
>>> +         last = i, i = QTAILQ_NEXT(i, entry)) {
>>> +        if (iova_tree_alloc_map_in_hole(last, i, iova_begin, iova_last,
>>> +                                        map->size)) {
>>> +            goto alloc;
>>> +        }
>>> +    }
>>> +
>>> +    return IOVA_ERR_NOMEM;
>>> +
>>> +alloc:
>>> +    map->iova = last ? last->map.iova + last->map.size + 1 : iova_begin;
>>> +    return iova_tree_insert(tree, map);
>>> +}
>> Hi, Eugenio,
>>
>> Have you tried with what Jason suggested previously?
>>
>>    https://lore.kernel.org/qemu-devel/CACGkMEtZAPd9xQTP_R4w296N_Qz7VuV1FLnb544fEVoYO0of+g@mail.gmail.com/
>>
>> That solution still sounds very sensible to me even without the newly
>> introduced list in previous two patches.
>>
>> IMHO we could move "DMAMap *previous, *this" into the IOVATreeAllocArgs*
>> structure that was passed into the traverse func though, so it'll naturally work
>> with threading.
>>
>> Or is there any blocker for it?
>>
> Hi Peter,
>
> I can try that solution again, but the main problem was the special
> cases of the beginning and ending.
>
> For the function to locate a hole, DMAMap first = {.iova = 0, .size =
> 0} means that it cannot account address 0 as part of the hole.
>
> In other words, with that algorithm, if the only valid hole is [0, N)
> and we try to allocate a block of size N, it would fail.
>
> The same happens with iova_end, although in practice it seems that the
> IOMMU hardware iova upper limit is never UINT64_MAX.
>
> Maybe we could treat .size = 0 as a special case?


Yes, the pseudo-code I pasted is just to show the idea of using 
g_tree_foreach() instead of introducing new auxiliary data structures. 
That will simplify both the code and the review.

Down the road, we may start from an iova range specified during the 
creation of the iova tree. E.g. for vtd it's the GAW; for vhost-vdpa, 
it's the one that we get from VHOST_VDPA_GET_IOVA_RANGE.
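
I.e. something like this (a sketch; iova_tree_new_with_range() is a
hypothetical constructor, while VHOST_VDPA_GET_IOVA_RANGE and struct
vhost_vdpa_iova_range are the existing uapi):

    /* Hypothetical: create the tree already bound to its usable range */
    IOVATree *iova_tree_new_with_range(hwaddr iova_first, hwaddr iova_last);

    /* vhost-vdpa would then query the device range once (device_fd is
     * the already-open /dev/vhost-vdpa-N descriptor): */
    struct vhost_vdpa_iova_range range;
    IOVATree *tree = NULL;

    if (ioctl(device_fd, VHOST_VDPA_GET_IOVA_RANGE, &range) == 0) {
        tree = iova_tree_new_with_range(range.first, range.last);
    }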

Thanks


> It seems cleaner to me either
> to build the list (but then insert needs to take the list into account) or
> to explicitly state that prev == NULL means to use iova_first.
>
> Another solution that comes to my mind: to add both exceptions outside
> of the traverse function, and skip the first iteration with something
> like:
>
> if (prev == NULL) {
>    prev = this;
>    return false; /* continue */
> }
>
> So the traverse callback has far fewer code paths. Would it work for
> you if I send a separate RFC from SVQ only to validate this?
>
> Thanks!
>
>> Thanks,
>> --
>> Peter Xu
>>


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 22/31] vhost: Add VhostIOVATree
  2022-01-21 20:27 ` [PATCH 22/31] vhost: Add VhostIOVATree Eugenio Pérez
@ 2022-01-30  5:21     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-30  5:21 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> This tree is able to look for a translated address from an IOVA address.
>
> At first glance it is similar to util/iova-tree. However, SVQ working on
> devices with limited IOVA space needs more capabilities,


So does the IOVA tree (e.g. the l2 vtd can only work in the range of the 
GAW and without RMRRs).


>   like allocating
> IOVA chunks or performing reverse translations (qemu addresses to iova).


This looks like a general requirement as well. So I wonder if we can simply 
extend the iova tree instead.
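
E.g. with hypothetical additions along these lines (just a sketch, not
existing API; iova_tree_alloc() is the one patch 21 introduces):

    /* Reverse translation: look up a mapping by its translated address */
    const DMAMap *iova_tree_find_taddr(const IOVATree *tree, hwaddr taddr);

    /* Allocation: assign a free iova in [iova_begin, iova_last] to map */
    int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
                        hwaddr iova_last);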

Thanks


>
> The allocation capability, as "assign a free IOVA address to this chunk
> of memory in qemu's address space" allows shadow virtqueue to create a
> new address space that is not restricted by guest's addressable one, so
> we can allocate shadow vqs vrings outside of it.
>
> It duplicates the tree so it can search efficiently in both directions,
> and it will signal overlap if the iova or the translated address is
> present in either tree.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-iova-tree.h |  27 +++++++
>   hw/virtio/vhost-iova-tree.c | 157 ++++++++++++++++++++++++++++++++++++
>   hw/virtio/meson.build       |   2 +-
>   3 files changed, 185 insertions(+), 1 deletion(-)
>   create mode 100644 hw/virtio/vhost-iova-tree.h
>   create mode 100644 hw/virtio/vhost-iova-tree.c
>
> diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
> new file mode 100644
> index 0000000000..610394eaf1
> --- /dev/null
> +++ b/hw/virtio/vhost-iova-tree.h
> @@ -0,0 +1,27 @@
> +/*
> + * vhost software live migration ring
> + *
> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef HW_VIRTIO_VHOST_IOVA_TREE_H
> +#define HW_VIRTIO_VHOST_IOVA_TREE_H
> +
> +#include "qemu/iova-tree.h"
> +#include "exec/memory.h"
> +
> +typedef struct VhostIOVATree VhostIOVATree;
> +
> +VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
> +void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
> +G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
> +
> +const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
> +                                        const DMAMap *map);
> +int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
> +void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map);
> +
> +#endif
> diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
> new file mode 100644
> index 0000000000..0021dbaf54
> --- /dev/null
> +++ b/hw/virtio/vhost-iova-tree.c
> @@ -0,0 +1,157 @@
> +/*
> + * vhost software live migration ring
> + *
> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/iova-tree.h"
> +#include "vhost-iova-tree.h"
> +
> +#define iova_min_addr qemu_real_host_page_size
> +
> +/**
> + * VhostIOVATree, able to:
> + * - Translate iova address
> + * - Reverse translate iova address (from translated to iova)
> + * - Allocate IOVA regions for translated range (potentially slow operation)
> + *
> + * Note that it cannot remove nodes.
> + */
> +struct VhostIOVATree {
> +    /* First addressable iova address in the device */
> +    uint64_t iova_first;
> +
> +    /* Last addressable iova address in the device */
> +    uint64_t iova_last;
> +
> +    /* IOVA address to qemu memory maps. */
> +    IOVATree *iova_taddr_map;
> +
> +    /* QEMU virtual memory address to iova maps */
> +    GTree *taddr_iova_map;
> +};
> +
> +static gint vhost_iova_tree_cmp_taddr(gconstpointer a, gconstpointer b,
> +                                      gpointer data)
> +{
> +    const DMAMap *m1 = a, *m2 = b;
> +
> +    if (m1->translated_addr > m2->translated_addr + m2->size) {
> +        return 1;
> +    }
> +
> +    if (m1->translated_addr + m1->size < m2->translated_addr) {
> +        return -1;
> +    }
> +
> +    /* Overlapped */
> +    return 0;
> +}
> +
> +/**
> + * Create a new IOVA tree
> + *
> + * Returns the new IOVA tree
> + */
> +VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
> +{
> +    VhostIOVATree *tree = g_new(VhostIOVATree, 1);
> +
> +    /* Some devices do not like address 0 */
> +    tree->iova_first = MAX(iova_first, iova_min_addr);
> +    tree->iova_last = iova_last;
> +
> +    tree->iova_taddr_map = iova_tree_new();
> +    tree->taddr_iova_map = g_tree_new_full(vhost_iova_tree_cmp_taddr, NULL,
> +                                           NULL, g_free);
> +    return tree;
> +}
> +
> +/**
> + * Delete an iova tree
> + */
> +void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
> +{
> +    iova_tree_destroy(iova_tree->iova_taddr_map);
> +    g_tree_unref(iova_tree->taddr_iova_map);
> +    g_free(iova_tree);
> +}
> +
> +/**
> + * Find the IOVA address stored from a memory address
> + *
> + * @tree     The iova tree
> + * @map      The map with the memory address
> + *
> + * Return the stored mapping, or NULL if not found.
> + */
> +const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
> +                                        const DMAMap *map)
> +{
> +    return g_tree_lookup(tree->taddr_iova_map, map);
> +}
> +
> +/**
> + * Allocate a new mapping
> + *
> + * @tree  The iova tree
> + * @map   The iova map
> + *
> + * Returns:
> + * - IOVA_OK if the map fits in the container
> + * - IOVA_ERR_INVALID if the map does not make sense (like size overflow)
> + * - IOVA_ERR_OVERLAP if the tree already contains that map
> + * - IOVA_ERR_NOMEM if tree cannot allocate more space.
> + *
> + * It returns the assigned iova in map->iova if the return value is IOVA_OK.
> + */
> +int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map)
> +{
> +    /* Some vhost devices do not like addr 0. Skip first page */
> +    hwaddr iova_first = tree->iova_first ?: qemu_real_host_page_size;
> +    DMAMap *new;
> +    int r;
> +
> +    if (map->translated_addr + map->size < map->translated_addr ||
> +        map->perm == IOMMU_NONE) {
> +        return IOVA_ERR_INVALID;
> +    }
> +
> +    /* Check for collisions in translated addresses */
> +    if (vhost_iova_tree_find_iova(tree, map)) {
> +        return IOVA_ERR_OVERLAP;
> +    }
> +
> +    /* Allocate a node in IOVA address */
> +    r = iova_tree_alloc(tree->iova_taddr_map, map, iova_first,
> +                        tree->iova_last);
> +    if (r != IOVA_OK) {
> +        return r;
> +    }
> +
> +    /* Allocate node in qemu -> iova translations */
> +    new = g_malloc(sizeof(*new));
> +    memcpy(new, map, sizeof(*new));
> +    g_tree_insert(tree->taddr_iova_map, new, new);
> +    return IOVA_OK;
> +}
> +
> +/**
> + * Remove existing mappings from iova tree
> + *
> + * @param  iova_tree  The vhost iova tree
> + * @param  map        The map to remove
> + */
> +void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map)
> +{
> +    const DMAMap *overlap;
> +
> +    iova_tree_remove(iova_tree->iova_taddr_map, map);
> +    while ((overlap = vhost_iova_tree_find_iova(iova_tree, map))) {
> +        g_tree_remove(iova_tree->taddr_iova_map, overlap);
> +    }
> +}
> diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
> index 2dc87613bc..6047670804 100644
> --- a/hw/virtio/meson.build
> +++ b/hw/virtio/meson.build
> @@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
>   
>   virtio_ss = ss.source_set()
>   virtio_ss.add(files('virtio.c'))
> -virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
> +virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c', 'vhost-iova-tree.c'))
>   virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
>   virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
>   virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
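
For reference, the intended calling pattern pieced together from the API above
looks roughly like this (a sketch; buf and size are made-up placeholders):

    DMAMap map = {
        .translated_addr = (hwaddr)buf,         /* qemu VA of the buffer */
        .size = size - 1,                       /* DMAMap.size is inclusive */
        .perm = IOMMU_ACCESS_FLAG(true, true),  /* read/write */
    };

    /* Pick a free IOVA range; on IOVA_OK, map.iova holds the result */
    if (vhost_iova_tree_map_alloc(tree, &map) != IOVA_OK) {
        /* IOVA_ERR_INVALID, IOVA_ERR_OVERLAP or IOVA_ERR_NOMEM */
    }

    /* Reverse translation: qemu VA back to the device-visible IOVA */
    DMAMap needle = { .translated_addr = (hwaddr)buf };
    const DMAMap *found = vhost_iova_tree_find_iova(tree, &needle);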

* Re: [PATCH 23/31] vdpa: Add custom IOTLB translations to SVQ
  2022-01-21 20:27 ` [PATCH 23/31] vdpa: Add custom IOTLB translations to SVQ Eugenio Pérez
@ 2022-01-30  5:57     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-30  5:57 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> Use translations added in VhostIOVATree in SVQ.
>
> Only introduce usage here, not allocation and deallocation. As with
> previous patches, we use the dead code paths of shadow_vqs_enabled to
> avoid committing too many changes at once. These paths are impossible to
> take at the moment.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |   3 +-
>   include/hw/virtio/vhost-vdpa.h     |   3 +
>   hw/virtio/vhost-shadow-virtqueue.c | 111 ++++++++++++++++----
>   hw/virtio/vhost-vdpa.c             | 161 +++++++++++++++++++++++++----
>   4 files changed, 238 insertions(+), 40 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 19c934af49..c6f67d6f76 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -12,6 +12,7 @@
>   
>   #include "hw/virtio/vhost.h"
>   #include "qemu/event_notifier.h"
> +#include "hw/virtio/vhost-iova-tree.h"
>   
>   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>   
> @@ -37,7 +38,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
>                        VirtQueue *vq);
>   void vhost_svq_stop(VhostShadowVirtqueue *svq);
>   
> -VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize, VhostIOVATree *iova_map);
>   
>   void vhost_svq_free(VhostShadowVirtqueue *vq);
>   
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 009a9f3b6b..cd2388b3be 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -14,6 +14,7 @@
>   
>   #include <gmodule.h>
>   
> +#include "hw/virtio/vhost-iova-tree.h"
>   #include "hw/virtio/virtio.h"
>   #include "standard-headers/linux/vhost_types.h"
>   
> @@ -30,6 +31,8 @@ typedef struct vhost_vdpa {
>       MemoryListener listener;
>       struct vhost_vdpa_iova_range iova_range;
>       bool shadow_vqs_enabled;
> +    /* IOVA mapping used by Shadow Virtqueue */
> +    VhostIOVATree *iova_tree;
>       GPtrArray *shadow_vqs;
>       struct vhost_dev *dev;
>       VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index a1a404f68f..c7888eb8cf 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -11,6 +11,7 @@
>   #include "hw/virtio/vhost-shadow-virtqueue.h"
>   #include "hw/virtio/vhost.h"
>   #include "hw/virtio/virtio-access.h"
> +#include "hw/virtio/vhost-iova-tree.h"
>   #include "standard-headers/linux/vhost_types.h"
>   
>   #include "qemu/error-report.h"
> @@ -45,6 +46,9 @@ typedef struct VhostShadowVirtqueue {
>       /* Virtio device */
>       VirtIODevice *vdev;
>   
> +    /* IOVA mapping */
> +    VhostIOVATree *iova_tree;
> +
>       /* Map for returning guest's descriptors */
>       VirtQueueElement **ring_id_maps;
>   
> @@ -97,13 +101,7 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
>               continue;
>   
>           case VIRTIO_F_ACCESS_PLATFORM:
> -            /* SVQ does not know how to translate addresses */
> -            if (*dev_features & BIT_ULL(b)) {
> -                clear_bit(b, dev_features);
> -                r = false;
> -            }
> -            break;
> -
> +            /* SVQ trusts the host's IOMMU to translate addresses */
>           case VIRTIO_F_VERSION_1:
>               /* SVQ trust that guest vring is little endian */
>               if (!(*dev_features & BIT_ULL(b))) {
> @@ -205,7 +203,55 @@ static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
>       }
>   }
>   
> +/**
> + * Translate addresses between qemu's virtual address and SVQ IOVA
> + *
> + * @svq    Shadow VirtQueue
> + * @addrs  Destination array for the translated SVQ IOVA addresses
> + * @iovec  Source qemu's VA addresses
> + * @num    Length of iovec and minimum length of addrs
> + */
> +static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
> +                                     void **addrs, const struct iovec *iovec,
> +                                     size_t num)
> +{
> +    size_t i;
> +
> +    if (num == 0) {
> +        return true;
> +    }
> +
> +    for (i = 0; i < num; ++i) {
> +        DMAMap needle = {
> +            .translated_addr = (hwaddr)iovec[i].iov_base,
> +            .size = iovec[i].iov_len,
> +        };
> +        size_t off;
> +
> +        const DMAMap *map = vhost_iova_tree_find_iova(svq->iova_tree, &needle);
> +        /*
> +         * Map cannot be NULL since iova map contains all guest space and
> +         * qemu already has a physical address mapped
> +         */
> +        if (unlikely(!map)) {
> +            error_report("Invalid address 0x%"HWADDR_PRIx" given by guest",
> +                         needle.translated_addr);


This can be triggered by the guest, so we need to use a _once() variant or 
log_guest_error() etc.
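
For instance, with QEMU's rate-limited helpers (a sketch; either one would do):

    /* Report only the first occurrence */
    error_report_once("Invalid address 0x%"HWADDR_PRIx" given by guest",
                      needle.translated_addr);

    /* Or log it under -d guest_errors (needs "qemu/log.h") */
    qemu_log_mask(LOG_GUEST_ERROR,
                  "Invalid address 0x%"HWADDR_PRIx" given by guest\n",
                  needle.translated_addr);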


> +            return false;
> +        }
> +
> +        /*
> +         * Map->iova chunk size is ignored. What to do if descriptor
> +         * (addr, size) does not fit is delegated to the device.
> +         */


I think we need to at least check the size and fail if it doesn't fit 
here. Or is it possible to have a buffer that crosses two memory 
regions?
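
A minimal version of that check could look like this (a sketch, reusing
needle/map from the hunk above; DMAMap sizes are inclusive):

    /* Fail if the descriptor's tail is not covered by the found map */
    if (unlikely(needle.translated_addr + needle.size >
                 map->translated_addr + map->size)) {
        return false;
    }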


> +        off = needle.translated_addr - map->translated_addr;
> +        addrs[i] = (void *)(map->iova + off);
> +    }
> +
> +    return true;
> +}
> +
>   static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> +                                    void * const *vaddr_sg,
>                                       const struct iovec *iovec,
>                                       size_t num, bool more_descs, bool write)
>   {
> @@ -224,7 +270,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
>           } else {
>               descs[i].flags = flags;
>           }
> -        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
> +        descs[i].addr = cpu_to_le64((hwaddr)vaddr_sg[n]);
>           descs[i].len = cpu_to_le32(iovec[n].iov_len);
>   
>           last = i;
> @@ -234,42 +280,60 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
>       svq->free_head = le16_to_cpu(descs[last].next);
>   }
>   
> -static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> -                                    VirtQueueElement *elem)
> +static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
> +                                VirtQueueElement *elem,
> +                                unsigned *head)


I'd suggest making it return bool starting from the patch that 
introduces this function.


>   {
> -    int head;
>       unsigned avail_idx;
>       vring_avail_t *avail = svq->vring.avail;
> +    bool ok;
> +    g_autofree void **sgs = g_new(void *, MAX(elem->out_num, elem->in_num));
>   
> -    head = svq->free_head;
> +    *head = svq->free_head;
>   
>       /* We need some descriptors here */
>       assert(elem->out_num || elem->in_num);
>   
> -    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> +    ok = vhost_svq_translate_addr(svq, sgs, elem->out_sg, elem->out_num);
> +    if (unlikely(!ok)) {
> +        return false;
> +    }
> +    vhost_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
>                               elem->in_num > 0, false);
> -    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> +
> +
> +    ok = vhost_svq_translate_addr(svq, sgs, elem->in_sg, elem->in_num);
> +    if (unlikely(!ok)) {
> +        return false;
> +    }
> +
> +    vhost_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, true);
>   
>       /*
>        * Put entry in available array (but don't update avail->idx until they
>        * do sync).
>        */
>       avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
> -    avail->ring[avail_idx] = cpu_to_le16(head);
> +    avail->ring[avail_idx] = cpu_to_le16(*head);
>       svq->avail_idx_shadow++;
>   
>       /* Update avail index after the descriptor is wrote */
>       smp_wmb();
>       avail->idx = cpu_to_le16(svq->avail_idx_shadow);
>   
> -    return head;
> +    return true;
>   }
>   
> -static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> +static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
>   {
> -    unsigned qemu_head = vhost_svq_add_split(svq, elem);
> +    unsigned qemu_head;
> +    bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
> +    if (unlikely(!ok)) {
> +        return false;
> +    }
>   
>       svq->ring_id_maps[qemu_head] = elem;
> +    return true;
>   }
>   
>   static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> @@ -309,6 +373,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
>   
>           while (true) {
>               VirtQueueElement *elem;
> +            bool ok;
>   
>               if (svq->next_guest_avail_elem) {
>                   elem = g_steal_pointer(&svq->next_guest_avail_elem);
> @@ -337,7 +402,11 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
>                   return;
>               }
>   
> -            vhost_svq_add(svq, elem);
> +            ok = vhost_svq_add(svq, elem);
> +            if (unlikely(!ok)) {
> +                /* VQ is broken, just return and ignore any other kicks */
> +                return;
> +            }
>               vhost_svq_kick(svq);
>           }
>   
> @@ -619,12 +688,13 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>    * methods and file descriptors.
>    *
>    * @qsize Shadow VirtQueue size
> + * @iova_tree Tree to perform descriptors translations
>    *
>    * Returns the new virtqueue or NULL.
>    *
>    * In case of error, reason is reported through error_report.
>    */
> -VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize, VhostIOVATree *iova_tree)
>   {
>       size_t desc_size = sizeof(vring_desc_t) * qsize;
>       size_t device_size, driver_size;
> @@ -656,6 +726,7 @@ VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
>       memset(svq->vring.desc, 0, driver_size);
>       svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
>       memset(svq->vring.used, 0, device_size);
> +    svq->iova_tree = iova_tree;
>       svq->ring_id_maps = g_new0(VirtQueueElement *, qsize);
>       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
>       return g_steal_pointer(&svq);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 0e5c00ed7e..276a559649 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -209,6 +209,18 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
>                                            vaddr, section->readonly);
>   
>       llsize = int128_sub(llend, int128_make64(iova));
> +    if (v->shadow_vqs_enabled) {
> +        DMAMap mem_region = {
> +            .translated_addr = (hwaddr)vaddr,
> +            .size = int128_get64(llsize) - 1,
> +            .perm = IOMMU_ACCESS_FLAG(true, section->readonly),
> +        };
> +
> +        int r = vhost_iova_tree_map_alloc(v->iova_tree, &mem_region);
> +        assert(r == IOVA_OK);


It's better to fail or warn here.
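
For example (a sketch; the listener callback returns void, so the error can
only be reported, not propagated):

    int r = vhost_iova_tree_map_alloc(v->iova_tree, &mem_region);
    if (unlikely(r != IOVA_OK)) {
        error_report("Can't allocate a mapping (%d)", r);
        return;
    }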


> +
> +        iova = mem_region.iova;
> +    }
>   
>       vhost_vdpa_iotlb_batch_begin_once(v);
>       ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
> @@ -261,6 +273,20 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
>   
>       llsize = int128_sub(llend, int128_make64(iova));
>   
> +    if (v->shadow_vqs_enabled) {
> +        const DMAMap *result;
> +        const void *vaddr = memory_region_get_ram_ptr(section->mr) +
> +            section->offset_within_region +
> +            (iova - section->offset_within_address_space);
> +        DMAMap mem_region = {
> +            .translated_addr = (hwaddr)vaddr,
> +            .size = int128_get64(llsize) - 1,
> +        };
> +
> +        result = vhost_iova_tree_find_iova(v->iova_tree, &mem_region);
> +        iova = result->iova;
> +        vhost_iova_tree_remove(v->iova_tree, &mem_region);
> +    }
>       vhost_vdpa_iotlb_batch_begin_once(v);
>       ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
>       if (ret) {
> @@ -783,33 +809,70 @@ static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
>   /**
>    * Unmap SVQ area in the device
>    */
> -static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
> -                                      hwaddr size)
> +static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v,
> +                                      const DMAMap *needle)
>   {
> +    const DMAMap *result = vhost_iova_tree_find_iova(v->iova_tree, needle);
> +    hwaddr size;
>       int r;
>   
> -    size = ROUND_UP(size, qemu_real_host_page_size);
> -    r = vhost_vdpa_dma_unmap(v, iova, size);
> +    if (unlikely(!result)) {
> +        error_report("Unable to find SVQ address to unmap");
> +        return false;
> +    }
> +
> +    size = ROUND_UP(result->size, qemu_real_host_page_size);
> +    r = vhost_vdpa_dma_unmap(v, result->iova, size);
>       return r == 0;
>   }
>   
>   static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
>                                          const VhostShadowVirtqueue *svq)
>   {
> +    DMAMap needle;
>       struct vhost_vdpa *v = dev->opaque;
>       struct vhost_vring_addr svq_addr;
> -    size_t device_size = vhost_svq_device_area_size(svq);
> -    size_t driver_size = vhost_svq_driver_area_size(svq);
>       bool ok;
>   
>       vhost_svq_get_vring_addr(svq, &svq_addr);
>   
> -    ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
> +    needle = (DMAMap) {
> +        .translated_addr = svq_addr.desc_user_addr,
> +    };
> +    ok = vhost_vdpa_svq_unmap_ring(v, &needle);
>       if (unlikely(!ok)) {
>           return false;
>       }
>   
> -    return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
> +    needle = (DMAMap) {
> +        .translated_addr = svq_addr.used_user_addr,
> +    };
> +    return vhost_vdpa_svq_unmap_ring(v, &needle);
> +}
> +
> +/**
> + * Map SVQ area in the device
> + *
> + * @v          Vhost-vdpa device
> + * @needle     The area to search iova
> + * @readonly   Permissions of the area
> + */
> +static bool vhost_vdpa_svq_map_ring(struct vhost_vdpa *v, const DMAMap *needle,
> +                                    bool readonly)
> +{
> +    hwaddr off;
> +    const DMAMap *result = vhost_iova_tree_find_iova(v->iova_tree, needle);
> +    int r;
> +
> +    if (unlikely(!result)) {
> +        error_report("Can't locate SVQ ring");
> +        return false;
> +    }
> +
> +    off = needle->translated_addr - result->translated_addr;
> +    r = vhost_vdpa_dma_map(v, result->iova + off, needle->size,
> +                           (void *)needle->translated_addr, readonly);
> +    return r == 0;
>   }
>   
>   /**
> @@ -821,23 +884,29 @@ static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
>   static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
>                                        const VhostShadowVirtqueue *svq)
>   {
> +    DMAMap needle;
>       struct vhost_vdpa *v = dev->opaque;
>       struct vhost_vring_addr svq_addr;
>       size_t device_size = vhost_svq_device_area_size(svq);
>       size_t driver_size = vhost_svq_driver_area_size(svq);
> -    int r;
> +    bool ok;
>   
>       vhost_svq_get_vring_addr(svq, &svq_addr);
>   
> -    r = vhost_vdpa_dma_map(v, svq_addr.desc_user_addr, driver_size,
> -                           (void *)svq_addr.desc_user_addr, true);
> -    if (unlikely(r != 0)) {
> +    needle = (DMAMap) {
> +        .translated_addr = svq_addr.desc_user_addr,
> +        .size = driver_size,
> +    };
> +    ok = vhost_vdpa_svq_map_ring(v, &needle, true);
> +    if (unlikely(!ok)) {
>           return false;
>       }
>   
> -    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
> -                           (void *)svq_addr.used_user_addr, false);
> -    return r == 0;
> +    needle = (DMAMap) {
> +        .translated_addr = svq_addr.used_user_addr,
> +        .size = device_size,
> +    };
> +    return vhost_vdpa_svq_map_ring(v, &needle, false);
>   }
>   
>   static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> @@ -1006,6 +1075,23 @@ static int vhost_vdpa_set_owner(struct vhost_dev *dev)
>       return vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
>   }
>   
> +static bool vhost_vdpa_svq_get_vq_region(struct vhost_vdpa *v,
> +                                         unsigned long long addr,
> +                                         uint64_t *iova_addr)
> +{
> +    const DMAMap needle = {
> +        .translated_addr = addr,
> +    };
> +    const DMAMap *translation = vhost_iova_tree_find_iova(v->iova_tree,
> +                                                          &needle);
> +    if (!translation) {
> +        return false;
> +    }
> +
> +    *iova_addr = translation->iova + (addr - translation->translated_addr);
> +    return true;
> +}
> +
>   static void vhost_vdpa_vq_get_guest_addr(struct vhost_vring_addr *addr,
>                                            struct vhost_virtqueue *vq)
>   {
> @@ -1023,10 +1109,23 @@ static int vhost_vdpa_vq_get_addr(struct vhost_dev *dev,
>       assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
>   
>       if (v->shadow_vqs_enabled) {
> +        struct vhost_vring_addr svq_addr;
>           int idx = vhost_vdpa_get_vq_index(dev, addr->index);
>           VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, idx);
>   
> -        vhost_svq_get_vring_addr(svq, addr);
> +        vhost_svq_get_vring_addr(svq, &svq_addr);
> +        if (!vhost_vdpa_svq_get_vq_region(v, svq_addr.desc_user_addr,
> +                                          &addr->desc_user_addr)) {
> +            return -1;
> +        }
> +        if (!vhost_vdpa_svq_get_vq_region(v, svq_addr.avail_user_addr,
> +                                          &addr->avail_user_addr)) {
> +            return -1;
> +        }
> +        if (!vhost_vdpa_svq_get_vq_region(v, svq_addr.used_user_addr,
> +                                          &addr->used_user_addr)) {
> +            return -1;
> +        }
>       } else {
>           vhost_vdpa_vq_get_guest_addr(addr, vq);
>       }
> @@ -1095,13 +1194,37 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>   
>       shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_psvq_free);
>       for (unsigned n = 0; n < hdev->nvqs; ++n) {
> -        VhostShadowVirtqueue *svq = vhost_svq_new(qsize);
> -
> +        DMAMap device_region, driver_region;
> +        struct vhost_vring_addr addr;
> +        VhostShadowVirtqueue *svq = vhost_svq_new(qsize, v->iova_tree);
>           if (unlikely(!svq)) {
>               error_setg(errp, "Cannot create svq %u", n);
>               return -1;
>           }
> -        g_ptr_array_add(v->shadow_vqs, svq);
> +
> +        vhost_svq_get_vring_addr(svq, &addr);
> +        driver_region = (DMAMap) {
> +            .translated_addr = (hwaddr)addr.desc_user_addr,
> +
> +            /*
> +             * DMAMap.size includes the last byte of the range, while
> +             * sizeof marks one past it. Subtract one byte to make them match.
> +             */
> +            .size = vhost_svq_driver_area_size(svq) - 1,
> +            .perm = VHOST_ACCESS_RO,
> +        };
> +        device_region = (DMAMap) {
> +            .translated_addr = (hwaddr)addr.used_user_addr,
> +            .size = vhost_svq_device_area_size(svq) - 1,
> +            .perm = VHOST_ACCESS_RW,
> +        };
> +
> +        r = vhost_iova_tree_map_alloc(v->iova_tree, &driver_region);
> +        assert(r == IOVA_OK);


Let's fail instead of asserting here.

Thanks
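
Something along these lines instead of the asserts, presumably (a sketch;
cleanup of the previously created svqs is elided):

    r = vhost_iova_tree_map_alloc(v->iova_tree, &driver_region);
    if (unlikely(r != IOVA_OK)) {
        error_setg(errp, "Cannot allocate iova for svq %u driver area", n);
        vhost_svq_free(svq);
        return -1;
    }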


> +        r = vhost_iova_tree_map_alloc(v->iova_tree, &device_region);
> +        assert(r == IOVA_OK);
> +
> +        g_ptr_array_add(shadow_vqs, svq);
>       }
>   
>   out:

* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
  2022-01-21 20:27 ` [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding Eugenio Pérez
@ 2022-01-30  6:46     ` Jason Wang
  2022-01-30  6:46     ` Jason Wang
  1 sibling, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-30  6:46 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>   void vhost_svq_stop(VhostShadowVirtqueue *svq)
>   {
>       event_notifier_set_handler(&svq->svq_kick, NULL);
> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> +
> +    if (!svq->vq) {
> +        return;
> +    }
> +
> +    /* Send all pending used descriptors to guest */
> +    vhost_svq_flush(svq, false);


Do we need to wait for all the pending descriptors to be completed here?

Thanks
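
i.e. something like this before detaching the elements (a rough sketch;
svq_pending() is a hypothetical helper counting descriptors the device still
owns, and a real version would need a timeout or fatal-error path for a stuck
device):

    while (svq_pending(svq) > 0) {
        /* Keep relaying used buffers until the device returns them all */
        vhost_svq_flush(svq, false);
    }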


> +
> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> +        g_autofree VirtQueueElement *elem = NULL;
> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> +        if (elem) {
> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> +        }
> +    }
> +
> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> +    if (next_avail_elem) {
> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> +                                 next_avail_elem->len);
> +    }
>   }

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
@ 2022-01-30  6:46     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-30  6:46 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, Markus Armbruster,
	Gautam Dawar, virtualization, Eduardo Habkost,
	Harpreet Singh Anand, Xiao W Wang, Peter Xu, Stefan Hajnoczi,
	Eli Cohen, Paolo Bonzini, Zhu Lingshan, Eric Blake,
	Stefano Garzarella


在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>   void vhost_svq_stop(VhostShadowVirtqueue *svq)
>   {
>       event_notifier_set_handler(&svq->svq_kick, NULL);
> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> +
> +    if (!svq->vq) {
> +        return;
> +    }
> +
> +    /* Send all pending used descriptors to guest */
> +    vhost_svq_flush(svq, false);


Do we need to wait for all the pending descriptors to be completed here?

Thanks


> +
> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> +        g_autofree VirtQueueElement *elem = NULL;
> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> +        if (elem) {
> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> +        }
> +    }
> +
> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> +    if (next_avail_elem) {
> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> +                                 next_avail_elem->len);
> +    }
>   }



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
  2022-01-21 20:27 ` [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ Eugenio Pérez
@ 2022-01-30  6:50     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-30  6:50 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> SVQ is able to log the dirty bits by itself, so let's use it to not
> block migration.
>
> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
> enabled. Even if the device supports it, the reports would be nonsense
> because SVQ memory is in the qemu region.
>
> The log region is still allocated. Future changes might skip that, but
> this series is already long enough.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index fb0a338baa..75090d65e8 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
>       if (ret == 0 && v->shadow_vqs_enabled) {
>           /* Filter only features that SVQ can offer to guest */
>           vhost_svq_valid_guest_features(features);
> +
> +        /* Add SVQ logging capabilities */
> +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
>       }
>   
>       return ret;
> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
>   
>       if (v->shadow_vqs_enabled) {
>           uint64_t dev_features, svq_features, acked_features;
> +        uint8_t status = 0;
>           bool ok;
>   
> +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> +        if (unlikely(ret)) {
> +            return ret;
> +        }
> +
> +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +            /*
> +             * vhost is trying to enable or disable _F_LOG, and the device
> +             * would report wrong dirty pages. SVQ handles it.
> +             */


I fail to understand this comment; I'd think there's no way to disable 
dirty page tracking for SVQ.

Thanks


> +            return 0;
> +        }
> +
> +        /* We must not ack _F_LOG if SVQ is enabled */
> +        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
> +
>           ret = vhost_vdpa_get_dev_features(dev, &dev_features);
>           if (ret != 0) {
>               error_report("Can't get vdpa device features, got (%d)", ret);
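
It may help to spell out why SVQ can claim VHOST_F_LOG_ALL at all: used
buffers are relayed to the guest through qemu's virtqueue API, whose
writes to guest memory already go through the migration dirty bitmap. A
rough sketch of that relay step, not the series' actual code; the
helper name and the guest_call notifier are illustrative:

    /*
     * Relay used descriptors back to the guest. virtqueue_fill() and
     * virtqueue_flush() write the guest's used ring through qemu's
     * memory API, so the pages are logged in the migration dirty
     * bitmap as a side effect, and the device's own _F_LOG support is
     * never needed.
     */
    static void svq_relay_used(VirtQueue *vq, VirtQueueElement **elems,
                               unsigned count, EventNotifier *guest_call)
    {
        for (unsigned i = 0; i < count; i++) {
            virtqueue_fill(vq, elems[i], elems[i]->len, i);
        }
        virtqueue_flush(vq, count);       /* updates used_idx, logs dirty */
        event_notifier_set(guest_call);   /* notify the guest */
    }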


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 29/31] vdpa: Make ncs autofree
  2022-01-21 20:27 ` [PATCH 29/31] vdpa: Make ncs autofree Eugenio Pérez
@ 2022-01-30  6:51     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-30  6:51 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> Simplifying memory management.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>


To reduce the size of this series, this can be sent as a separate 
patch, if I'm not wrong.

Thanks


> ---
>   net/vhost-vdpa.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 4125d13118..4befba5cc7 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -264,7 +264,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>   {
>       const NetdevVhostVDPAOptions *opts;
>       int vdpa_device_fd;
> -    NetClientState **ncs, *nc;
> +    g_autofree NetClientState **ncs = NULL;
> +    NetClientState *nc;
>       int queue_pairs, i, has_cvq = 0;
>   
>       assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> @@ -302,7 +303,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>               goto err;
>       }
>   
> -    g_free(ncs);
>       return 0;
>   
>   err:
> @@ -310,7 +310,6 @@ err:
>           qemu_del_net_client(ncs[0]);
>       }
>       qemu_close(vdpa_device_fd);
> -    g_free(ncs);
>   
>       return -1;
>   }
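
A side note on the idiom: g_autofree attaches a cleanup attribute so
that g_free() runs on the annotated pointer when it goes out of scope,
on every exit path. It frees only the array itself, never what the
elements point to, which is what this hunk relies on: the
NetClientState objects stay owned by the net layer. A self-contained
illustration:

    #include <glib.h>

    static void example(void)
    {
        g_autofree int **arr = g_new0(int *, 8);
        /* use arr; the arr[i] targets keep their own ownership */
    }   /* g_free(arr) runs automatically here, even on early returns */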


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 30/31] vdpa: Move vhost_vdpa_get_iova_range to net/vhost-vdpa.c
  2022-01-21 20:27 ` [PATCH 30/31] vdpa: Move vhost_vdpa_get_iova_range to net/vhost-vdpa.c Eugenio Pérez
@ 2022-01-30  6:53     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-01-30  6:53 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> Since it's a device property, it can be done in net/. This helps SVQ to
> allocate the rings at vdpa device initialization, rather than delaying
> that.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-vdpa.c | 15 ---------------
>   net/vhost-vdpa.c       | 32 ++++++++++++++++++++++++--------


I don't understand this, since we will support devices other than net?


>   2 files changed, 24 insertions(+), 23 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 75090d65e8..2491c05d29 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -350,19 +350,6 @@ static int vhost_vdpa_add_status(struct vhost_dev *dev, uint8_t status)
>       return 0;
>   }
>   
> -static void vhost_vdpa_get_iova_range(struct vhost_vdpa *v)
> -{
> -    int ret = vhost_vdpa_call(v->dev, VHOST_VDPA_GET_IOVA_RANGE,
> -                              &v->iova_range);
> -    if (ret != 0) {
> -        v->iova_range.first = 0;
> -        v->iova_range.last = UINT64_MAX;
> -    }
> -
> -    trace_vhost_vdpa_get_iova_range(v->dev, v->iova_range.first,
> -                                    v->iova_range.last);
> -}


Let's just export this instead?

Thanks


> -
>   static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
>   {
>       struct vhost_vdpa *v = dev->opaque;
> @@ -1295,8 +1282,6 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>           goto err;
>       }
>   
> -    vhost_vdpa_get_iova_range(v);
> -
>       if (vhost_vdpa_one_time_request(dev)) {
>           return 0;
>       }
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 4befba5cc7..cc9cecf8d1 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -22,6 +22,7 @@
>   #include <sys/ioctl.h>
>   #include <err.h>
>   #include "standard-headers/linux/virtio_net.h"
> +#include "standard-headers/linux/vhost_types.h"
>   #include "monitor/monitor.h"
>   #include "hw/virtio/vhost.h"
>   
> @@ -187,13 +188,25 @@ static NetClientInfo net_vhost_vdpa_info = {
>           .check_peer_type = vhost_vdpa_check_peer_type,
>   };
>   
> +static void vhost_vdpa_get_iova_range(int fd,
> +                                      struct vhost_vdpa_iova_range *iova_range)
> +{
> +    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
> +
> +    if (ret < 0) {
> +        iova_range->first = 0;
> +        iova_range->last = UINT64_MAX;
> +    }
> +}
> +
>   static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> -                                           const char *device,
> -                                           const char *name,
> -                                           int vdpa_device_fd,
> -                                           int queue_pair_index,
> -                                           int nvqs,
> -                                           bool is_datapath)
> +                                       const char *device,
> +                                       const char *name,
> +                                       int vdpa_device_fd,
> +                                       int queue_pair_index,
> +                                       int nvqs,
> +                                       bool is_datapath,
> +                                       struct vhost_vdpa_iova_range iova_range)
>   {
>       NetClientState *nc = NULL;
>       VhostVDPAState *s;
> @@ -211,6 +224,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>   
>       s->vhost_vdpa.device_fd = vdpa_device_fd;
>       s->vhost_vdpa.index = queue_pair_index;
> +    s->vhost_vdpa.iova_range = iova_range;
>       ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
>       if (ret) {
>           qemu_del_net_client(nc);
> @@ -267,6 +281,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       g_autofree NetClientState **ncs = NULL;
>       NetClientState *nc;
>       int queue_pairs, i, has_cvq = 0;
> +    struct vhost_vdpa_iova_range iova_range;
>   
>       assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>       opts = &netdev->u.vhost_vdpa;
> @@ -286,19 +301,20 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>           qemu_close(vdpa_device_fd);
>           return queue_pairs;
>       }
> +    vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
>   
>       ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
>   
>       for (i = 0; i < queue_pairs; i++) {
>           ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> -                                     vdpa_device_fd, i, 2, true);
> +                                     vdpa_device_fd, i, 2, true, iova_range);
>           if (!ncs[i])
>               goto err;
>       }
>   
>       if (has_cvq) {
>           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> -                                 vdpa_device_fd, i, 1, false);
> +                                 vdpa_device_fd, i, 1, false, iova_range);
>           if (!nc)
>               goto err;
>       }
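
For completeness, the helper's contract as written: it cannot fail.
Either the device's advertised range comes back from the ioctl, or the
fallback covers the whole 64-bit space. A caller sketch:

    static void example(int vdpa_device_fd)
    {
        struct vhost_vdpa_iova_range r;

        vhost_vdpa_get_iova_range(vdpa_device_fd, &r);
        /* r is always usable here: the device's advertised range on
         * success, or [0, UINT64_MAX] when the ioctl is unsupported */
    }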


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 00/31] vDPA shadow virtqueue
  2022-01-28  6:02   ` Jason Wang
  (?)
@ 2022-01-31  9:15   ` Eugenio Perez Martin
  2022-02-08  8:27       ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-31  9:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Fri, Jan 28, 2022 at 7:02 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > This series enables shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > is intended as a new method of tracking the memory the devices touch
> > during a migration process: Instead of relay on vhost device's dirty
> > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > descriptors between VM and device. This way qemu is the effective
> > writer of guests memory, like in qemu's emulated virtio device
> > operation.
> >
> > When SVQ is enabled qemu offers a new virtual address space to the
> > device to read and write into, and it maps new vrings and the guest
> > memory in it. SVQ also intercepts kicks and calls between the device
> > and the guest. Used buffers relay would cause dirty memory being
> > tracked, but at this RFC SVQ is not enabled on migration automatically.
> >
> > Thanks of being a buffers relay system, SVQ can be used also to
> > communicate devices and drivers with different capabilities, like
> > devices that only support packed vring and not split and old guests with
> > no driver packed support.
> >
> > It is based on the ideas of DPDK SW assisted LM, in the series of
> > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > not map the shadow vq in guest's VA, but in qemu's.
> >
> > This version of SVQ is limited in the amount of features it can use with
> > guest and device, because this series is already very big otherwise.
> > Features like indirect or event_idx will be addressed in future series.
> >
> > SVQ needs to be enabled with cmdline parameter x-svq, like:
> >
> > -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-svq=true
> >
> > In this version it cannot be enabled or disabled in runtime. Further
> > series will remove this limitation and will enable it only for migration
> > time.
> >
> > Some patches are intentionally very small to ease review, but they can
> > be squashed if preferred.
> >
> > Patches 1-10 prepares the SVQ and QEMU to support both guest to device
> > and device to guest notifications forwarding, with the extra qemu hop.
> > That part can be tested in isolation if cmdline change is reproduced.
> >
> > Patches from 11 to 18 implement the actual buffer forwarding, but with
> > no IOMMU support. It requires a vdpa device capable of addressing all
> > qemu vaddr.
> >
> > Patches 19 to 23 adds the iommu support, so the device with address
> > range limitations can access SVQ through this new virtual address space
> > created.
> >
> > The rest of the series add the last pieces needed for migration.
> >
> > Comments are welcome.
>
>
> I wonder about the performance impact, so performance numbers are more
> than welcome.
>

Sure, I'll do it for the next revision. Since this one brings a decent
amount of changes, I chose to collect the feedback first.

Thanks!

> Thanks
>
>
> >
> > TODO:
> > * Event, indirect, packed, and other features of virtio.
> > * To separate buffers forwarding in its own AIO context, so we can
> >    throw more threads to that task and we don't need to stop the main
> >    event loop.
> > * Support virtio-net control vq.
> > * Proper documentation.
> >
> > Changes from v5 RFC:
> > * Remove dynamic enablement of SVQ, making less dependent of the device.
> > * Enable live migration if SVQ is enabled.
> > * Fix SVQ when driver reset.
> > * Comments addressed, specially in the iova area.
> > * Rebase on latest master, adding multiqueue support (but no networking
> >    control vq processing).
> > v5 link:
> > https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg07250.html
> >
> > Changes from v4 RFC:
> > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> >    already present iova-tree for that.
> > * Proper validation of guest features. Now SVQ can negotiate a
> >    different set of features with the device when enabled.
> > * Support of host notifiers memory regions
> > * Handling of SVQ full queue in case guest's descriptors span to
> >    different memory regions (qemu's VA chunks).
> > * Flush pending used buffers at end of SVQ operation.
> > * QMP command now looks by NetClientState name. Other devices will need
> >    to implement it's way to enable vdpa.
> > * Rename QMP command to set, so it looks more like a way of working
> > * Better use of qemu error system
> > * Make a few assertions proper error-handling paths.
> > * Add more documentation
> > * Less coupling of virtio / vhost, that could cause friction on changes
> > * Addressed many other small comments and small fixes.
> >
> > Changes from v3 RFC:
> >    * Move everything to vhost-vdpa backend. A big change, this allowed
> >      some cleanup but more code has been added in other places.
> >    * More use of glib utilities, especially to manage memory.
> > v3 link:
> > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> >
> > Changes from v2 RFC:
> >    * Adding vhost-vdpa devices support
> >    * Fixed some memory leaks pointed by different comments
> > v2 link:
> > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> >
> > Changes from v1 RFC:
> >    * Use QMP instead of migration to start SVQ mode.
> >    * Only accepting IOMMU devices, closer behavior with target devices
> >      (vDPA)
> >    * Fix invalid masking/unmasking of vhost call fd.
> >    * Use of proper methods for synchronization.
> >    * No need to modify VirtIO device code, all of the changes are
> >      contained in vhost code.
> >    * Delete superfluous code.
> >    * An intermediate RFC was sent with only the notifications forwarding
> >      changes. It can be seen in
> >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > v1 link:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> >
> > Eugenio Pérez (20):
> >        virtio: Add VIRTIO_F_QUEUE_STATE
> >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> >        virtio: Add virtio_queue_is_host_notifier_enabled
> >        vhost: Make vhost_virtqueue_{start,stop} public
> >        vhost: Add x-vhost-enable-shadow-vq qmp
> >        vhost: Add VhostShadowVirtqueue
> >        vdpa: Register vdpa devices in a list
> >        vhost: Route guest->host notification through shadow virtqueue
> >        Add vhost_svq_get_svq_call_notifier
> >        Add vhost_svq_set_guest_call_notifier
> >        vdpa: Save call_fd in vhost-vdpa
> >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> >        vhost: Route host->guest notification through shadow virtqueue
> >        virtio: Add vhost_shadow_vq_get_vring_addr
> >        vdpa: Save host and guest features
> >        vhost: Add vhost_svq_valid_device_features to shadow vq
> >        vhost: Shadow virtqueue buffers forwarding
> >        vhost: Add VhostIOVATree
> >        vhost: Use a tree to store memory mappings
> >        vdpa: Add custom IOTLB translations to SVQ
> >
> > Eugenio Pérez (31):
> >    vdpa: Reorder virtio/vhost-vdpa.c functions
> >    vhost: Add VhostShadowVirtqueue
> >    vdpa: Add vhost_svq_get_dev_kick_notifier
> >    vdpa: Add vhost_svq_set_svq_kick_fd
> >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> >    vhost: Route guest->host notification through shadow virtqueue
> >    vhost: Add vhost_svq_get_svq_call_notifier
> >    vhost: Add vhost_svq_set_guest_call_notifier
> >    vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> >    vhost: Route host->guest notification through shadow virtqueue
> >    vhost: Add vhost_svq_valid_device_features to shadow vq
> >    vhost: Add vhost_svq_valid_guest_features to shadow vq
> >    vhost: Add vhost_svq_ack_guest_features to shadow vq
> >    virtio: Add vhost_shadow_vq_get_vring_addr
> >    vdpa: Add vhost_svq_get_num
> >    vhost: pass queue index to vhost_vq_get_addr
> >    vdpa: adapt vhost_ops callbacks to svq
> >    vhost: Shadow virtqueue buffers forwarding
> >    utils: Add internal DMAMap to iova-tree
> >    util: Store DMA entries in a list
> >    util: Add iova_tree_alloc
> >    vhost: Add VhostIOVATree
> >    vdpa: Add custom IOTLB translations to SVQ
> >    vhost: Add vhost_svq_get_last_used_idx
> >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> >    vdpa: Clear VHOST_VRING_F_LOG at vhost_vdpa_set_vring_addr in SVQ
> >    vdpa: Never set log_base addr if SVQ is enabled
> >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> >    vdpa: Make ncs autofree
> >    vdpa: Move vhost_vdpa_get_iova_range to net/vhost-vdpa.c
> >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> >
> >   qapi/net.json                      |   5 +-
> >   hw/virtio/vhost-iova-tree.h        |  27 +
> >   hw/virtio/vhost-shadow-virtqueue.h |  46 ++
> >   include/hw/virtio/vhost-vdpa.h     |   7 +
> >   include/qemu/iova-tree.h           |  17 +
> >   hw/virtio/vhost-iova-tree.c        | 157 ++++++
> >   hw/virtio/vhost-shadow-virtqueue.c | 761 +++++++++++++++++++++++++++++
> >   hw/virtio/vhost-vdpa.c             | 740 ++++++++++++++++++++++++----
> >   hw/virtio/vhost.c                  |   6 +-
> >   net/vhost-vdpa.c                   |  58 ++-
> >   util/iova-tree.c                   | 161 +++++-
> >   hw/virtio/meson.build              |   2 +-
> >   12 files changed, 1852 insertions(+), 135 deletions(-)
> >   create mode 100644 hw/virtio/vhost-iova-tree.h
> >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> >   create mode 100644 hw/virtio/vhost-iova-tree.c
> >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> >
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 03/31] vdpa: Add vhost_svq_get_dev_kick_notifier
  2022-01-28  6:03     ` Jason Wang
  (?)
@ 2022-01-31  9:33     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-31  9:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Fri, Jan 28, 2022 at 7:03 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > It is needed so vhost-vdpa knows the device's kick event fd.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |  4 ++++
> >   hw/virtio/vhost-shadow-virtqueue.c | 10 +++++++++-
> >   2 files changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index 61ea112002..400effd9f2 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -11,9 +11,13 @@
> >   #define VHOST_SHADOW_VIRTQUEUE_H
> >
> >   #include "hw/virtio/vhost.h"
> > +#include "qemu/event_notifier.h"
>
>
> Let's move this part to patch 2.
>

Sure, I'll change for the next revision.

> Thanks
>
>
> >
> >   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> >
> > +const EventNotifier *vhost_svq_get_dev_kick_notifier(
> > +                                              const VhostShadowVirtqueue *svq);
> > +
> >   VhostShadowVirtqueue *vhost_svq_new(void);
> >
> >   void vhost_svq_free(VhostShadowVirtqueue *vq);
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index 5ee7b401cb..bd87110073 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -11,7 +11,6 @@
> >   #include "hw/virtio/vhost-shadow-virtqueue.h"
> >
> >   #include "qemu/error-report.h"
> > -#include "qemu/event_notifier.h"
> >
> >   /* Shadow virtqueue to relay notifications */
> >   typedef struct VhostShadowVirtqueue {
> > @@ -21,6 +20,15 @@ typedef struct VhostShadowVirtqueue {
> >       EventNotifier hdev_call;
> >   } VhostShadowVirtqueue;
> >
> > +/**
> > + * The notifier that SVQ will use to notify the device.
> > + */
> > +const EventNotifier *vhost_svq_get_dev_kick_notifier(
> > +                                               const VhostShadowVirtqueue *svq)
> > +{
> > +    return &svq->hdev_kick;
> > +}
> > +
> >   /**
> >    * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> >    * methods and file descriptors.
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 04/31] vdpa: Add vhost_svq_set_svq_kick_fd
  2022-01-28  6:29     ` Jason Wang
  (?)
@ 2022-01-31 10:18     ` Eugenio Perez Martin
  2022-02-08  8:47         ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-31 10:18 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Fri, Jan 28, 2022 at 7:29 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > This function allows the vhost-vdpa backend to override kick_fd.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |  1 +
> >   hw/virtio/vhost-shadow-virtqueue.c | 45 ++++++++++++++++++++++++++++++
> >   2 files changed, 46 insertions(+)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index 400effd9f2..a56ecfc09d 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -15,6 +15,7 @@
> >
> >   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> >
> > +void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> >   const EventNotifier *vhost_svq_get_dev_kick_notifier(
> >                                                 const VhostShadowVirtqueue *svq);
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index bd87110073..21534bc94d 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -11,6 +11,7 @@
> >   #include "hw/virtio/vhost-shadow-virtqueue.h"
> >
> >   #include "qemu/error-report.h"
> > +#include "qemu/main-loop.h"
> >
> >   /* Shadow virtqueue to relay notifications */
> >   typedef struct VhostShadowVirtqueue {
> > @@ -18,8 +19,20 @@ typedef struct VhostShadowVirtqueue {
> >       EventNotifier hdev_kick;
> >       /* Shadow call notifier, sent to vhost */
> >       EventNotifier hdev_call;
> > +
> > +    /*
> > +     * Borrowed virtqueue's guest to host notifier.
> > +     * To borrow it in this event notifier allows to register on the event
> > +     * loop and access the associated shadow virtqueue easily. If we use the
> > +     * VirtQueue, we don't have an easy way to retrieve it.
> > +     *
> > +     * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
> > +     */
> > +    EventNotifier svq_kick;
> >   } VhostShadowVirtqueue;
> >
> > +#define INVALID_SVQ_KICK_FD -1
> > +
> >   /**
> >    * The notifier that SVQ will use to notify the device.
> >    */
> > @@ -29,6 +42,35 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
> >       return &svq->hdev_kick;
> >   }
> >
> > +/**
> > + * Set a new file descriptor for the guest to kick SVQ and notify for avail
> > + *
> > + * @svq          The svq
> > + * @svq_kick_fd  The new svq kick fd
> > + */
> > +void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> > +{
> > +    EventNotifier tmp;
> > +    bool check_old = INVALID_SVQ_KICK_FD !=
> > +                     event_notifier_get_fd(&svq->svq_kick);
> > +
> > +    if (check_old) {
> > +        event_notifier_set_handler(&svq->svq_kick, NULL);
> > +        event_notifier_init_fd(&tmp, event_notifier_get_fd(&svq->svq_kick));
> > +    }
>
>
> It looks to me we don't do similar things in vhost-net. Any reason for
> caring about the old svq_kick?
>

Do you mean to check for old kick_fd in case we miss notifications,
and explicitly omit the INVALID_SVQ_KICK_FD?

If you mean qemu's vhost-net, I guess it's because the device's kick
fd is never changed during the whole vhost device lifecycle; it's only
set at the beginning. The previous RFC also depended on that, but in
the v4 feedback you suggested not to depend on it, if I understood
correctly [1]. Or am I missing something?

Qemu's vhost-net does not need to use this because it is not polling
it. For kernel's vhost, I guess the closest thing is the use of
pollstop and pollstart in vhost_vring_ioctl.

In my opinion, SVQ code size could benefit from only allowing kick_fd
to be set at the start of the operation (not at initialization, but at
start). But I can see the benefit of taking the change into account
from this moment, so it's more resilient to future changes.

>
> > +
> > +    /*
> > +     * event_notifier_set_handler already checks for guest's notifications if
> > +     * they arrive to the new file descriptor in the switch, so there is no
> > +     * need to explicitly check for them.
> > +     */
> > +    event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> > +
> > +    if (!check_old || event_notifier_test_and_clear(&tmp)) {
> > +        event_notifier_set(&svq->hdev_kick);
>
>
> Any reason we need to kick the device directly here?
>

At this point of the series only notifications are forwarded, not
buffers. If a kick_fd was already set, we need to check the old one the
same way vhost checks the masked notifier in case of a change.
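
The race that the old-fd check guards against can be drawn as a
timeline. This is only an illustration of the quoted code above, no new
behavior is implied:

    /*
     *   guest                      qemu (SVQ)
     *   -----                      ----------
     *   write(old_kick_fd)
     *                              event_notifier_set_handler(old, NULL);
     *                              ... handler gone: the kick above is
     *                                  now pending on the old fd, unpolled
     *                              event_notifier_init_fd(&svq->svq_kick,
     *                                                     new_fd);
     *                              if (event_notifier_test_and_clear(&tmp)) {
     *                                  event_notifier_set(&svq->hdev_kick);
     *                                  ... the pending kick is recovered
     *                              }
     */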

Thanks!

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg03152.html
, from "I'd suggest to not depend on this since it:"


> Thanks
>
>
> > +    }
> > +}
> > +
> >   /**
> >    * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> >    * methods and file descriptors.
> > @@ -52,6 +94,9 @@ VhostShadowVirtqueue *vhost_svq_new(void)
> >           goto err_init_hdev_call;
> >       }
> >
> > +    /* Placeholder descriptor, it should be deleted at set_kick_fd */
> > +    event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
> > +
> >       return g_steal_pointer(&svq);
> >
> >   err_init_hdev_call:
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 05/31] vhost: Add Shadow VirtQueue kick forwarding capabilities
  2022-01-28  6:32     ` Jason Wang
  (?)
@ 2022-01-31 10:48     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-31 10:48 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Fri, Jan 28, 2022 at 7:33 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > In this mode no buffer forwarding will be performed in SVQ mode: Qemu
> > will just forward the guest's kicks to the device.
> >
> > Also, host notifiers must be disabled at SVQ start, and they will not
> > start if SVQ has been enabled when the device is stopped. This will be
> > addressed in next patches.
>
>
> We need to disable host_notifier_mr as well, otherwise the guest may
> touch the hardware doorbell directly without going through the eventfd.
>

Yes. SVQ cannot be enabled at this point anyway, but I think it's a
good idea to reorder so we disable hn_mr first.

>
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
> >   hw/virtio/vhost-shadow-virtqueue.c | 27 ++++++++++++++++++++++++++-
> >   2 files changed, 28 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index a56ecfc09d..4c583a9171 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -19,6 +19,8 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> >   const EventNotifier *vhost_svq_get_dev_kick_notifier(
> >                                                 const VhostShadowVirtqueue *svq);
> >
> > +void vhost_svq_stop(VhostShadowVirtqueue *svq);
> > +
> >   VhostShadowVirtqueue *vhost_svq_new(void);
> >
> >   void vhost_svq_free(VhostShadowVirtqueue *vq);
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index 21534bc94d..8991f0b3c3 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -42,11 +42,26 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
> >       return &svq->hdev_kick;
> >   }
> >
> > +/* Forward guest notifications */
> > +static void vhost_handle_guest_kick(EventNotifier *n)
> > +{
> > +    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > +                                             svq_kick);
> > +
> > +    if (unlikely(!event_notifier_test_and_clear(n))) {
> > +        return;
> > +    }
> > +
> > +    event_notifier_set(&svq->hdev_kick);
> > +}
> > +
> >   /**
> >    * Set a new file descriptor for the guest to kick SVQ and notify for avail
> >    *
> >    * @svq          The svq
> > - * @svq_kick_fd  The new svq kick fd
> > + * @svq_kick_fd  The svq kick fd
> > + *
> > + * Note that SVQ will never close the old file descriptor.
> >    */
> >   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> >   {
> > @@ -65,12 +80,22 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> >        * need to explicitly check for them.
> >        */
> >       event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> > +    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
> >
> >       if (!check_old || event_notifier_test_and_clear(&tmp)) {
> >           event_notifier_set(&svq->hdev_kick);
> >       }
> >   }
> >
> > +/**
> > + * Stop shadow virtqueue operation.
> > + * @svq Shadow Virtqueue
> > + */
> > +void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > +{
> > +    event_notifier_set_handler(&svq->svq_kick, NULL);
> > +}
>
>
> This function is not used in the patch.
>

Right, I will add the use of it here.

Thanks!

> Thanks
>
>
> > +
> >   /**
> >    * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> >    * methods and file descriptors.
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 06/31] vhost: Route guest->host notification through shadow virtqueue
  2022-01-28  6:56     ` Jason Wang
  (?)
@ 2022-01-31 11:33     ` Eugenio Perez Martin
  2022-02-08  9:02         ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-31 11:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Fri, Jan 28, 2022 at 7:57 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > At this moment no buffer forwarding will be performed in SVQ mode: Qemu
> > just forwards the guest's kicks to the device. This commit also sets up
> > SVQs in the vhost device.
> >
> > Host memory notifiers regions are left out for simplicity, and they will
> > not be addressed in this series.
>
>
> I wonder if it's better to squash this into patch 5 since it gives us a
> full guest->host forwarding.
>

I'm fine with that if you think it makes the review easier.

>
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   include/hw/virtio/vhost-vdpa.h |   4 ++
> >   hw/virtio/vhost-vdpa.c         | 122 ++++++++++++++++++++++++++++++++-
> >   2 files changed, 124 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > index 3ce79a646d..009a9f3b6b 100644
> > --- a/include/hw/virtio/vhost-vdpa.h
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -12,6 +12,8 @@
> >   #ifndef HW_VIRTIO_VHOST_VDPA_H
> >   #define HW_VIRTIO_VHOST_VDPA_H
> >
> > +#include <gmodule.h>
> > +
> >   #include "hw/virtio/virtio.h"
> >   #include "standard-headers/linux/vhost_types.h"
> >
> > @@ -27,6 +29,8 @@ typedef struct vhost_vdpa {
> >       bool iotlb_batch_begin_sent;
> >       MemoryListener listener;
> >       struct vhost_vdpa_iova_range iova_range;
> > +    bool shadow_vqs_enabled;
> > +    GPtrArray *shadow_vqs;
> >       struct vhost_dev *dev;
> >       VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> >   } VhostVDPA;
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 6c10a7f05f..18de14f0fb 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -17,12 +17,14 @@
> >   #include "hw/virtio/vhost.h"
> >   #include "hw/virtio/vhost-backend.h"
> >   #include "hw/virtio/virtio-net.h"
> > +#include "hw/virtio/vhost-shadow-virtqueue.h"
> >   #include "hw/virtio/vhost-vdpa.h"
> >   #include "exec/address-spaces.h"
> >   #include "qemu/main-loop.h"
> >   #include "cpu.h"
> >   #include "trace.h"
> >   #include "qemu-common.h"
> > +#include "qapi/error.h"
> >
> >   /*
> >    * Return one past the end of the end of section. Be careful with uint64_t
> > @@ -409,8 +411,14 @@ err:
> >
> >   static void vhost_vdpa_host_notifiers_init(struct vhost_dev *dev)
> >   {
> > +    struct vhost_vdpa *v = dev->opaque;
> >       int i;
> >
> > +    if (v->shadow_vqs_enabled) {
> > +        /* SVQ is not compatible with host notifiers mr */
>
>
> I guess there should be a TODO or FIXME here.
>

Sure I can add it.

>
> > +        return;
> > +    }
> > +
> >       for (i = dev->vq_index; i < dev->vq_index + dev->nvqs; i++) {
> >           if (vhost_vdpa_host_notifier_init(dev, i)) {
> >               goto err;
> > @@ -424,6 +432,17 @@ err:
> >       return;
> >   }
> >
> > +static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
> > +{
> > +    struct vhost_vdpa *v = dev->opaque;
> > +    size_t idx;
> > +
> > +    for (idx = 0; idx < v->shadow_vqs->len; ++idx) {
> > +        vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, idx));
> > +    }
> > +    g_ptr_array_free(v->shadow_vqs, true);
> > +}
> > +
> >   static int vhost_vdpa_cleanup(struct vhost_dev *dev)
> >   {
> >       struct vhost_vdpa *v;
> > @@ -432,6 +451,7 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
> >       trace_vhost_vdpa_cleanup(dev, v);
> >       vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> >       memory_listener_unregister(&v->listener);
> > +    vhost_vdpa_svq_cleanup(dev);
> >
> >       dev->opaque = NULL;
> >       ram_block_discard_disable(false);
> > @@ -507,9 +527,15 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
> >
> >   static int vhost_vdpa_reset_device(struct vhost_dev *dev)
> >   {
> > +    struct vhost_vdpa *v = dev->opaque;
> >       int ret;
> >       uint8_t status = 0;
> >
> > +    for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> > +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
> > +        vhost_svq_stop(svq);
> > +    }
> > +
> >       ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
> >       trace_vhost_vdpa_reset_device(dev, status);
> >       return ret;
> > @@ -639,13 +665,28 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
> >       return ret;
> >   }
> >
> > -static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
> > -                                       struct vhost_vring_file *file)
> > +static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
> > +                                         struct vhost_vring_file *file)
> >   {
> >       trace_vhost_vdpa_set_vring_kick(dev, file->index, file->fd);
> >       return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
> >   }
> >
> > +static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
> > +                                       struct vhost_vring_file *file)
> > +{
> > +    struct vhost_vdpa *v = dev->opaque;
> > +    int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
> > +
> > +    if (v->shadow_vqs_enabled) {
> > +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
> > +        vhost_svq_set_svq_kick_fd(svq, file->fd);
> > +        return 0;
> > +    } else {
> > +        return vhost_vdpa_set_vring_dev_kick(dev, file);
> > +    }
> > +}
> > +
> >   static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> >                                          struct vhost_vring_file *file)
> >   {
> > @@ -653,6 +694,33 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> >       return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
> >   }
> >
> > +/**
> > + * Set shadow virtqueue descriptors to the device
> > + *
> > + * @dev   The vhost device model
> > + * @svq   The shadow virtqueue
> > + * @idx   The index of the virtqueue in the vhost device
> > + */
> > +static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> > +                                VhostShadowVirtqueue *svq,
> > +                                unsigned idx)
> > +{
> > +    struct vhost_vring_file file = {
> > +        .index = dev->vq_index + idx,
> > +    };
> > +    const EventNotifier *event_notifier;
> > +    int r;
> > +
> > +    event_notifier = vhost_svq_get_dev_kick_notifier(svq);
>
>
> A question: any reason for making VhostShadowVirtqueue private? If we
> export it in the .h we don't need helpers to access its members, like
> vhost_svq_get_dev_kick_notifier().
>

Exporting it is always a possibility, of course, but that direct
access would not be thread safe if we decide to move SVQ to its own
iothread, for example.

I feel it will be easier to work with it this way, but it might just
be that I'm used to making as much as possible private. The helpers
are not needed in the hot paths anyway, only in setup and teardown.
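
For reference, the pattern in question is the classic opaque type: the
struct is defined only in the .c file and the header exposes accessors,
so callers cannot grow direct accesses that would break if SVQ later
moves to its own iothread. A generic sketch, not tied to this series:

    /* header: layout is hidden, only accessors are public */
    typedef struct Foo Foo;
    int foo_get_fd(const Foo *f);

    /* implementation file: layout is private and free to change */
    struct Foo {
        int fd;
    };

    int foo_get_fd(const Foo *f)
    {
        return f->fd;
    }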

> Note that vhost_dev is a public structure.
>

Sure, we could embed it in vhost_virtqueue if we choose to do it that
way, for example.

>
> > +    file.fd = event_notifier_get_fd(event_notifier);
> > +    r = vhost_vdpa_set_vring_dev_kick(dev, &file);
> > +    if (unlikely(r != 0)) {
> > +        error_report("Can't set device kick fd (%d)", -r);
> > +    }
>
>
> I wonder whether or not we can generalize the logic here and in
> vhost_vdpa_set_vring_kick(). There's nothing vdpa-specific except the
> vhost_ops->set_vring_kick() call.
>

If we call vhost_ops->set_vring_kick we are setting the guest -> SVQ
kick notifier, not the SVQ -> vDPA device one, because of the
if (v->shadow_vqs_enabled) check. All of the modified ops callbacks
hide the actual device from the vhost subsystem, so we need to
explicitly use the newly created _dev_ ones.

>
> > +
> > +    return r == 0;
> > +}
> > +
> >   static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >   {
> >       struct vhost_vdpa *v = dev->opaque;
> > @@ -660,6 +728,13 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >
> >       if (started) {
> >           vhost_vdpa_host_notifiers_init(dev);
> > +        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> > +            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
> > +            bool ok = vhost_vdpa_svq_setup(dev, svq, i);
> > +            if (unlikely(!ok)) {
> > +                return -1;
> > +            }
> > +        }
> >           vhost_vdpa_set_vring_ready(dev);
> >       } else {
> >           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> > @@ -737,6 +812,41 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
> >       return true;
> >   }
> >
> > +/**
> > + * Adaptor function to free shadow virtqueue through gpointer
> > + *
> > + * @svq   The Shadow Virtqueue
> > + */
> > +static void vhost_psvq_free(gpointer svq)
> > +{
> > +    vhost_svq_free(svq);
> > +}
>
>
> Any reason for such indirection? Can we simply use vhost_svq_free()?
>

GCC complains about different types. I think we could do a function
pointer type cast, and it would be valid for every architecture qemu
supports, but the indirection seems cleaner to me, and I would be
surprised if the compiler did not optimize it away in the cases where
the cast is valid.

../hw/virtio/vhost-vdpa.c:1186:60: error: incompatible function
pointer types passing 'void (VhostShadowVirtqueue *)' (aka 'void
(struct VhostShadowVirtqueue *)') to parameter of type
'GDestroyNotify' (aka 'void (*)(void *)')
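
The incompatibility comes from C's function pointer typing:
GDestroyNotify is void (*)(gpointer), while vhost_svq_free() takes a
VhostShadowVirtqueue *, and calling a function through an incompatible
pointer type is undefined behavior even when it happens to work. The
adaptor moves the conversion inside a function body, where it is well
defined:

    /* glib declares: typedef void (*GDestroyNotify)(gpointer data); */
    static void vhost_psvq_free(gpointer svq)
    {
        vhost_svq_free(svq);  /* implicit gpointer -> VhostShadowVirtqueue * */
    }

    /* the types now match, so no cast is needed: */
    GPtrArray *vqs = g_ptr_array_new_full(n_svqs, vhost_psvq_free);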

Thanks!

> Thanks
>
>
> > +
> > +static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
> > +                               Error **errp)
> > +{
> > +    size_t n_svqs = v->shadow_vqs_enabled ? hdev->nvqs : 0;
> > +    g_autoptr(GPtrArray) shadow_vqs = g_ptr_array_new_full(n_svqs,
> > +                                                           vhost_psvq_free);
> > +    if (!v->shadow_vqs_enabled) {
> > +        goto out;
> > +    }
> > +
> > +    for (unsigned n = 0; n < hdev->nvqs; ++n) {
> > +        VhostShadowVirtqueue *svq = vhost_svq_new();
> > +
> > +        if (unlikely(!svq)) {
> > +            error_setg(errp, "Cannot create svq %u", n);
> > +            return -1;
> > +        }
> > +        g_ptr_array_add(v->shadow_vqs, svq);
> > +    }
> > +
> > +out:
> > +    v->shadow_vqs = g_steal_pointer(&shadow_vqs);
> > +    return 0;
> > +}
> > +
> >   static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> >   {
> >       struct vhost_vdpa *v;
> > @@ -759,6 +869,10 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> >       dev->opaque =  opaque ;
> >       v->listener = vhost_vdpa_memory_listener;
> >       v->msg_type = VHOST_IOTLB_MSG_V2;
> > +    ret = vhost_vdpa_init_svq(dev, v, errp);
> > +    if (ret) {
> > +        goto err;
> > +    }
> >
> >       vhost_vdpa_get_iova_range(v);
> >
> > @@ -770,6 +884,10 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> >                                  VIRTIO_CONFIG_S_DRIVER);
> >
> >       return 0;
> > +
> > +err:
> > +    ram_block_discard_disable(false);
> > +    return ret;
> >   }
> >
> >   const VhostOps vdpa_ops = {
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
  2022-01-29  8:05     ` Jason Wang
  (?)
@ 2022-01-31 15:34     ` Eugenio Perez Martin
  2022-02-08  3:23         ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-31 15:34 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Sat, Jan 29, 2022 at 9:06 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++--
> >   1 file changed, 18 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 18de14f0fb..029f98feee 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -687,13 +687,29 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
> >       }
> >   }
> >
> > -static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> > -                                       struct vhost_vring_file *file)
> > +static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
> > +                                         struct vhost_vring_file *file)
> >   {
> >       trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
> >       return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
> >   }
> >
> > +static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> > +                                     struct vhost_vring_file *file)
> > +{
> > +    struct vhost_vdpa *v = dev->opaque;
> > +
> > +    if (v->shadow_vqs_enabled) {
> > +        int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
> > +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
> > +
> > +        vhost_svq_set_guest_call_notifier(svq, file->fd);
>
>
> Two questions here (had similar questions for vring kick):
>
> 1) Any reason that we set up the eventfd for vhost-vdpa in
> vhost_vdpa_svq_setup() and not here?
>

I'm not sure what you mean.

The guest -> SVQ call and kick fds are set here and at
vhost_vdpa_set_vring_kick. The event notifier handler of the
guest -> SVQ kick fd is set at vhost_vdpa_set_vring_kick /
vhost_svq_set_svq_kick_fd. The guest -> SVQ call fd has no event
notifier handler since we don't poll it.

On the other hand, the SVQ <-> device connection uses the same fds
from beginning to end, and they do not change with, for example,
call fd masking. That's why they are set up from
vhost_vdpa_svq_setup; delaying that to vhost_vdpa_set_vring_call
would require way more logic there.

> 2) The call could be disabled by using -1 as the fd, I don't see any
> code to deal with that.
>

Right, I didn't take that into account. vhost-kernel also accepts -1
as a kick fd to unbind, so SVQ can certainly be reworked to handle
that too.
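
As a minimal sketch of how that could look (guest_call_fd and
vhost_svq_notify_guest are illustrative names, not code from this
series), SVQ would guard the notification:

    /* Hedged sketch: fd == -1 means "calls disabled", as in vhost-kernel */
    static void vhost_svq_notify_guest(VhostShadowVirtqueue *svq)
    {
        if (svq->guest_call_fd < 0) {
            return; /* guest unbound the call fd: drop the notification */
        }
        event_notifier_set(&svq->svq_call);
    }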

Thanks!

> Thanks
>
>
> > +        return 0;
> > +    } else {
> > +        return vhost_vdpa_set_vring_dev_call(dev, file);
> > +    }
> > +}
> > +
> >   /**
> >    * Set shadow virtqueue descriptors to the device
> >    *
>




* Re: [PATCH 11/31] vhost: Add vhost_svq_valid_device_features to shadow vq
  2022-01-29  8:11     ` Jason Wang
@ 2022-01-31 15:49     ` Eugenio Perez Martin
  2022-02-01 10:57       ` Eugenio Perez Martin
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-31 15:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Sat, Jan 29, 2022 at 9:11 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > This allows SVQ to negotiate features with the device. For the device,
> > SVQ is a driver. While this function needs to bypass all non-transport
> > features, it needs to disable the features that SVQ does not support
> > when forwarding buffers. This includes packed vq layout, indirect
> > descriptors or event idx.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
> >   hw/virtio/vhost-shadow-virtqueue.c | 44 ++++++++++++++++++++++++++++++
> >   hw/virtio/vhost-vdpa.c             | 21 ++++++++++++++
> >   3 files changed, 67 insertions(+)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index c9ffa11fce..d963867a04 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -15,6 +15,8 @@
> >
> >   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> >
> > +bool vhost_svq_valid_device_features(uint64_t *features);
> > +
> >   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> >   void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
> >   const EventNotifier *vhost_svq_get_dev_kick_notifier(
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index 9619c8082c..51442b3dbf 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -45,6 +45,50 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
> >       return &svq->hdev_kick;
> >   }
> >
> > +/**
> > + * Validate the transport device features that SVQ can use with the device
> > + *
> > > + * @dev_features  The device features. On success, the acknowledged features.
> > + *
> > + * Returns true if SVQ can go with a subset of these, false otherwise.
> > + */
> > +bool vhost_svq_valid_device_features(uint64_t *dev_features)
> > +{
> > +    bool r = true;
> > +
> > +    for (uint64_t b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END;
> > +         ++b) {
> > +        switch (b) {
> > +        case VIRTIO_F_NOTIFY_ON_EMPTY:
> > +        case VIRTIO_F_ANY_LAYOUT:
> > +            continue;
> > +
> > +        case VIRTIO_F_ACCESS_PLATFORM:
> > +            /* SVQ does not know how to translate addresses */
>
>
> I may miss something but any reason that we need to disable
> ACCESS_PLATFORM? I'd expect the vring helper we used for shadow
> virtqueue can deal with vIOMMU perfectly.
>

This function validates the SVQ <-> device communication features,
which may or may not be the same as the guest <-> SVQ ones. These
feature flags are still valid for guest <-> SVQ communication, the
same as with the indirect descriptors one.

Having said that, there is a point in the series where
VIRTIO_F_ACCESS_PLATFORM is actually mandatory, so I think we could
use the later addition of the x-svq cmdline parameter and delay the
feature validation to where it makes more sense.

>
> > +            if (*dev_features & BIT_ULL(b)) {
> > +                clear_bit(b, dev_features);
> > +                r = false;
> > +            }
> > +            break;
> > +
> > +        case VIRTIO_F_VERSION_1:
>
>
> I had the same question here.
>

For VERSION_1 it's easier to assume the guest is little endian at
some points, but we could try harder to support both endiannesses if
needed.
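
As a quick illustration of what VERSION_1 buys us (a sketch using
qemu's existing bswap helpers, not code from this series): with
VERSION_1 negotiated, the vring is guaranteed little endian, so SVQ
can read and write the rings with fixed conversions regardless of
the host:

    uint16_t avail_idx = le16_to_cpu(svq->vring.avail->idx);
    uint64_t desc_addr = le64_to_cpu(svq->vring.desc[i].addr);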

Thanks!

> Thanks
>
>
> > > +            /* SVQ trusts that the guest vring is little endian */
> > +            if (!(*dev_features & BIT_ULL(b))) {
> > +                set_bit(b, dev_features);
> > +                r = false;
> > +            }
> > +            continue;
> > +
> > +        default:
> > +            if (*dev_features & BIT_ULL(b)) {
> > +                clear_bit(b, dev_features);
> > +            }
> > +        }
> > +    }
> > +
> > +    return r;
> > +}
> > +
> >   /* Forward guest notifications */
> >   static void vhost_handle_guest_kick(EventNotifier *n)
> >   {
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index bdb45c8808..9d801cf907 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -855,10 +855,31 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
> >       size_t n_svqs = v->shadow_vqs_enabled ? hdev->nvqs : 0;
> >       g_autoptr(GPtrArray) shadow_vqs = g_ptr_array_new_full(n_svqs,
> >                                                              vhost_psvq_free);
> > +    uint64_t dev_features;
> > +    uint64_t svq_features;
> > +    int r;
> > +    bool ok;
> > +
> >       if (!v->shadow_vqs_enabled) {
> >           goto out;
> >       }
> >
> > +    r = vhost_vdpa_get_features(hdev, &dev_features);
> > +    if (r != 0) {
> > +        error_setg(errp, "Can't get vdpa device features, got (%d)", r);
> > +        return r;
> > +    }
> > +
> > +    svq_features = dev_features;
> > +    ok = vhost_svq_valid_device_features(&svq_features);
> > +    if (unlikely(!ok)) {
> > +        error_setg(errp,
> > +            "SVQ Invalid device feature flags, offer: 0x%"PRIx64", ok: 0x%"PRIx64,
> > +            hdev->features, svq_features);
> > +        return -1;
> > +    }
> > +
> > +    shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_psvq_free);
> >       for (unsigned n = 0; n < hdev->nvqs; ++n) {
> >           VhostShadowVirtqueue *svq = vhost_svq_new();
> >
>




* Re: [PATCH 15/31] vdpa: Add vhost_svq_get_num
  2022-01-29  8:14     ` Jason Wang
@ 2022-01-31 16:36     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-31 16:36 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Sat, Jan 29, 2022 at 9:15 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > This reports the guest's visible SVQ effective length, not the device's
> > one.
>
>
> I think we need to explain if there could be a case that the SVQ size is
> not equal to the device queue size.
>

The description is actually misleading now that I re-read it. It
reports the size that the guest negotiated with SVQ for the guest's
vring, not the one that SVQ negotiates with the device for SVQ's
vring. I'll reword it for the next version, so thanks for pointing it
out.

Regarding your comment, the only case where they can differ is if SVQ
cannot get the device's num, something we could turn into an error as
you point out later in the series.

Thanks!

> Thanks
>
>
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h | 1 +
> >   hw/virtio/vhost-shadow-virtqueue.c | 5 +++++
> >   2 files changed, 6 insertions(+)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index 3521e8094d..035207a469 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -29,6 +29,7 @@ const EventNotifier *vhost_svq_get_svq_call_notifier(
> >                                                 const VhostShadowVirtqueue *svq);
> >   void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
> >                                 struct vhost_vring_addr *addr);
> > +uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq);
> >   size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
> >   size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index 0f2c2403ff..f129ec8395 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -212,6 +212,11 @@ void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
> >       addr->used_user_addr = (uint64_t)svq->vring.used;
> >   }
> >
> > +uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq)
> > +{
> > +    return svq->vring.num;
> > +}
> > +
> >   size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq)
> >   {
> >       size_t desc_size = sizeof(vring_desc_t) * svq->vring.num;
>




* Re: [PATCH 16/31] vhost: pass queue index to vhost_vq_get_addr
  2022-01-29  8:20     ` Jason Wang
@ 2022-01-31 17:44     ` Eugenio Perez Martin
  2022-02-08  6:58         ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-31 17:44 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Sat, Jan 29, 2022 at 9:20 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > Doing it that way allows the vhost backend to know which address to return.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost.c | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 7b03efccec..64b955ba0c 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -798,9 +798,10 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
> >                                       struct vhost_virtqueue *vq,
> >                                       unsigned idx, bool enable_log)
> >   {
> > -    struct vhost_vring_addr addr;
> > +    struct vhost_vring_addr addr = {
> > +        .index = idx,
> > +    };
> >       int r;
> > -    memset(&addr, 0, sizeof(struct vhost_vring_addr));
> >
> >       if (dev->vhost_ops->vhost_vq_get_addr) {
> >           r = dev->vhost_ops->vhost_vq_get_addr(dev, &addr, vq);
> > @@ -813,7 +814,6 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
> >           addr.avail_user_addr = (uint64_t)(unsigned long)vq->avail;
> >           addr.used_user_addr = (uint64_t)(unsigned long)vq->used;
> >       }
>
>
> I'm a bit lost in the logic above, any reason we need call
> vhost_vq_get_addr() :) ?
>

It's the way vhost_virtqueue_set_addr works if the backend has a
vhost_vq_get_addr operation (currently, only vhost-vdpa): vhost first
asks the backend for the address and then sets it.

Previously, the index was not needed because all the information was
in vhost_virtqueue. However, extracting the queue index from
vhost_virtqueue is tricky, so I think it's easier to simply include
that information in the request, similar to get_base or get_num when
asking the vdpa device. We can extract the index from vq - dev->vqs
or something similar if that's preferred.
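
For reference, the pointer-arithmetic alternative would look roughly
like this (a sketch only; it assumes vq points into the dev->vqs
array and that dev->vq_index is the device's first queue index, as in
hw/virtio/vhost.c):

    /* Hedged sketch of deriving the index instead of passing it in */
    static unsigned vhost_vq_index(struct vhost_dev *dev,
                                   struct vhost_virtqueue *vq)
    {
        ptrdiff_t off = vq - dev->vqs;  /* position inside this device */
        return dev->vq_index + off;
    }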

Thanks!

> Thanks
>
>
> > -    addr.index = idx;
> >       addr.log_guest_addr = vq->used_phys;
> >       addr.flags = enable_log ? (1 << VHOST_VRING_F_LOG) : 0;
> >       r = dev->vhost_ops->vhost_set_vring_addr(dev, &addr);
>




* Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq
  2022-01-30  4:03     ` Jason Wang
@ 2022-01-31 18:58     ` Eugenio Perez Martin
  2022-02-08  3:57         ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-31 18:58 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Sun, Jan 30, 2022 at 5:03 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > First half of the buffers forwarding part, preparing the vhost-vdpa
> > callbacks to offer SVQ to the device. QEMU cannot enable it at this
> > moment, so this is effectively dead code, but it helps to reduce
> > patch size.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |   2 +-
> >   hw/virtio/vhost-shadow-virtqueue.c |  21 ++++-
> >   hw/virtio/vhost-vdpa.c             | 133 ++++++++++++++++++++++++++---
> >   3 files changed, 143 insertions(+), 13 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index 035207a469..39aef5ffdf 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> >
> >   void vhost_svq_stop(VhostShadowVirtqueue *svq);
> >
> > -VhostShadowVirtqueue *vhost_svq_new(void);
> > +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> >
> >   void vhost_svq_free(VhostShadowVirtqueue *vq);
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index f129ec8395..7c168075d7 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >   /**
> >    * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> >    * methods and file descriptors.
> > + *
> > + * @qsize Shadow VirtQueue size
> > + *
> > + * Returns the new virtqueue or NULL.
> > + *
> > + * In case of error, reason is reported through error_report.
> >    */
> > -VhostShadowVirtqueue *vhost_svq_new(void)
> > +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> >   {
> > +    size_t desc_size = sizeof(vring_desc_t) * qsize;
> > +    size_t device_size, driver_size;
> >       g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> >       int r;
> >
> > @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
> >       /* Placeholder descriptor, it should be deleted at set_kick_fd */
> >       event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
> >
> > +    svq->vring.num = qsize;
>
>
> I wonder if this is the best. E.g some hardware can support up to 32K
> queue size. So this will probably end up with:
>
> 1) SVQ use 32K queue size
> 2) hardware queue uses 256
>

In that case the SVQ vring size will be 32K, and the guest's vring
can negotiate any size with SVQ equal to or less than 32K, including
256. Is that what you mean?

If by hardware queues you mean the guest's vring, I'm not sure why it
would be "probably 256". I'd say that in that case, with the
virtio-net kernel driver for example, the ring size will be the same
as the one the device exports, won't it?

The implementation should support any combination of sizes, but the
ring size exposed to the guest is never bigger than the hardware one.

> ? Or we SVQ can stick to 256 but this will this cause trouble if we want
> to add event index support?
>

I think we should not have any problem with event idx. If you mean
that the guest could mark more buffers available than the SVQ vring's
size, that should not happen, because there must be fewer entries in
the guest's vring than in SVQ's.

But if I understood you correctly, a similar situation could happen
if a guest's contiguous buffer is scattered across many of qemu's VA
chunks. Even if that happened, the situation should still be ok: SVQ
knows the guest's avail idx, and if SVQ is full it will resume
forwarding avail buffers once the device uses more of them.

Does that make sense to you?
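
The "SVQ is full" path described above would look roughly like this
(a sketch; vhost_svq_available_slots is an illustrative name for
what the series' guest-kick handler computes, next_guest_avail_elem
is from the series):

    /* Hedged sketch of the guest-kick path when SVQ has no room */
    if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
        /*
         * Stash the element; the flush path retries it after the
         * device marks some descriptors as used, so nothing is lost.
         */
        svq->next_guest_avail_elem = g_steal_pointer(&elem);
        return;
    }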

>
> > +    driver_size = vhost_svq_driver_area_size(svq);
> > +    device_size = vhost_svq_device_area_size(svq);
> > +    svq->vring.desc = qemu_memalign(qemu_real_host_page_size, driver_size);
> > +    svq->vring.avail = (void *)((char *)svq->vring.desc + desc_size);
> > +    memset(svq->vring.desc, 0, driver_size);
> > +    svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> > +    memset(svq->vring.used, 0, device_size);
> > +
> >       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
> >       return g_steal_pointer(&svq);
> >
> > @@ -318,5 +335,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
> >       event_notifier_cleanup(&vq->hdev_kick);
> >       event_notifier_set_handler(&vq->hdev_call, NULL);
> >       event_notifier_cleanup(&vq->hdev_call);
> > +    qemu_vfree(vq->vring.desc);
> > +    qemu_vfree(vq->vring.used);
> >       g_free(vq);
> >   }
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 9d801cf907..53e14bafa0 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -641,20 +641,52 @@ static int vhost_vdpa_set_vring_addr(struct vhost_dev *dev,
> >       return vhost_vdpa_call(dev, VHOST_SET_VRING_ADDR, addr);
> >   }
> >
> > -static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
> > -                                      struct vhost_vring_state *ring)
> > +static int vhost_vdpa_set_dev_vring_num(struct vhost_dev *dev,
> > +                                        struct vhost_vring_state *ring)
> >   {
> >       trace_vhost_vdpa_set_vring_num(dev, ring->index, ring->num);
> >       return vhost_vdpa_call(dev, VHOST_SET_VRING_NUM, ring);
> >   }
> >
> > -static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
> > -                                       struct vhost_vring_state *ring)
> > +static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
> > +                                    struct vhost_vring_state *ring)
> > +{
> > +    struct vhost_vdpa *v = dev->opaque;
> > +
> > +    if (v->shadow_vqs_enabled) {
> > +        /*
> > +         * Vring num was set at device start. SVQ num is handled by VirtQueue
> > +         * code
> > +         */
> > +        return 0;
> > +    }
> > +
> > +    return vhost_vdpa_set_dev_vring_num(dev, ring);
> > +}
> > +
> > +static int vhost_vdpa_set_dev_vring_base(struct vhost_dev *dev,
> > +                                         struct vhost_vring_state *ring)
> >   {
> >       trace_vhost_vdpa_set_vring_base(dev, ring->index, ring->num);
> >       return vhost_vdpa_call(dev, VHOST_SET_VRING_BASE, ring);
> >   }
> >
> > +static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
> > +                                     struct vhost_vring_state *ring)
> > +{
> > +    struct vhost_vdpa *v = dev->opaque;
> > +
> > +    if (v->shadow_vqs_enabled) {
> > +        /*
> > +         * Vring base was set at device start. SVQ base is handled by VirtQueue
> > +         * code
> > +         */
> > +        return 0;
> > +    }
> > +
> > +    return vhost_vdpa_set_dev_vring_base(dev, ring);
> > +}
> > +
> >   static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
> >                                          struct vhost_vring_state *ring)
> >   {
> > @@ -784,8 +816,8 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >       }
> >   }
> >
> > -static int vhost_vdpa_get_features(struct vhost_dev *dev,
> > -                                     uint64_t *features)
> > +static int vhost_vdpa_get_dev_features(struct vhost_dev *dev,
> > +                                       uint64_t *features)
> >   {
> >       int ret;
> >
> > @@ -794,15 +826,64 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev,
> >       return ret;
> >   }
> >
> > +static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
> > +{
> > +    struct vhost_vdpa *v = dev->opaque;
> > +    int ret = vhost_vdpa_get_dev_features(dev, features);
> > +
> > +    if (ret == 0 && v->shadow_vqs_enabled) {
> > +        /* Filter only features that SVQ can offer to guest */
> > +        vhost_svq_valid_guest_features(features);
> > +    }
>
>
> Sorry if I've asked before, I think it's sufficient to filter out the
> device features that we don't support during and fail the vhost
> initialization. Any reason we need do it again here?
>

On the contrary: if something needs to be asked, that means it is not
clear enough :).

At initialization we validate that the device offers all the needed
features (ACCESS_PLATFORM, VERSION_1). We don't have the features
acknowledged by the guest at that point.

This is checking the features written by the guest. For example, we
accept _F_INDIRECT here, so the guest can write indirect descriptors
to SVQ, since qemu's VirtQueue code handles that for us. I've stayed
on the safe side and not included packed or event_idx, but I might
try to run with them. Those features would not have been acknowledged
to the device.
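
In other words, there are two feature sets at play, something like
the following sketch (reusing the series' helpers, with illustrative
variable names):

    /* Hedged sketch: the same device features, filtered twice */
    uint64_t guest_offer = dev_features;
    vhost_svq_valid_guest_features(&guest_offer); /* may keep _F_INDIRECT */

    uint64_t dev_ack = dev_features;
    vhost_svq_valid_device_features(&dev_ack);    /* drops _F_INDIRECT */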

>
> > +
> > +    return ret;
> > +}
> > +
> >   static int vhost_vdpa_set_features(struct vhost_dev *dev,
> >                                      uint64_t features)
> >   {
> > +    struct vhost_vdpa *v = dev->opaque;
> >       int ret;
> >
> >       if (vhost_vdpa_one_time_request(dev)) {
> >           return 0;
> >       }
> >
> > +    if (v->shadow_vqs_enabled) {
> > +        uint64_t dev_features, svq_features, acked_features;
> > +        bool ok;
> > +
> > +        ret = vhost_vdpa_get_dev_features(dev, &dev_features);
> > +        if (ret != 0) {
> > +            error_report("Can't get vdpa device features, got (%d)", ret);
> > +            return ret;
> > +        }
> > +
> > +        svq_features = dev_features;
> > +        ok = vhost_svq_valid_device_features(&svq_features);
> > +        if (unlikely(!ok)) {
> > +            error_report("SVQ Invalid device feature flags, offer: 0x%"
> > +                         PRIx64", ok: 0x%"PRIx64, dev->features, svq_features);
> > +            return -1;
> > +        }
> > +
> > +        ok = vhost_svq_valid_guest_features(&features);
> > +        if (unlikely(!ok)) {
> > +            error_report(
> > +                "Invalid guest acked feature flag, acked: 0x%"
> > +                PRIx64", ok: 0x%"PRIx64, dev->acked_features, features);
> > +            return -1;
> > +        }
> > +
> > +        ok = vhost_svq_ack_guest_features(svq_features, features,
> > +                                          &acked_features);
> > +        if (unlikely(!ok)) {
> > +            return -1;
> > +        }
> > +
> > +        features = acked_features;
> > +    }
> > +
> >       trace_vhost_vdpa_set_features(dev, features);
> >       ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
> >       if (ret) {
> > @@ -822,13 +903,31 @@ static int vhost_vdpa_set_owner(struct vhost_dev *dev)
> >       return vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
> >   }
> >
> > -static int vhost_vdpa_vq_get_addr(struct vhost_dev *dev,
> > -                    struct vhost_vring_addr *addr, struct vhost_virtqueue *vq)
> > +static void vhost_vdpa_vq_get_guest_addr(struct vhost_vring_addr *addr,
> > +                                         struct vhost_virtqueue *vq)
> >   {
> > -    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> >       addr->desc_user_addr = (uint64_t)(unsigned long)vq->desc_phys;
> >       addr->avail_user_addr = (uint64_t)(unsigned long)vq->avail_phys;
> >       addr->used_user_addr = (uint64_t)(unsigned long)vq->used_phys;
> > +}
> > +
> > +static int vhost_vdpa_vq_get_addr(struct vhost_dev *dev,
> > +                                  struct vhost_vring_addr *addr,
> > +                                  struct vhost_virtqueue *vq)
> > +{
> > +    struct vhost_vdpa *v = dev->opaque;
> > +
> > +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> > +
> > +    if (v->shadow_vqs_enabled) {
> > +        int idx = vhost_vdpa_get_vq_index(dev, addr->index);
> > +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, idx);
> > +
> > +        vhost_svq_get_vring_addr(svq, addr);
> > +    } else {
> > +        vhost_vdpa_vq_get_guest_addr(addr, vq);
> > +    }
> > +
> >       trace_vhost_vdpa_vq_get_addr(dev, vq, addr->desc_user_addr,
> >                                    addr->avail_user_addr, addr->used_user_addr);
> >       return 0;
> > @@ -849,6 +948,12 @@ static void vhost_psvq_free(gpointer svq)
> >       vhost_svq_free(svq);
> >   }
> >
> > +static int vhost_vdpa_get_max_queue_size(struct vhost_dev *dev,
> > +                                         uint16_t *qsize)
> > +{
> > +    return vhost_vdpa_call(dev, VHOST_VDPA_GET_VRING_NUM, qsize);
> > +}
> > +
> >   static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
> >                                  Error **errp)
> >   {
> > @@ -857,6 +962,7 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
> >                                                              vhost_psvq_free);
> >       uint64_t dev_features;
> >       uint64_t svq_features;
> > +    uint16_t qsize;
> >       int r;
> >       bool ok;
> >
> > @@ -864,7 +970,7 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
> >           goto out;
> >       }
> >
> > -    r = vhost_vdpa_get_features(hdev, &dev_features);
> > +    r = vhost_vdpa_get_dev_features(hdev, &dev_features);
> >       if (r != 0) {
> >           error_setg(errp, "Can't get vdpa device features, got (%d)", r);
> >           return r;
> > @@ -879,9 +985,14 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
> >           return -1;
> >       }
> >
> > +    r = vhost_vdpa_get_max_queue_size(hdev, &qsize);
> > +    if (unlikely(r)) {
> > +        qsize = 256;
> > +    }
>
>
> Should we fail instead of having a "default" value here?
>

Maybe it is better to fail here, yes. I guess there is no safe default value.
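
Something as simple as this sketch would do (on top of this patch's
init path; the error message wording is illustrative):

    r = vhost_vdpa_get_max_queue_size(hdev, &qsize);
    if (unlikely(r)) {
        /* No safe default: propagate the failure instead of guessing */
        error_setg(errp, "Can't get device max vring size (%d)", r);
        return r;
    }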

Thanks!

> Thanks
>
>
> > +
> >       shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_psvq_free);
> >       for (unsigned n = 0; n < hdev->nvqs; ++n) {
> > -        VhostShadowVirtqueue *svq = vhost_svq_new();
> > +        VhostShadowVirtqueue *svq = vhost_svq_new(qsize);
> >
> >           if (unlikely(!svq)) {
> >               error_setg(errp, "Cannot create svq %u", n);
>




* Re: [PATCH 23/31] vdpa: Add custom IOTLB translations to SVQ
  2022-01-30  5:57     ` Jason Wang
@ 2022-01-31 19:11     ` Eugenio Perez Martin
  2022-02-08  8:19         ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-01-31 19:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Sun, Jan 30, 2022 at 6:58 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > Use translations added in VhostIOVATree in SVQ.
> >
> > Only introduce usage here, not allocation and deallocation. As with
> > previous patches, we use the dead code paths of shadow_vqs_enabled to
> > avoid committing too many changes at once. These paths are impossible
> > to take at the moment.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |   3 +-
> >   include/hw/virtio/vhost-vdpa.h     |   3 +
> >   hw/virtio/vhost-shadow-virtqueue.c | 111 ++++++++++++++++----
> >   hw/virtio/vhost-vdpa.c             | 161 +++++++++++++++++++++++++----
> >   4 files changed, 238 insertions(+), 40 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index 19c934af49..c6f67d6f76 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -12,6 +12,7 @@
> >
> >   #include "hw/virtio/vhost.h"
> >   #include "qemu/event_notifier.h"
> > +#include "hw/virtio/vhost-iova-tree.h"
> >
> >   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> >
> > @@ -37,7 +38,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> >                        VirtQueue *vq);
> >   void vhost_svq_stop(VhostShadowVirtqueue *svq);
> >
> > -VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> > +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize, VhostIOVATree *iova_map);
> >
> >   void vhost_svq_free(VhostShadowVirtqueue *vq);
> >
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > index 009a9f3b6b..cd2388b3be 100644
> > --- a/include/hw/virtio/vhost-vdpa.h
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -14,6 +14,7 @@
> >
> >   #include <gmodule.h>
> >
> > +#include "hw/virtio/vhost-iova-tree.h"
> >   #include "hw/virtio/virtio.h"
> >   #include "standard-headers/linux/vhost_types.h"
> >
> > @@ -30,6 +31,8 @@ typedef struct vhost_vdpa {
> >       MemoryListener listener;
> >       struct vhost_vdpa_iova_range iova_range;
> >       bool shadow_vqs_enabled;
> > +    /* IOVA mapping used by Shadow Virtqueue */
> > +    VhostIOVATree *iova_tree;
> >       GPtrArray *shadow_vqs;
> >       struct vhost_dev *dev;
> >       VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index a1a404f68f..c7888eb8cf 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -11,6 +11,7 @@
> >   #include "hw/virtio/vhost-shadow-virtqueue.h"
> >   #include "hw/virtio/vhost.h"
> >   #include "hw/virtio/virtio-access.h"
> > +#include "hw/virtio/vhost-iova-tree.h"
> >   #include "standard-headers/linux/vhost_types.h"
> >
> >   #include "qemu/error-report.h"
> > @@ -45,6 +46,9 @@ typedef struct VhostShadowVirtqueue {
> >       /* Virtio device */
> >       VirtIODevice *vdev;
> >
> > +    /* IOVA mapping */
> > +    VhostIOVATree *iova_tree;
> > +
> >       /* Map for returning guest's descriptors */
> >       VirtQueueElement **ring_id_maps;
> >
> > @@ -97,13 +101,7 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
> >               continue;
> >
> >           case VIRTIO_F_ACCESS_PLATFORM:
> > -            /* SVQ does not know how to translate addresses */
> > -            if (*dev_features & BIT_ULL(b)) {
> > -                clear_bit(b, dev_features);
> > -                r = false;
> > -            }
> > -            break;
> > -
> > > +            /* SVQ trusts the host's IOMMU to translate addresses */
> >           case VIRTIO_F_VERSION_1:
> >               /* SVQ trust that guest vring is little endian */
> >               if (!(*dev_features & BIT_ULL(b))) {
> > @@ -205,7 +203,55 @@ static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> >       }
> >   }
> >
> > +/**
> > > + * Translate addresses between qemu's virtual addresses and SVQ IOVA
> > > + *
> > > + * @svq    Shadow VirtQueue
> > > + * @addrs  Translated IOVA addresses
> > > + * @iovec  Source qemu's VA addresses
> > > + * @num    Length of iovec and minimum length of addrs
> > + */
> > +static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
> > +                                     void **addrs, const struct iovec *iovec,
> > +                                     size_t num)
> > +{
> > +    size_t i;
> > +
> > +    if (num == 0) {
> > +        return true;
> > +    }
> > +
> > +    for (i = 0; i < num; ++i) {
> > +        DMAMap needle = {
> > +            .translated_addr = (hwaddr)iovec[i].iov_base,
> > +            .size = iovec[i].iov_len,
> > +        };
> > +        size_t off;
> > +
> > +        const DMAMap *map = vhost_iova_tree_find_iova(svq->iova_tree, &needle);
> > +        /*
> > +         * Map cannot be NULL since iova map contains all guest space and
> > +         * qemu already has a physical address mapped
> > +         */
> > +        if (unlikely(!map)) {
> > +            error_report("Invalid address 0x%"HWADDR_PRIx" given by guest",
> > +                         needle.translated_addr);
>
>
> This can be triggered by guest, we need use once or log_guest_error() etc.
>

Ok I see the issue, I will change.

>
> > +            return false;
> > +        }
> > +
> > +        /*
> > +         * Map->iova chunk size is ignored. What to do if descriptor
> > +         * (addr, size) does not fit is delegated to the device.
> > +         */
>
>
> I think we need at least check the size and fail if the size doesn't
> match here. Or is it possible that we have a buffer that may cross two
> memory regions?
>

It should be impossible, since both the iova_tree and the VirtQueue
should be in sync regarding memory region updates. If a VirtQueue
buffer crosses several memory regions, the iovec simply has more
entries.

I can add a return false, but I'm not able to trigger that situation
even with a malformed driver.
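
For completeness, the extra check could look like this inside
vhost_svq_translate_addr (a sketch; it relies on DMAMap.size being
the last byte included in the range, as elsewhere in the series):

    /* Hedged sketch: fail if the descriptor does not fit the chunk */
    size_t off = needle.translated_addr - map->translated_addr;
    if (unlikely(off + iovec[i].iov_len > map->size + 1)) {
        error_report("Guest buffer does not fit in one iova chunk");
        return false;
    }
    addrs[i] = (void *)(map->iova + off);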

>
> > +        off = needle.translated_addr - map->translated_addr;
> > +        addrs[i] = (void *)(map->iova + off);
> > +    }
> > +
> > +    return true;
> > +}
> > +
> >   static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > +                                    void * const *vaddr_sg,
> >                                       const struct iovec *iovec,
> >                                       size_t num, bool more_descs, bool write)
> >   {
> > @@ -224,7 +270,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> >           } else {
> >               descs[i].flags = flags;
> >           }
> > -        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
> > +        descs[i].addr = cpu_to_le64((hwaddr)vaddr_sg[n]);
> >           descs[i].len = cpu_to_le32(iovec[n].iov_len);
> >
> >           last = i;
> > @@ -234,42 +280,60 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> >       svq->free_head = le16_to_cpu(descs[last].next);
> >   }
> >
> > -static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > -                                    VirtQueueElement *elem)
> > +static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > +                                VirtQueueElement *elem,
> > +                                unsigned *head)
>
>
> I'd suggest to make it returns bool since the patch that introduces this
> function.
>

Ok I will do it from the start.

>
> >   {
> > -    int head;
> >       unsigned avail_idx;
> >       vring_avail_t *avail = svq->vring.avail;
> > +    bool ok;
> > +    g_autofree void **sgs = g_new(void *, MAX(elem->out_num, elem->in_num));
> >
> > -    head = svq->free_head;
> > +    *head = svq->free_head;
> >
> >       /* We need some descriptors here */
> >       assert(elem->out_num || elem->in_num);
> >
> > -    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> > +    ok = vhost_svq_translate_addr(svq, sgs, elem->out_sg, elem->out_num);
> > +    if (unlikely(!ok)) {
> > +        return false;
> > +    }
> > +    vhost_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
> >                               elem->in_num > 0, false);
> > -    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> > +
> > +
> > +    ok = vhost_svq_translate_addr(svq, sgs, elem->in_sg, elem->in_num);
> > +    if (unlikely(!ok)) {
> > +        return false;
> > +    }
> > +
> > +    vhost_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, true);
> >
> >       /*
> >        * Put entry in available array (but don't update avail->idx until they
> >        * do sync).
> >        */
> >       avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
> > -    avail->ring[avail_idx] = cpu_to_le16(head);
> > +    avail->ring[avail_idx] = cpu_to_le16(*head);
> >       svq->avail_idx_shadow++;
> >
> >       /* Update avail index after the descriptor is wrote */
> >       smp_wmb();
> >       avail->idx = cpu_to_le16(svq->avail_idx_shadow);
> >
> > -    return head;
> > +    return true;
> >   }
> >
> > -static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> > +static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> >   {
> > -    unsigned qemu_head = vhost_svq_add_split(svq, elem);
> > +    unsigned qemu_head;
> > +    bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
> > +    if (unlikely(!ok)) {
> > +        return false;
> > +    }
> >
> >       svq->ring_id_maps[qemu_head] = elem;
> > +    return true;
> >   }
> >
> >   static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> > @@ -309,6 +373,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
> >
> >           while (true) {
> >               VirtQueueElement *elem;
> > +            bool ok;
> >
> >               if (svq->next_guest_avail_elem) {
> >                   elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > @@ -337,7 +402,11 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
> >                   return;
> >               }
> >
> > -            vhost_svq_add(svq, elem);
> > +            ok = vhost_svq_add(svq, elem);
> > +            if (unlikely(!ok)) {
> > +                /* VQ is broken, just return and ignore any other kicks */
> > +                return;
> > +            }
> >               vhost_svq_kick(svq);
> >           }
> >
> > @@ -619,12 +688,13 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >    * methods and file descriptors.
> >    *
> >    * @qsize Shadow VirtQueue size
> > + * @iova_tree Tree to perform descriptors translations
> >    *
> >    * Returns the new virtqueue or NULL.
> >    *
> >    * In case of error, reason is reported through error_report.
> >    */
> > -VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> > +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize, VhostIOVATree *iova_tree)
> >   {
> >       size_t desc_size = sizeof(vring_desc_t) * qsize;
> >       size_t device_size, driver_size;
> > @@ -656,6 +726,7 @@ VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> >       memset(svq->vring.desc, 0, driver_size);
> >       svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> >       memset(svq->vring.used, 0, device_size);
> > +    svq->iova_tree = iova_tree;
> >       svq->ring_id_maps = g_new0(VirtQueueElement *, qsize);
> >       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
> >       return g_steal_pointer(&svq);
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 0e5c00ed7e..276a559649 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -209,6 +209,18 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> >                                            vaddr, section->readonly);
> >
> >       llsize = int128_sub(llend, int128_make64(iova));
> > +    if (v->shadow_vqs_enabled) {
> > +        DMAMap mem_region = {
> > +            .translated_addr = (hwaddr)vaddr,
> > +            .size = int128_get64(llsize) - 1,
> > +            .perm = IOMMU_ACCESS_FLAG(true, section->readonly),
> > +        };
> > +
> > +        int r = vhost_iova_tree_map_alloc(v->iova_tree, &mem_region);
> > +        assert(r == IOVA_OK);
>
>
> It's better to fail or warn here.
>

Sure, a warning is possible.
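
Something along these lines in vhost_vdpa_listener_region_add (a
sketch; the error wording is illustrative):

    int r = vhost_iova_tree_map_alloc(v->iova_tree, &mem_region);
    if (unlikely(r != IOVA_OK)) {
        /* Warn and skip the region instead of aborting qemu */
        error_report("Can't allocate an iova mapping (%d)", r);
        return;
    }
    iova = mem_region.iova;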

>
> > +
> > +        iova = mem_region.iova;
> > +    }
> >
> >       vhost_vdpa_iotlb_batch_begin_once(v);
> >       ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
> > @@ -261,6 +273,20 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
> >
> >       llsize = int128_sub(llend, int128_make64(iova));
> >
> > +    if (v->shadow_vqs_enabled) {
> > +        const DMAMap *result;
> > +        const void *vaddr = memory_region_get_ram_ptr(section->mr) +
> > +            section->offset_within_region +
> > +            (iova - section->offset_within_address_space);
> > +        DMAMap mem_region = {
> > +            .translated_addr = (hwaddr)vaddr,
> > +            .size = int128_get64(llsize) - 1,
> > +        };
> > +
> > +        result = vhost_iova_tree_find_iova(v->iova_tree, &mem_region);
> > +        iova = result->iova;
> > +        vhost_iova_tree_remove(v->iova_tree, &mem_region);
> > +    }
> >       vhost_vdpa_iotlb_batch_begin_once(v);
> >       ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
> >       if (ret) {
> > @@ -783,33 +809,70 @@ static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
> >   /**
> >    * Unmap SVQ area in the device
> >    */
> > -static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
> > -                                      hwaddr size)
> > +static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v,
> > +                                      const DMAMap *needle)
> >   {
> > +    const DMAMap *result = vhost_iova_tree_find_iova(v->iova_tree, needle);
> > +    hwaddr size;
> >       int r;
> >
> > -    size = ROUND_UP(size, qemu_real_host_page_size);
> > -    r = vhost_vdpa_dma_unmap(v, iova, size);
> > +    if (unlikely(!result)) {
> > +        error_report("Unable to find SVQ address to unmap");
> > +        return false;
> > +    }
> > +
> > +    size = ROUND_UP(result->size, qemu_real_host_page_size);
> > +    r = vhost_vdpa_dma_unmap(v, result->iova, size);
> >       return r == 0;
> >   }
> >
> >   static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
> >                                          const VhostShadowVirtqueue *svq)
> >   {
> > +    DMAMap needle;
> >       struct vhost_vdpa *v = dev->opaque;
> >       struct vhost_vring_addr svq_addr;
> > -    size_t device_size = vhost_svq_device_area_size(svq);
> > -    size_t driver_size = vhost_svq_driver_area_size(svq);
> >       bool ok;
> >
> >       vhost_svq_get_vring_addr(svq, &svq_addr);
> >
> > -    ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
> > +    needle = (DMAMap) {
> > +        .translated_addr = svq_addr.desc_user_addr,
> > +    };
> > +    ok = vhost_vdpa_svq_unmap_ring(v, &needle);
> >       if (unlikely(!ok)) {
> >           return false;
> >       }
> >
> > -    return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
> > +    needle = (DMAMap) {
> > +        .translated_addr = svq_addr.used_user_addr,
> > +    };
> > +    return vhost_vdpa_svq_unmap_ring(v, &needle);
> > +}
> > +
> > +/**
> > + * Map SVQ area in the device
> > + *
> > + * @v          Vhost-vdpa device
> > + * @needle     The area to search iova
> > + * @readonly   Permissions of the area
> > + */
> > +static bool vhost_vdpa_svq_map_ring(struct vhost_vdpa *v, const DMAMap *needle,
> > +                                    bool readonly)
> > +{
> > +    hwaddr off;
> > +    const DMAMap *result = vhost_iova_tree_find_iova(v->iova_tree, needle);
> > +    int r;
> > +
> > +    if (unlikely(!result)) {
> > +        error_report("Can't locate SVQ ring");
> > +        return false;
> > +    }
> > +
> > +    off = needle->translated_addr - result->translated_addr;
> > +    r = vhost_vdpa_dma_map(v, result->iova + off, needle->size,
> > +                           (void *)needle->translated_addr, readonly);
> > +    return r == 0;
> >   }
> >
> >   /**
> > @@ -821,23 +884,29 @@ static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
> >   static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
> >                                        const VhostShadowVirtqueue *svq)
> >   {
> > +    DMAMap needle;
> >       struct vhost_vdpa *v = dev->opaque;
> >       struct vhost_vring_addr svq_addr;
> >       size_t device_size = vhost_svq_device_area_size(svq);
> >       size_t driver_size = vhost_svq_driver_area_size(svq);
> > -    int r;
> > +    bool ok;
> >
> >       vhost_svq_get_vring_addr(svq, &svq_addr);
> >
> > -    r = vhost_vdpa_dma_map(v, svq_addr.desc_user_addr, driver_size,
> > -                           (void *)svq_addr.desc_user_addr, true);
> > -    if (unlikely(r != 0)) {
> > +    needle = (DMAMap) {
> > +        .translated_addr = svq_addr.desc_user_addr,
> > +        .size = driver_size,
> > +    };
> > +    ok = vhost_vdpa_svq_map_ring(v, &needle, true);
> > +    if (unlikely(!ok)) {
> >           return false;
> >       }
> >
> > -    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
> > -                           (void *)svq_addr.used_user_addr, false);
> > -    return r == 0;
> > +    needle = (DMAMap) {
> > +        .translated_addr = svq_addr.used_user_addr,
> > +        .size = device_size,
> > +    };
> > +    return vhost_vdpa_svq_map_ring(v, &needle, false);
> >   }
> >
> >   static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> > @@ -1006,6 +1075,23 @@ static int vhost_vdpa_set_owner(struct vhost_dev *dev)
> >       return vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
> >   }
> >
> > +static bool vhost_vdpa_svq_get_vq_region(struct vhost_vdpa *v,
> > +                                         unsigned long long addr,
> > +                                         uint64_t *iova_addr)
> > +{
> > +    const DMAMap needle = {
> > +        .translated_addr = addr,
> > +    };
> > +    const DMAMap *translation = vhost_iova_tree_find_iova(v->iova_tree,
> > +                                                          &needle);
> > +    if (!translation) {
> > +        return false;
> > +    }
> > +
> > +    *iova_addr = translation->iova + (addr - translation->translated_addr);
> > +    return true;
> > +}
> > +
> >   static void vhost_vdpa_vq_get_guest_addr(struct vhost_vring_addr *addr,
> >                                            struct vhost_virtqueue *vq)
> >   {
> > @@ -1023,10 +1109,23 @@ static int vhost_vdpa_vq_get_addr(struct vhost_dev *dev,
> >       assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> >
> >       if (v->shadow_vqs_enabled) {
> > +        struct vhost_vring_addr svq_addr;
> >           int idx = vhost_vdpa_get_vq_index(dev, addr->index);
> >           VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, idx);
> >
> > -        vhost_svq_get_vring_addr(svq, addr);
> > +        vhost_svq_get_vring_addr(svq, &svq_addr);
> > +        if (!vhost_vdpa_svq_get_vq_region(v, svq_addr.desc_user_addr,
> > +                                          &addr->desc_user_addr)) {
> > +            return -1;
> > +        }
> > +        if (!vhost_vdpa_svq_get_vq_region(v, svq_addr.avail_user_addr,
> > +                                          &addr->avail_user_addr)) {
> > +            return -1;
> > +        }
> > +        if (!vhost_vdpa_svq_get_vq_region(v, svq_addr.used_user_addr,
> > +                                          &addr->used_user_addr)) {
> > +            return -1;
> > +        }
> >       } else {
> >           vhost_vdpa_vq_get_guest_addr(addr, vq);
> >       }
> > @@ -1095,13 +1194,37 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
> >
> >       shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_psvq_free);
> >       for (unsigned n = 0; n < hdev->nvqs; ++n) {
> > -        VhostShadowVirtqueue *svq = vhost_svq_new(qsize);
> > -
> > +        DMAMap device_region, driver_region;
> > +        struct vhost_vring_addr addr;
> > +        VhostShadowVirtqueue *svq = vhost_svq_new(qsize, v->iova_tree);
> >           if (unlikely(!svq)) {
> >               error_setg(errp, "Cannot create svq %u", n);
> >               return -1;
> >           }
> > -        g_ptr_array_add(v->shadow_vqs, svq);
> > +
> > +        vhost_svq_get_vring_addr(svq, &addr);
> > +        driver_region = (DMAMap) {
> > +            .translated_addr = (hwaddr)addr.desc_user_addr,
> > +
> > +            /*
> > +             * DMAMAp.size include the last byte included in the range, while
> > +             * sizeof marks one past it. Substract one byte to make them match.
> > +             */
> > +            .size = vhost_svq_driver_area_size(svq) - 1,
> > +            .perm = VHOST_ACCESS_RO,
> > +        };
> > +        device_region = (DMAMap) {
> > +            .translated_addr = (hwaddr)addr.used_user_addr,
> > +            .size = vhost_svq_device_area_size(svq) - 1,
> > +            .perm = VHOST_ACCESS_RW,
> > +        };
> > +
> > +        r = vhost_iova_tree_map_alloc(v->iova_tree, &driver_region);
> > +        assert(r == IOVA_OK);
>
>
> Let's fail instead of assert here.
>

Sure, I see why we must not assert here either.
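
A sketch of the non-assert version (error messages illustrative):

    r = vhost_iova_tree_map_alloc(v->iova_tree, &driver_region);
    if (unlikely(r != IOVA_OK)) {
        error_setg(errp, "Cannot allocate iova for svq %u driver area", n);
        return -1;
    }

    r = vhost_iova_tree_map_alloc(v->iova_tree, &device_region);
    if (unlikely(r != IOVA_OK)) {
        error_setg(errp, "Cannot allocate iova for svq %u device area", n);
        return -1;
    }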

Thanks!

> Thanks
>
>
> > +        r = vhost_iova_tree_map_alloc(v->iova_tree, &device_region);
> > +        assert(r == IOVA_OK);
> > +
> > +        g_ptr_array_add(shadow_vqs, svq);
> >       }
> >
> >   out:
>




* Re: [PATCH 11/31] vhost: Add vhost_svq_valid_device_features to shadow vq
  2022-01-31 15:49     ` Eugenio Perez Martin
@ 2022-02-01 10:57       ` Eugenio Perez Martin
  2022-02-08  3:37           ` Jason Wang
  0 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-01 10:57 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Mon, Jan 31, 2022 at 4:49 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Sat, Jan 29, 2022 at 9:11 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > > This allows SVQ to negotiate features with the device. For the device,
> > > SVQ is a driver. While this function needs to bypass all non-transport
> > > features, it needs to disable the features that SVQ does not support
> > > when forwarding buffers. This includes packed vq layout, indirect
> > > descriptors or event idx.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
> > >   hw/virtio/vhost-shadow-virtqueue.c | 44 ++++++++++++++++++++++++++++++
> > >   hw/virtio/vhost-vdpa.c             | 21 ++++++++++++++
> > >   3 files changed, 67 insertions(+)
> > >
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > index c9ffa11fce..d963867a04 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > @@ -15,6 +15,8 @@
> > >
> > >   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > >
> > > +bool vhost_svq_valid_device_features(uint64_t *features);
> > > +
> > >   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> > >   void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
> > >   const EventNotifier *vhost_svq_get_dev_kick_notifier(
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > index 9619c8082c..51442b3dbf 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > @@ -45,6 +45,50 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
> > >       return &svq->hdev_kick;
> > >   }
> > >
> > > +/**
> > > + * Validate the transport device features that SVQ can use with the device
> > > + *
> > > > + * @dev_features  The device features. On success, the acknowledged features.
> > > + *
> > > + * Returns true if SVQ can go with a subset of these, false otherwise.
> > > + */
> > > +bool vhost_svq_valid_device_features(uint64_t *dev_features)
> > > +{
> > > +    bool r = true;
> > > +
> > > +    for (uint64_t b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END;
> > > +         ++b) {
> > > +        switch (b) {
> > > +        case VIRTIO_F_NOTIFY_ON_EMPTY:
> > > +        case VIRTIO_F_ANY_LAYOUT:
> > > +            continue;
> > > +
> > > +        case VIRTIO_F_ACCESS_PLATFORM:
> > > +            /* SVQ does not know how to translate addresses */
> >
> >
> > I may miss something but any reason that we need to disable
> > ACCESS_PLATFORM? I'd expect the vring helper we used for shadow
> > virtqueue can deal with vIOMMU perfectly.
> >
>
> This function validates the SVQ <-> device communication features,
> which may or may not be the same as the guest <-> SVQ ones. These
> feature flags are still valid for guest <-> SVQ communication, the
> same as with the indirect descriptors one.
>
> Having said that, there is a point in the series where
> VIRTIO_F_ACCESS_PLATFORM is actually mandatory, so I think we could
> use the later addition of the x-svq cmdline parameter and delay the
> feature validation to where it makes more sense.
>
> >
> > > +            if (*dev_features & BIT_ULL(b)) {
> > > +                clear_bit(b, dev_features);
> > > +                r = false;
> > > +            }
> > > +            break;
> > > +
> > > +        case VIRTIO_F_VERSION_1:
> >
> >
> > I had the same question here.
> >
>
> For VERSION_1 it's easier to assume the guest is little endian at
> some points, but we could try harder to support both endiannesses
> if needed.
>

Re-thinking the SVQ feature isolation for this first iteration based
on your comments.

Maybe it's easier to simply fail if the device does not *match* the
expected feature set, and add all of the "feature isolation" later.
While a lot of guest <-> SVQ communication details are already solved
for free by qemu's VirtQueue (indirect, packed, ...), we could
simplify this series in particular and add support for them later.

For example, at this moment it would be valid for the device to
export the indirect descriptors feature flag, and for SVQ to simply
forward that feature flag offer to the guest. So the guest <-> SVQ
communication could use indirect descriptors (qemu's VirtQueue code
handles them for free), but SVQ would not acknowledge them to the
device. As a side note, negotiating it would actually have been
harmless, but that's not the case for packed vq.

So maybe for v2 we can simply force the device to export just the
strictly needed features and nothing else via the qemu cmdline, and
then enable the feature negotiation isolation for each side of SVQ
later?
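
Just to make the idea concrete, the strict check could be as simple
as this sketch (the exact required feature set is an assumption):

    uint64_t expected = BIT_ULL(VIRTIO_F_VERSION_1) |
                        BIT_ULL(VIRTIO_F_ACCESS_PLATFORM);

    if ((dev_features & expected) != expected) {
        error_setg(errp, "Device misses required features: 0x%" PRIx64,
                   expected & ~dev_features);
        return -1;
    }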

Thanks!


> Thanks!
>
> > Thanks
> >
> >
> > > > +            /* SVQ trusts that the guest vring is little endian */
> > > +            if (!(*dev_features & BIT_ULL(b))) {
> > > +                set_bit(b, dev_features);
> > > +                r = false;
> > > +            }
> > > +            continue;
> > > +
> > > +        default:
> > > +            if (*dev_features & BIT_ULL(b)) {
> > > +                clear_bit(b, dev_features);
> > > +            }
> > > +        }
> > > +    }
> > > +
> > > +    return r;
> > > +}
> > > +
> > >   /* Forward guest notifications */
> > >   static void vhost_handle_guest_kick(EventNotifier *n)
> > >   {
> > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > index bdb45c8808..9d801cf907 100644
> > > --- a/hw/virtio/vhost-vdpa.c
> > > +++ b/hw/virtio/vhost-vdpa.c
> > > @@ -855,10 +855,31 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
> > >       size_t n_svqs = v->shadow_vqs_enabled ? hdev->nvqs : 0;
> > >       g_autoptr(GPtrArray) shadow_vqs = g_ptr_array_new_full(n_svqs,
> > >                                                              vhost_psvq_free);
> > > +    uint64_t dev_features;
> > > +    uint64_t svq_features;
> > > +    int r;
> > > +    bool ok;
> > > +
> > >       if (!v->shadow_vqs_enabled) {
> > >           goto out;
> > >       }
> > >
> > > +    r = vhost_vdpa_get_features(hdev, &dev_features);
> > > +    if (r != 0) {
> > > +        error_setg(errp, "Can't get vdpa device features, got (%d)", r);
> > > +        return r;
> > > +    }
> > > +
> > > +    svq_features = dev_features;
> > > +    ok = vhost_svq_valid_device_features(&svq_features);
> > > +    if (unlikely(!ok)) {
> > > +        error_setg(errp,
> > > +            "SVQ Invalid device feature flags, offer: 0x%"PRIx64", ok: 0x%"PRIx64,
> > > +            hdev->features, svq_features);
> > > +        return -1;
> > > +    }
> > > +
> > > +    shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_psvq_free);
> > >       for (unsigned n = 0; n < hdev->nvqs; ++n) {
> > >           VhostShadowVirtqueue *svq = vhost_svq_new();
> > >
> >




* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
  2022-01-30  6:46     ` Jason Wang
  (?)
@ 2022-02-01 11:25     ` Eugenio Perez Martin
  2022-02-08  8:15         ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-01 11:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Sun, Jan 30, 2022 at 7:47 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> > On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> >   void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >   {
> >       event_notifier_set_handler(&svq->svq_kick, NULL);
> > +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> > +
> > +    if (!svq->vq) {
> > +        return;
> > +    }
> > +
> > +    /* Send all pending used descriptors to guest */
> > +    vhost_svq_flush(svq, false);
>
>
> Do we need to wait for all the pending descriptors to be completed here?
>

No, this function does not wait, it only completes the forwarding of
the *used* descriptors.

The best example is the net rx queue in my opinion. This call will
check SVQ's vring used_idx and will forward the last used descriptors
if any, but all available descriptors will remain as available for
qemu's VQ code.

Skipping it would miss those last rx descriptors in migration.

Thanks!

> Thanks
>
>
> > +
> > +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> > +        g_autofree VirtQueueElement *elem = NULL;
> > +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> > +        if (elem) {
> > +            virtqueue_detach_element(svq->vq, elem, elem->len);
> > +        }
> > +    }
> > +
> > +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > +    if (next_avail_elem) {
> > +        virtqueue_detach_element(svq->vq, next_avail_elem,
> > +                                 next_avail_elem->len);
> > +    }
> >   }
>




* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
  2022-01-30  6:50     ` Jason Wang
  (?)
@ 2022-02-01 11:45     ` Eugenio Perez Martin
  2022-02-08  8:25         ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-01 11:45 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Sun, Jan 30, 2022 at 7:50 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> > On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > SVQ is able to log the dirty bits by itself, so let's use it to not
> > block migration.
> >
> > Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
> > enabled. Even if the device supports it, the reports would be nonsense
> > because SVQ memory is in the qemu region.
> >
> > The log region is still allocated. Future changes might skip that, but
> > this series is already long enough.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
> >   1 file changed, 20 insertions(+)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index fb0a338baa..75090d65e8 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
> >       if (ret == 0 && v->shadow_vqs_enabled) {
> >           /* Filter only features that SVQ can offer to guest */
> >           vhost_svq_valid_guest_features(features);
> > +
> > +        /* Add SVQ logging capabilities */
> > +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
> >       }
> >
> >       return ret;
> > @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
> >
> >       if (v->shadow_vqs_enabled) {
> >           uint64_t dev_features, svq_features, acked_features;
> > +        uint8_t status = 0;
> >           bool ok;
> >
> > +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> > +        if (unlikely(ret)) {
> > +            return ret;
> > +        }
> > +
> > +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > +            /*
> > +             * vhost is trying to enable or disable _F_LOG, and the device
> > +             * would report wrong dirty pages. SVQ handles it.
> > +             */
>
>
> I fail to understand this comment, I'd think there's no way to disable
> dirty page tracking for SVQ.
>

vhost_log_global_{start,stop} are called at the beginning and end of
migration. To inform the device that it should start logging, they set
or clear VHOST_F_LOG_ALL via vhost_dev_set_log.

While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
vhost does not block migration. Maybe we need to look for another way
to do this?
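
For reference, the flow I mean is roughly this (a much simplified
sketch of the generic vhost path; the helper name is mine and error
handling is omitted):

static int vhost_dev_set_log_sketch(struct vhost_dev *dev, bool enable)
{
    uint64_t features = dev->acked_features;

    if (enable) {
        features |= BIT_ULL(VHOST_F_LOG_ALL);   /* migration starts */
    } else {
        features &= ~BIT_ULL(VHOST_F_LOG_ALL);  /* migration ends */
    }

    /*
     * With SVQ enabled, vhost_vdpa_set_features() intercepts this and
     * returns early, since SVQ itself reports the dirty pages.
     */
    return dev->vhost_ops->vhost_set_features(dev, features);
}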

Thanks!

> Thanks
>
>
> > +            return 0;
> > +        }
> > +
> > +        /* We must not ack _F_LOG if SVQ is enabled */
> > +        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
> > +
> >           ret = vhost_vdpa_get_dev_features(dev, &dev_features);
> >           if (ret != 0) {
> >               error_report("Can't get vdpa device features, got (%d)", ret);
>




* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
  2022-01-30  4:42     ` Jason Wang
  (?)
@ 2022-02-01 17:08     ` Eugenio Perez Martin
  2022-02-08  8:11         ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-01 17:08 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Sun, Jan 30, 2022 at 5:43 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> > On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > Initial version of shadow virtqueue that actually forward buffers. There
> > is no iommu support at the moment, and that will be addressed in future
> > patches of this series. Since all vhost-vdpa devices use forced IOMMU,
> > this means that SVQ is not usable at this point of the series on any
> > device.
> >
> > For simplicity it only supports modern devices, that expects vring
> > in little endian, with split ring and no event idx or indirect
> > descriptors. Support for them will not be added in this series.
> >
> > It reuses the VirtQueue code for the device part. The driver part is
> > based on Linux's virtio_ring driver, but with stripped functionality
> > and optimizations so it's easier to review.
> >
> > However, forwarding buffers has some particular pieces: One of the most
> > unexpected ones is that a guest's buffer can expand through more than
> > one descriptor in SVQ. While this is handled gracefully by qemu's
> > emulated virtio devices, it may cause unexpected SVQ queue full. This
> > patch also solves it by checking for this condition at both guest's
> > kicks and device's calls. The code may be more elegant in the future if
> > SVQ code runs in its own iocontext.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |   2 +
> >   hw/virtio/vhost-shadow-virtqueue.c | 365 ++++++++++++++++++++++++++++-
> >   hw/virtio/vhost-vdpa.c             | 111 ++++++++-
> >   3 files changed, 462 insertions(+), 16 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index 39aef5ffdf..19c934af49 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -33,6 +33,8 @@ uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq);
> >   size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
> >   size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> >
> > +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> > +                     VirtQueue *vq);
> >   void vhost_svq_stop(VhostShadowVirtqueue *svq);
> >
> >   VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index 7c168075d7..a1a404f68f 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -9,6 +9,8 @@
> >
> >   #include "qemu/osdep.h"
> >   #include "hw/virtio/vhost-shadow-virtqueue.h"
> > +#include "hw/virtio/vhost.h"
> > +#include "hw/virtio/virtio-access.h"
> >   #include "standard-headers/linux/vhost_types.h"
> >
> >   #include "qemu/error-report.h"
> > @@ -36,6 +38,33 @@ typedef struct VhostShadowVirtqueue {
> >
> >       /* Guest's call notifier, where SVQ calls guest. */
> >       EventNotifier svq_call;
> > +
> > +    /* Virtio queue shadowing */
> > +    VirtQueue *vq;
> > +
> > +    /* Virtio device */
> > +    VirtIODevice *vdev;
> > +
> > +    /* Map for returning guest's descriptors */
> > +    VirtQueueElement **ring_id_maps;
> > +
> > +    /* Next VirtQueue element that guest made available */
> > +    VirtQueueElement *next_guest_avail_elem;
> > +
> > +    /* Next head to expose to device */
> > +    uint16_t avail_idx_shadow;
> > +
> > +    /* Next free descriptor */
> > +    uint16_t free_head;
> > +
> > +    /* Last seen used idx */
> > +    uint16_t shadow_used_idx;
> > +
> > +    /* Next head to consume from device */
> > +    uint16_t last_used_idx;
> > +
> > +    /* Cache for the exposed notification flag */
> > +    bool notification;
> >   } VhostShadowVirtqueue;
> >
> >   #define INVALID_SVQ_KICK_FD -1
> > @@ -148,30 +177,294 @@ bool vhost_svq_ack_guest_features(uint64_t dev_features,
> >       return true;
> >   }
> >
> > -/* Forward guest notifications */
> > -static void vhost_handle_guest_kick(EventNotifier *n)
> > +/**
> > + * Number of descriptors that SVQ can make available from the guest.
> > + *
> > + * @svq   The svq
> > + */
> > +static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
> >   {
> > -    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > -                                             svq_kick);
> > +    return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
> > +}
> > +
> > +static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> > +{
> > +    uint16_t notification_flag;
> >
> > -    if (unlikely(!event_notifier_test_and_clear(n))) {
> > +    if (svq->notification == enable) {
> > +        return;
> > +    }
> > +
> > +    notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
> > +
> > +    svq->notification = enable;
> > +    if (enable) {
> > +        svq->vring.avail->flags &= ~notification_flag;
> > +    } else {
> > +        svq->vring.avail->flags |= notification_flag;
> > +    }
> > +}
> > +
> > +static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > +                                    const struct iovec *iovec,
> > +                                    size_t num, bool more_descs, bool write)
> > +{
> > +    uint16_t i = svq->free_head, last = svq->free_head;
> > +    unsigned n;
> > +    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
> > +    vring_desc_t *descs = svq->vring.desc;
> > +
> > +    if (num == 0) {
> > +        return;
> > +    }
> > +
> > +    for (n = 0; n < num; n++) {
> > +        if (more_descs || (n + 1 < num)) {
> > +            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> > +        } else {
> > +            descs[i].flags = flags;
> > +        }
> > +        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
> > +        descs[i].len = cpu_to_le32(iovec[n].iov_len);
> > +
> > +        last = i;
> > +        i = cpu_to_le16(descs[i].next);
> > +    }
> > +
> > +    svq->free_head = le16_to_cpu(descs[last].next);
> > +}
> > +
> > +static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > +                                    VirtQueueElement *elem)
> > +{
> > +    int head;
> > +    unsigned avail_idx;
> > +    vring_avail_t *avail = svq->vring.avail;
> > +
> > +    head = svq->free_head;
> > +
> > +    /* We need some descriptors here */
> > +    assert(elem->out_num || elem->in_num);
>
>
> Looks like this could be triggered by guest, we need fail instead assert
> here.
>

My understanding was that virtqueue_pop already sanitized that case,
but I'm not able to find where it does so now. I will recheck and, if
it doesn't, I will change the assert to a failure.
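
Something like this in the caller, I mean (just a sketch):

if (unlikely(elem->out_num == 0 && elem->in_num == 0)) {
    virtio_error(svq->vdev, "Guest element with no descriptors");
    virtqueue_detach_element(svq->vq, elem, 0);
    return;
}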

>
> > +
> > +    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> > +                            elem->in_num > 0, false);
> > +    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> > +
> > +    /*
> > +     * Put entry in available array (but don't update avail->idx until they
> > +     * do sync).
> > +     */
> > +    avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
> > +    avail->ring[avail_idx] = cpu_to_le16(head);
> > +    svq->avail_idx_shadow++;
> > +
> > > +    /* Update avail index after the descriptor is written */
> > +    smp_wmb();
> > +    avail->idx = cpu_to_le16(svq->avail_idx_shadow);
> > +
> > +    return head;
> > +}
> > +
> > +static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> > +{
> > +    unsigned qemu_head = vhost_svq_add_split(svq, elem);
> > +
> > +    svq->ring_id_maps[qemu_head] = elem;
> > +}
> > +
> > +static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> > +{
> > +    /* We need to expose available array entries before checking used flags */
> > +    smp_mb();
> > +    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
> >           return;
> >       }
> >
> >       event_notifier_set(&svq->hdev_kick);
> >   }
> >
> > -/* Forward vhost notifications */
> > +/**
> > + * Forward available buffers.
> > + *
> > + * @svq Shadow VirtQueue
> > + *
> > > + * Note that this function does not guarantee that all the guest's available
> > > + * buffers are available to the device in the SVQ avail ring. The guest may
> > > + * have exposed a GPA / GIOVA contiguous buffer, but it may not be contiguous
> > > + * in qemu vaddr.
> > + *
> > + * If that happens, guest's kick notifications will be disabled until device
> > + * makes some buffers used.
> > + */
> > +static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
> > +{
> > +    /* Clear event notifier */
> > +    event_notifier_test_and_clear(&svq->svq_kick);
> > +
> > +    /* Make available as many buffers as possible */
> > +    do {
> > +        if (virtio_queue_get_notification(svq->vq)) {
> > +            virtio_queue_set_notification(svq->vq, false);
>
>
> This looks like an optimization the should belong to
> virtio_queue_set_notification() itself.
>

Sure, we can move it there.
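
Something like this early return at the top of
virtio_queue_set_notification() itself, I guess (sketch):

void virtio_queue_set_notification(VirtQueue *vq, int enable)
{
    if (vq->notification == enable) {
        /* Nothing to change; skip touching the vring flags */
        return;
    }

    vq->notification = enable;
    /* ... rest of the existing function unchanged ... */
}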

>
> > +        }
> > +
> > +        while (true) {
> > +            VirtQueueElement *elem;
> > +
> > +            if (svq->next_guest_avail_elem) {
> > +                elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > +            } else {
> > +                elem = virtqueue_pop(svq->vq, sizeof(*elem));
> > +            }
> > +
> > +            if (!elem) {
> > +                break;
> > +            }
> > +
> > +            if (elem->out_num + elem->in_num >
> > +                vhost_svq_available_slots(svq)) {
> > +                /*
> > +                 * This condition is possible since a contiguous buffer in GPA
> > +                 * does not imply a contiguous buffer in qemu's VA
> > > +                 * scatter-gather segments. If that happens, the buffer exposed
> > +                 * to the device needs to be a chain of descriptors at this
> > +                 * moment.
> > +                 *
> > +                 * SVQ cannot hold more available buffers if we are here:
> > +                 * queue the current guest descriptor and ignore further kicks
> > +                 * until some elements are used.
> > +                 */
> > +                svq->next_guest_avail_elem = elem;
> > +                return;
> > +            }
> > +
> > +            vhost_svq_add(svq, elem);
> > +            vhost_svq_kick(svq);
> > +        }
> > +
> > +        virtio_queue_set_notification(svq->vq, true);
> > +    } while (!virtio_queue_empty(svq->vq));
> > +}
> > +
> > +/**
> > + * Handle guest's kick.
> > + *
> > + * @n guest kick event notifier, the one that guest set to notify svq.
> > + */
> > +static void vhost_handle_guest_kick_notifier(EventNotifier *n)
> > +{
> > +    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > +                                             svq_kick);
> > +    vhost_handle_guest_kick(svq);
> > +}
> > +
> > +static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> > +{
> > +    if (svq->last_used_idx != svq->shadow_used_idx) {
> > +        return true;
> > +    }
> > +
> > +    svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
> > +
> > +    return svq->last_used_idx != svq->shadow_used_idx;
> > +}
> > +
> > +static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > +{
> > +    vring_desc_t *descs = svq->vring.desc;
> > +    const vring_used_t *used = svq->vring.used;
> > +    vring_used_elem_t used_elem;
> > +    uint16_t last_used;
> > +
> > +    if (!vhost_svq_more_used(svq)) {
> > +        return NULL;
> > +    }
> > +
> > +    /* Only get used array entries after they have been exposed by dev */
> > +    smp_rmb();
> > +    last_used = svq->last_used_idx & (svq->vring.num - 1);
> > +    used_elem.id = le32_to_cpu(used->ring[last_used].id);
> > +    used_elem.len = le32_to_cpu(used->ring[last_used].len);
> > +
> > +    svq->last_used_idx++;
> > +    if (unlikely(used_elem.id >= svq->vring.num)) {
> > +        error_report("Device %s says index %u is used", svq->vdev->name,
> > +                     used_elem.id);
> > +        return NULL;
> > +    }
> > +
> > +    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
> > +        error_report(
> > +            "Device %s says index %u is used, but it was not available",
> > +            svq->vdev->name, used_elem.id);
> > +        return NULL;
> > +    }
> > +
> > +    descs[used_elem.id].next = svq->free_head;
> > +    svq->free_head = used_elem.id;
> > +
> > +    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> > +    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> > +}
> > +
> > +static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> > +                            bool check_for_avail_queue)
> > +{
> > +    VirtQueue *vq = svq->vq;
> > +
> > +    /* Make as many buffers as possible used. */
> > +    do {
> > +        unsigned i = 0;
> > +
> > +        vhost_svq_set_notification(svq, false);
> > +        while (true) {
> > +            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> > +            if (!elem) {
> > +                break;
> > +            }
> > +
> > +            if (unlikely(i >= svq->vring.num)) {
> > +                virtio_error(svq->vdev,
> > +                         "More than %u used buffers obtained in a %u size SVQ",
> > +                         i, svq->vring.num);
> > +                virtqueue_fill(vq, elem, elem->len, i);
> > +                virtqueue_flush(vq, i);
>
>
> Let's simply use virtqueue_push() here?
>

virtqueue_push supports filling and flushing only one element, instead
of a batch. I'm fine with either, but I think the fewer updates to the
used idx, the better.
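
For reference, virtqueue_push() is just the single-element case:

virtqueue_push(vq, elem, elem->len);
/* is equivalent to: */
virtqueue_fill(vq, elem, elem->len, 0);
virtqueue_flush(vq, 1);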

>
> > +                i = 0;
>
>
> Do we need to bail out here?
>

Yes, I guess we can simply return.
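
I.e. (sketch):

if (unlikely(i >= svq->vring.num)) {
    virtio_error(svq->vdev,
                 "More than %u used buffers obtained in a %u size SVQ",
                 i, svq->vring.num);
    virtqueue_fill(vq, elem, elem->len, i);
    virtqueue_flush(vq, i);
    return;  /* bail out instead of resetting i */
}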

>
> > +            }
> > +            virtqueue_fill(vq, elem, elem->len, i++);
> > +        }
> > +
> > +        virtqueue_flush(vq, i);
> > +        event_notifier_set(&svq->svq_call);
> > +
> > +        if (check_for_avail_queue && svq->next_guest_avail_elem) {
> > +            /*
> > +             * Avail ring was full when vhost_svq_flush was called, so it's a
> > +             * good moment to make more descriptors available if possible
> > +             */
> > +            vhost_handle_guest_kick(svq);
>
>
> Is there better to have a similar check as vhost_handle_guest_kick() did?
>
>              if (elem->out_num + elem->in_num >
>                  vhost_svq_available_slots(svq)) {
>

It will be duplicated when we call vhost_handle_guest_kick, won't it?

>
> > +        }
> > +
> > +        vhost_svq_set_notification(svq, true);
>
>
> A mb() is needed here? Otherwise we may lost a call here (where
> vhost_svq_more_used() is run before vhost_svq_set_notification()).
>

I'm confused here then; I thought you said this is just a hint, so
there was no need [1]. Now I think the memory barrier is needed too.
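
I.e., something like this at the end of the flush loop (sketch):

        vhost_svq_set_notification(svq, true);
        /*
         * Make the notification enable visible to the device before
         * re-checking used->idx, so we cannot miss a call.
         */
        smp_mb();
    } while (vhost_svq_more_used(svq));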

>
> > +    } while (vhost_svq_more_used(svq));
> > +}
> > +
> > +/**
> > + * Forward used buffers.
> > + *
> > + * @n hdev call event notifier, the one that device set to notify svq.
> > + *
> > > + * Note that we are not making any buffers available in the loop, so there
> > > + * is no way that it runs more than virtqueue-size times.
> > + */
> >   static void vhost_svq_handle_call(EventNotifier *n)
> >   {
> >       VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> >                                                hdev_call);
> >
> > -    if (unlikely(!event_notifier_test_and_clear(n))) {
> > -        return;
> > -    }
> > +    /* Clear event notifier */
> > +    event_notifier_test_and_clear(n);
>
>
> Any reason that we remove the above check?
>

This comes from the previous versions, where this made sure we missed
no used buffers in the process of switching to SVQ mode.

If we enable SVQ from the beginning, I think we can rely on getting
all the device's used buffer notifications, so let me think about it a
little bit and I can move to checking the eventfd.
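
I.e., restoring something like (sketch):

static void vhost_svq_handle_call(EventNotifier *n)
{
    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
                                             hdev_call);

    if (unlikely(!event_notifier_test_and_clear(n))) {
        return;  /* spurious notification */
    }

    vhost_svq_flush(svq, true);
}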

>
> >
> > -    event_notifier_set(&svq->svq_call);
> > +    vhost_svq_flush(svq, true);
> >   }
> >
> >   /**
> > @@ -258,13 +551,38 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> > >        * need to explicitly check for them.
> >        */
> >       event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> > -    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
> > +    event_notifier_set_handler(&svq->svq_kick,
> > +                               vhost_handle_guest_kick_notifier);
> >
> >       if (!check_old || event_notifier_test_and_clear(&tmp)) {
> >           event_notifier_set(&svq->hdev_kick);
> >       }
> >   }
> >
> > +/**
> > + * Start shadow virtqueue operation.
> > + *
> > + * @svq Shadow Virtqueue
> > + * @vdev        VirtIO device
> > + * @vq          Virtqueue to shadow
> > + */
> > +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> > +                     VirtQueue *vq)
> > +{
> > +    svq->next_guest_avail_elem = NULL;
> > +    svq->avail_idx_shadow = 0;
> > +    svq->shadow_used_idx = 0;
> > +    svq->last_used_idx = 0;
> > +    svq->vdev = vdev;
> > +    svq->vq = vq;
> > +
> > +    memset(svq->vring.avail, 0, sizeof(*svq->vring.avail));
> > > +    memset(svq->vring.used, 0, sizeof(*svq->vring.used));
> > +    for (unsigned i = 0; i < svq->vring.num - 1; i++) {
> > +        svq->vring.desc[i].next = cpu_to_le16(i + 1);
> > +    }
> > +}
> > +
> >   /**
> >    * Stop shadow virtqueue operation.
> >    * @svq Shadow Virtqueue
> > @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> >   void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >   {
> >       event_notifier_set_handler(&svq->svq_kick, NULL);
> > +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> > +
> > +    if (!svq->vq) {
> > +        return;
> > +    }
> > +
> > +    /* Send all pending used descriptors to guest */
> > +    vhost_svq_flush(svq, false);
> > +
> > +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> > +        g_autofree VirtQueueElement *elem = NULL;
> > +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> > +        if (elem) {
> > +            virtqueue_detach_element(svq->vq, elem, elem->len);
> > +        }
> > +    }
> > +
> > +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > +    if (next_avail_elem) {
> > +        virtqueue_detach_element(svq->vq, next_avail_elem,
> > +                                 next_avail_elem->len);
> > +    }
> >   }
> >
> >   /**
> > @@ -316,7 +656,7 @@ VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> >       memset(svq->vring.desc, 0, driver_size);
> >       svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> >       memset(svq->vring.used, 0, device_size);
> > -
> > +    svq->ring_id_maps = g_new0(VirtQueueElement *, qsize);
> >       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
> >       return g_steal_pointer(&svq);
> >
> > @@ -335,6 +675,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
> >       event_notifier_cleanup(&vq->hdev_kick);
> >       event_notifier_set_handler(&vq->hdev_call, NULL);
> >       event_notifier_cleanup(&vq->hdev_call);
> > +    g_free(vq->ring_id_maps);
> >       qemu_vfree(vq->vring.desc);
> >       qemu_vfree(vq->vring.used);
> >       g_free(vq);
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 53e14bafa0..0e5c00ed7e 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -752,9 +752,9 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> >    * Note that this function does not rewind kick file descriptor if cannot set
> >    * call one.
> >    */
> > -static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> > -                                VhostShadowVirtqueue *svq,
> > -                                unsigned idx)
> > +static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
> > +                                  VhostShadowVirtqueue *svq,
> > +                                  unsigned idx)
> >   {
> >       struct vhost_vring_file file = {
> >           .index = dev->vq_index + idx,
> > @@ -767,7 +767,7 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> >       r = vhost_vdpa_set_vring_dev_kick(dev, &file);
> >       if (unlikely(r != 0)) {
> >           error_report("Can't set device kick fd (%d)", -r);
> > -        return false;
> > +        return r;
> >       }
> >
> >       event_notifier = vhost_svq_get_svq_call_notifier(svq);
> > @@ -777,6 +777,99 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> >           error_report("Can't set device call fd (%d)", -r);
> >       }
> >
> > +    return r;
> > +}
> > +
> > +/**
> > + * Unmap SVQ area in the device
> > + */
> > +static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
> > +                                      hwaddr size)
> > +{
> > +    int r;
> > +
> > +    size = ROUND_UP(size, qemu_real_host_page_size);
> > +    r = vhost_vdpa_dma_unmap(v, iova, size);
> > +    return r == 0;
> > +}
> > +
> > +static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
> > +                                       const VhostShadowVirtqueue *svq)
> > +{
> > +    struct vhost_vdpa *v = dev->opaque;
> > +    struct vhost_vring_addr svq_addr;
> > +    size_t device_size = vhost_svq_device_area_size(svq);
> > +    size_t driver_size = vhost_svq_driver_area_size(svq);
> > +    bool ok;
> > +
> > +    vhost_svq_get_vring_addr(svq, &svq_addr);
> > +
> > +    ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
> > +    if (unlikely(!ok)) {
> > +        return false;
> > +    }
> > +
> > +    return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
> > +}
> > +
> > +/**
> > + * Map shadow virtqueue rings in device
> > + *
> > + * @dev   The vhost device
> > + * @svq   The shadow virtqueue
> > + */
> > +static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
> > +                                     const VhostShadowVirtqueue *svq)
> > +{
> > +    struct vhost_vdpa *v = dev->opaque;
> > +    struct vhost_vring_addr svq_addr;
> > +    size_t device_size = vhost_svq_device_area_size(svq);
> > +    size_t driver_size = vhost_svq_driver_area_size(svq);
> > +    int r;
> > +
> > +    vhost_svq_get_vring_addr(svq, &svq_addr);
> > +
> > +    r = vhost_vdpa_dma_map(v, svq_addr.desc_user_addr, driver_size,
> > +                           (void *)svq_addr.desc_user_addr, true);
> > +    if (unlikely(r != 0)) {
> > +        return false;
> > +    }
> > +
> > +    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
> > +                           (void *)svq_addr.used_user_addr, false);
>
>
> Do we need unmap the driver area if we fail here?
>

Yes, this used to rely on unmapping them when SVQ is disabled. Now I
think we need to unmap here, as you say.
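
Something like this on top of the quoted code (sketch):

    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
                           (void *)svq_addr.used_user_addr, false);
    if (unlikely(r != 0)) {
        /* Undo the driver area mapping done just above */
        vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
        return false;
    }

    return true;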

Thanks!

[1] https://lists.linuxfoundation.org/pipermail/virtualization/2021-March/053322.html

> Thanks
>
>
> > +    return r == 0;
> > +}
> > +
> > +static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> > +                                VhostShadowVirtqueue *svq,
> > +                                unsigned idx)
> > +{
> > +    uint16_t vq_index = dev->vq_index + idx;
> > +    struct vhost_vring_state s = {
> > +        .index = vq_index,
> > +    };
> > +    int r;
> > +    bool ok;
> > +
> > +    r = vhost_vdpa_set_dev_vring_base(dev, &s);
> > +    if (unlikely(r)) {
> > +        error_report("Can't set vring base (%d)", r);
> > +        return false;
> > +    }
> > +
> > +    s.num = vhost_svq_get_num(svq);
> > +    r = vhost_vdpa_set_dev_vring_num(dev, &s);
> > +    if (unlikely(r)) {
> > +        error_report("Can't set vring num (%d)", r);
> > +        return false;
> > +    }
> > +
> > +    ok = vhost_vdpa_svq_map_rings(dev, svq);
> > +    if (unlikely(!ok)) {
> > +        return false;
> > +    }
> > +
> > +    r = vhost_vdpa_svq_set_fds(dev, svq, idx);
> >       return r == 0;
> >   }
> >
> > @@ -788,14 +881,24 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >       if (started) {
> >           vhost_vdpa_host_notifiers_init(dev);
> >           for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> > +            VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
> >               VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
> >               bool ok = vhost_vdpa_svq_setup(dev, svq, i);
> >               if (unlikely(!ok)) {
> >                   return -1;
> >               }
> > +            vhost_svq_start(svq, dev->vdev, vq);
> >           }
> >           vhost_vdpa_set_vring_ready(dev);
> >       } else {
> > +        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> > +            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
> > +                                                          i);
> > +            bool ok = vhost_vdpa_svq_unmap_rings(dev, svq);
> > +            if (unlikely(!ok)) {
> > +                return -1;
> > +            }
> > +        }
> >           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> >       }
> >
>




* Re: [PATCH 29/31] vdpa: Make ncs autofree
  2022-01-30  6:51     ` Jason Wang
  (?)
@ 2022-02-01 17:10     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-01 17:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Sun, Jan 30, 2022 at 7:52 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> > On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > Simplifying memory management.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>
>
> To reduce the size of this series. This can be sent as an separate patch
> if I was not wrong.
>

Sure, I'll send it separately to the trivial mailing list.

Thanks!

> Thanks
>
>
> > ---
> >   net/vhost-vdpa.c | 5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index 4125d13118..4befba5cc7 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -264,7 +264,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >   {
> >       const NetdevVhostVDPAOptions *opts;
> >       int vdpa_device_fd;
> > -    NetClientState **ncs, *nc;
> > +    g_autofree NetClientState **ncs = NULL;
> > +    NetClientState *nc;
> >       int queue_pairs, i, has_cvq = 0;
> >
> >       assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > @@ -302,7 +303,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >               goto err;
> >       }
> >
> > -    g_free(ncs);
> >       return 0;
> >
> >   err:
> > @@ -310,7 +310,6 @@ err:
> >           qemu_del_net_client(ncs[0]);
> >       }
> >       qemu_close(vdpa_device_fd);
> > -    g_free(ncs);
> >
> >       return -1;
> >   }
>




* Re: [PATCH 30/31] vdpa: Move vhost_vdpa_get_iova_range to net/vhost-vdpa.c
  2022-01-30  6:53     ` Jason Wang
  (?)
@ 2022-02-01 17:11     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-01 17:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Sun, Jan 30, 2022 at 7:53 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> > On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > Since it's a device property, it can be done in net/. This helps SVQ to
> > allocate the rings in vdpa device initialization, rather than delay
> > that.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-vdpa.c | 15 ---------------
> >   net/vhost-vdpa.c       | 32 ++++++++++++++++++++++++--------
>
>
> I don't understand here, since we will support device other than net?
>
>
> >   2 files changed, 24 insertions(+), 23 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 75090d65e8..2491c05d29 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -350,19 +350,6 @@ static int vhost_vdpa_add_status(struct vhost_dev *dev, uint8_t status)
> >       return 0;
> >   }
> >
> > -static void vhost_vdpa_get_iova_range(struct vhost_vdpa *v)
> > -{
> > -    int ret = vhost_vdpa_call(v->dev, VHOST_VDPA_GET_IOVA_RANGE,
> > -                              &v->iova_range);
> > -    if (ret != 0) {
> > -        v->iova_range.first = 0;
> > -        v->iova_range.last = UINT64_MAX;
> > -    }
> > -
> > -    trace_vhost_vdpa_get_iova_range(v->dev, v->iova_range.first,
> > -                                    v->iova_range.last);
> > -}
>
>
> Let's just export this instead?
>

Yes, I see now that exporting it is a better solution than moving it.
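
I.e., keep the function in hw/virtio/vhost-vdpa.c, drop the static,
and just expose the prototype (sketch; the header path is an
assumption):

/* include/hw/virtio/vhost-vdpa.h */
void vhost_vdpa_get_iova_range(struct vhost_vdpa *v);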

Thanks!

> Thanks
>
>
> > -
> >   static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
> >   {
> >       struct vhost_vdpa *v = dev->opaque;
> > @@ -1295,8 +1282,6 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> >           goto err;
> >       }
> >
> > -    vhost_vdpa_get_iova_range(v);
> > -
> >       if (vhost_vdpa_one_time_request(dev)) {
> >           return 0;
> >       }
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index 4befba5cc7..cc9cecf8d1 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -22,6 +22,7 @@
> >   #include <sys/ioctl.h>
> >   #include <err.h>
> >   #include "standard-headers/linux/virtio_net.h"
> > +#include "standard-headers/linux/vhost_types.h"
> >   #include "monitor/monitor.h"
> >   #include "hw/virtio/vhost.h"
> >
> > @@ -187,13 +188,25 @@ static NetClientInfo net_vhost_vdpa_info = {
> >           .check_peer_type = vhost_vdpa_check_peer_type,
> >   };
> >
> > +static void vhost_vdpa_get_iova_range(int fd,
> > +                                      struct vhost_vdpa_iova_range *iova_range)
> > +{
> > +    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
> > +
> > +    if (ret < 0) {
> > +        iova_range->first = 0;
> > +        iova_range->last = UINT64_MAX;
> > +    }
> > +}
> > +
> >   static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > -                                           const char *device,
> > -                                           const char *name,
> > -                                           int vdpa_device_fd,
> > -                                           int queue_pair_index,
> > -                                           int nvqs,
> > -                                           bool is_datapath)
> > +                                       const char *device,
> > +                                       const char *name,
> > +                                       int vdpa_device_fd,
> > +                                       int queue_pair_index,
> > +                                       int nvqs,
> > +                                       bool is_datapath,
> > +                                       struct vhost_vdpa_iova_range iova_range)
> >   {
> >       NetClientState *nc = NULL;
> >       VhostVDPAState *s;
> > @@ -211,6 +224,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >
> >       s->vhost_vdpa.device_fd = vdpa_device_fd;
> >       s->vhost_vdpa.index = queue_pair_index;
> > +    s->vhost_vdpa.iova_range = iova_range;
> >       ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> >       if (ret) {
> >           qemu_del_net_client(nc);
> > @@ -267,6 +281,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >       g_autofree NetClientState **ncs = NULL;
> >       NetClientState *nc;
> >       int queue_pairs, i, has_cvq = 0;
> > +    struct vhost_vdpa_iova_range iova_range;
> >
> >       assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >       opts = &netdev->u.vhost_vdpa;
> > @@ -286,19 +301,20 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >           qemu_close(vdpa_device_fd);
> >           return queue_pairs;
> >       }
> > +    vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> >
> >       ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> >
> >       for (i = 0; i < queue_pairs; i++) {
> >           ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > -                                     vdpa_device_fd, i, 2, true);
> > +                                     vdpa_device_fd, i, 2, true, iova_range);
> >           if (!ncs[i])
> >               goto err;
> >       }
> >
> >       if (has_cvq) {
> >           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > -                                 vdpa_device_fd, i, 1, false);
> > +                                 vdpa_device_fd, i, 1, false, iova_range);
> >           if (!nc)
> >               goto err;
> >       }
>




* Re: [PATCH 22/31] vhost: Add VhostIOVATree
  2022-01-30  5:21     ` Jason Wang
  (?)
@ 2022-02-01 17:27     ` Eugenio Perez Martin
  2022-02-08  8:17         ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-01 17:27 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Sun, Jan 30, 2022 at 6:21 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> > On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > This tree is able to look for a translated address from an IOVA address.
> >
> > At first glance it is similar to util/iova-tree. However, SVQ working on
> > devices with limited IOVA space need more capabilities,
>
>
> So did the IOVA tree (e.g l2 vtd can only work in the range of GAW and
> without RMRRs).
>
>
> >   like allocating
> > IOVA chunks or performing reverse translations (qemu addresses to iova).
>
>
> This looks like a general request as well. So I wonder if we can simply
> extend iova tree instead.
>

While both are true, I don't see code there that performs allocations
or qemu vaddr to iova translations. But if the changes can be
integrated into iova-tree, that would be great for sure.

The main drawback I see is that users of iova-tree would need to
maintain two trees instead of one. While complexity does not grow, it
doubles the amount of bookkeeping needed.
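
For reference, the intended usage is something like this (a sketch;
the variable names are illustrative):

DMAMap map = {
    .translated_addr = (hwaddr)(uintptr_t)svq_driver_area,
    .size = driver_size - 1,  /* DMAMap size is inclusive */
    .perm = IOMMU_RO,
};

if (vhost_iova_tree_map_alloc(tree, &map) == IOVA_OK) {
    /* map.iova now holds the allocated device address */
}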

Thanks!

> Thanks
>
>
> >
> > The allocation capability, as "assign a free IOVA address to this chunk
> > of memory in qemu's address space" allows shadow virtqueue to create a
> > new address space that is not restricted by guest's addressable one, so
> > we can allocate shadow vqs vrings outside of it.
> >
> > It duplicates the tree so it can search efficiently both directions,
> > and it will signal overlap if iova or the translated address is
> > present in any tree.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-iova-tree.h |  27 +++++++
> >   hw/virtio/vhost-iova-tree.c | 157 ++++++++++++++++++++++++++++++++++++
> >   hw/virtio/meson.build       |   2 +-
> >   3 files changed, 185 insertions(+), 1 deletion(-)
> >   create mode 100644 hw/virtio/vhost-iova-tree.h
> >   create mode 100644 hw/virtio/vhost-iova-tree.c
> >
> > diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
> > new file mode 100644
> > index 0000000000..610394eaf1
> > --- /dev/null
> > +++ b/hw/virtio/vhost-iova-tree.h
> > @@ -0,0 +1,27 @@
> > +/*
> > + * vhost software live migration ring
> > + *
> > + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> > + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> > + *
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + */
> > +
> > +#ifndef HW_VIRTIO_VHOST_IOVA_TREE_H
> > +#define HW_VIRTIO_VHOST_IOVA_TREE_H
> > +
> > +#include "qemu/iova-tree.h"
> > +#include "exec/memory.h"
> > +
> > +typedef struct VhostIOVATree VhostIOVATree;
> > +
> > +VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
> > +void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
> > +G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
> > +
> > +const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
> > +                                        const DMAMap *map);
> > +int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
> > +void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map);
> > +
> > +#endif
> > diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
> > new file mode 100644
> > index 0000000000..0021dbaf54
> > --- /dev/null
> > +++ b/hw/virtio/vhost-iova-tree.c
> > @@ -0,0 +1,157 @@
> > +/*
> > + * vhost software live migration ring
> > + *
> > + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> > + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> > + *
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu/iova-tree.h"
> > +#include "vhost-iova-tree.h"
> > +
> > +#define iova_min_addr qemu_real_host_page_size
> > +
> > +/**
> > + * VhostIOVATree, able to:
> > + * - Translate iova address
> > + * - Reverse translate iova address (from translated to iova)
> > + * - Allocate IOVA regions for translated range (potentially slow operation)
> > + *
> > + * Note that it cannot remove nodes.
> > + */
> > +struct VhostIOVATree {
> > > +    /* First addressable iova address in the device */
> > +    uint64_t iova_first;
> > +
> > +    /* Last addressable iova address in the device */
> > +    uint64_t iova_last;
> > +
> > +    /* IOVA address to qemu memory maps. */
> > +    IOVATree *iova_taddr_map;
> > +
> > +    /* QEMU virtual memory address to iova maps */
> > +    GTree *taddr_iova_map;
> > +};
> > +
> > +static gint vhost_iova_tree_cmp_taddr(gconstpointer a, gconstpointer b,
> > +                                      gpointer data)
> > +{
> > +    const DMAMap *m1 = a, *m2 = b;
> > +
> > +    if (m1->translated_addr > m2->translated_addr + m2->size) {
> > +        return 1;
> > +    }
> > +
> > +    if (m1->translated_addr + m1->size < m2->translated_addr) {
> > +        return -1;
> > +    }
> > +
> > +    /* Overlapped */
> > +    return 0;
> > +}
> > +
> > +/**
> > + * Create a new IOVA tree
> > + *
> > + * Returns the new IOVA tree
> > + */
> > +VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
> > +{
> > +    VhostIOVATree *tree = g_new(VhostIOVATree, 1);
> > +
> > > +    /* Some devices do not like 0 addresses */
> > +    tree->iova_first = MAX(iova_first, iova_min_addr);
> > +    tree->iova_last = iova_last;
> > +
> > +    tree->iova_taddr_map = iova_tree_new();
> > +    tree->taddr_iova_map = g_tree_new_full(vhost_iova_tree_cmp_taddr, NULL,
> > +                                           NULL, g_free);
> > +    return tree;
> > +}
> > +
> > +/**
> > + * Delete an iova tree
> > + */
> > +void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
> > +{
> > +    iova_tree_destroy(iova_tree->iova_taddr_map);
> > +    g_tree_unref(iova_tree->taddr_iova_map);
> > +    g_free(iova_tree);
> > +}
> > +
> > +/**
> > + * Find the IOVA address stored from a memory address
> > + *
> > + * @tree     The iova tree
> > + * @map      The map with the memory address
> > + *
> > + * Return the stored mapping, or NULL if not found.
> > + */
> > +const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
> > +                                        const DMAMap *map)
> > +{
> > +    return g_tree_lookup(tree->taddr_iova_map, map);
> > +}
> > +
> > +/**
> > + * Allocate a new mapping
> > + *
> > + * @tree  The iova tree
> > + * @map   The iova map
> > + *
> > + * Returns:
> > + * - IOVA_OK if the map fits in the container
> > + * - IOVA_ERR_INVALID if the map does not make sense (like size overflow)
> > + * - IOVA_ERR_OVERLAP if the tree already contains that map
> > + * - IOVA_ERR_NOMEM if tree cannot allocate more space.
> > + *
> > > + * It returns the assigned iova in map->iova if the return value is IOVA_OK.
> > + */
> > +int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map)
> > +{
> > > +    /* Some vhost devices do not like addr 0. Skip the first page */
> > +    hwaddr iova_first = tree->iova_first ?: qemu_real_host_page_size;
> > +    DMAMap *new;
> > +    int r;
> > +
> > +    if (map->translated_addr + map->size < map->translated_addr ||
> > +        map->perm == IOMMU_NONE) {
> > +        return IOVA_ERR_INVALID;
> > +    }
> > +
> > +    /* Check for collisions in translated addresses */
> > +    if (vhost_iova_tree_find_iova(tree, map)) {
> > +        return IOVA_ERR_OVERLAP;
> > +    }
> > +
> > +    /* Allocate a node in IOVA address */
> > +    r = iova_tree_alloc(tree->iova_taddr_map, map, iova_first,
> > +                        tree->iova_last);
> > +    if (r != IOVA_OK) {
> > +        return r;
> > +    }
> > +
> > +    /* Allocate node in qemu -> iova translations */
> > +    new = g_malloc(sizeof(*new));
> > +    memcpy(new, map, sizeof(*new));
> > +    g_tree_insert(tree->taddr_iova_map, new, new);
> > +    return IOVA_OK;
> > +}
> > +
> > +/**
> > + * Remove existing mappings from iova tree
> > + *
> > + * @param  iova_tree  The vhost iova tree
> > + * @param  map        The map to remove
> > + */
> > +void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map)
> > +{
> > +    const DMAMap *overlap;
> > +
> > +    iova_tree_remove(iova_tree->iova_taddr_map, map);
> > +    while ((overlap = vhost_iova_tree_find_iova(iova_tree, map))) {
> > +        g_tree_remove(iova_tree->taddr_iova_map, overlap);
> > +    }
> > +}
> > diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
> > index 2dc87613bc..6047670804 100644
> > --- a/hw/virtio/meson.build
> > +++ b/hw/virtio/meson.build
> > @@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
> >
> >   virtio_ss = ss.source_set()
> >   virtio_ss.add(files('virtio.c'))
> > -virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
> > +virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c', 'vhost-iova-tree.c'))
> >   virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
> >   virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
> >   virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
>




* Re: [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
  2022-01-31 15:34     ` Eugenio Perez Martin
@ 2022-02-08  3:23         ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-08  3:23 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/1/31 11:34 PM, Eugenio Perez Martin wrote:
> On Sat, Jan 29, 2022 at 9:06 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++--
>>>    1 file changed, 18 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index 18de14f0fb..029f98feee 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -687,13 +687,29 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
>>>        }
>>>    }
>>>
>>> -static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>> -                                       struct vhost_vring_file *file)
>>> +static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
>>> +                                         struct vhost_vring_file *file)
>>>    {
>>>        trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
>>>        return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>>>    }
>>>
>>> +static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>> +                                     struct vhost_vring_file *file)
>>> +{
>>> +    struct vhost_vdpa *v = dev->opaque;
>>> +
>>> +    if (v->shadow_vqs_enabled) {
>>> +        int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
>>> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
>>> +
>>> +        vhost_svq_set_guest_call_notifier(svq, file->fd);
>>
>> Two questions here (had similar questions for vring kick):
>>
>> 1) Any reason that we setup the eventfd for vhost-vdpa in
>> vhost_vdpa_svq_setup() not here?
>>
> I'm not sure what you mean.
>
> The guest->SVQ call and kick fds are set here and at
> vhost_vdpa_set_vring_kick. The event notifier handler of the guest ->
> SVQ kick_fd is set at vhost_vdpa_set_vring_kick /
> vhost_svq_set_svq_kick_fd. The guest -> SVQ call fd has no event
> notifier handler since we don't poll it.
>
> On the other hand, the connection SVQ <-> device uses the same fds
> from the beginning to the end, and they will not change with, for
> example, call fd masking. That's why it's setup from
> vhost_vdpa_svq_setup. Delaying to vhost_vdpa_set_vring_call would make
> us add way more logic there.


More logic in the generic shadow vq code but less code in the
vhost-vdpa specific code, I think.

E.g. we can move the kick set logic from vhost_vdpa_svq_set_fds() to
here.
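
Something like this, I guess (a sketch mirroring the call fd handling
quoted above; names taken from the series):

static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
                                     struct vhost_vring_file *file)
{
    struct vhost_vdpa *v = dev->opaque;

    if (v->shadow_vqs_enabled) {
        int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
                                                      vdpa_idx);
        struct vhost_vring_file svq_file = {
            .index = file->index,
            .fd = event_notifier_get_fd(
                      vhost_svq_get_dev_kick_notifier(svq)),
        };

        /* The guest kick fd goes to SVQ... */
        vhost_svq_set_svq_kick_fd(svq, file->fd);
        /* ...and the device gets SVQ's own kick notifier */
        return vhost_vdpa_set_vring_dev_kick(dev, &svq_file);
    }

    return vhost_vdpa_set_vring_dev_kick(dev, file);
}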

Thanks


>
>> 2) The call could be disabled by using -1 as the fd, I don't see any
>> code to deal with that.
>>
> Right, I didn't take that into account. vhost-kernel takes also -1 as
> kick_fd to unbind, so SVQ can be reworked to take that into account
> for sure.
>
> Thanks!
>
>> Thanks
>>
>>
>>> +        return 0;
>>> +    } else {
>>> +        return vhost_vdpa_set_vring_dev_call(dev, file);
>>> +    }
>>> +}
>>> +
>>>    /**
>>>     * Set shadow virtqueue descriptors to the device
>>>     *



* Re: [PATCH 11/31] vhost: Add vhost_svq_valid_device_features to shadow vq
  2022-02-01 10:57       ` Eugenio Perez Martin
@ 2022-02-08  3:37           ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-08  3:37 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/1 6:57 PM, Eugenio Perez Martin wrote:
> On Mon, Jan 31, 2022 at 4:49 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
>> On Sat, Jan 29, 2022 at 9:11 AM Jason Wang <jasowang@redhat.com> wrote:
>>>
>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>>> This allows SVQ to negotiate features with the device. For the device,
>>>> SVQ is a driver. While this function needs to bypass all non-transport
>>>> features, it needs to disable the features that SVQ does not support
>>>> when forwarding buffers. This includes packed vq layout, indirect
>>>> descriptors or event idx.
>>>>
>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>> ---
>>>>    hw/virtio/vhost-shadow-virtqueue.h |  2 ++
>>>>    hw/virtio/vhost-shadow-virtqueue.c | 44 ++++++++++++++++++++++++++++++
>>>>    hw/virtio/vhost-vdpa.c             | 21 ++++++++++++++
>>>>    3 files changed, 67 insertions(+)
>>>>
>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
>>>> index c9ffa11fce..d963867a04 100644
>>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>>>> @@ -15,6 +15,8 @@
>>>>
>>>>    typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>>>>
>>>> +bool vhost_svq_valid_device_features(uint64_t *features);
>>>> +
>>>>    void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>>>>    void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
>>>>    const EventNotifier *vhost_svq_get_dev_kick_notifier(
>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>>> index 9619c8082c..51442b3dbf 100644
>>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>>> @@ -45,6 +45,50 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
>>>>        return &svq->hdev_kick;
>>>>    }
>>>>
>>>> +/**
>>>> + * Validate the transport device features that SVQ can use with the device
>>>> + *
>>>> + * @dev_features  The device features. If success, the acknowledged features.
>>>> + *
>>>> + * Returns true if SVQ can go with a subset of these, false otherwise.
>>>> + */
>>>> +bool vhost_svq_valid_device_features(uint64_t *dev_features)
>>>> +{
>>>> +    bool r = true;
>>>> +
>>>> +    for (uint64_t b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END;
>>>> +         ++b) {
>>>> +        switch (b) {
>>>> +        case VIRTIO_F_NOTIFY_ON_EMPTY:
>>>> +        case VIRTIO_F_ANY_LAYOUT:
>>>> +            continue;
>>>> +
>>>> +        case VIRTIO_F_ACCESS_PLATFORM:
>>>> +            /* SVQ does not know how to translate addresses */
>>>
>>> I may miss something but any reason that we need to disable
>>> ACCESS_PLATFORM? I'd expect the vring helper we used for shadow
>>> virtqueue can deal with vIOMMU perfectly.
>>>
>> This function is validating SVQ <-> device communication features,
>> which may or may not be the same as guest <-> SVQ features. These feature
>> flags are valid for guest <-> SVQ communication, the same as with the
>> indirect descriptors one.
>>
>> Having said that, there is a point in the series where
>> VIRTIO_F_ACCESS_PLATFORM is actually mandatory, so I think we could
>> use the later addition of the x-svq cmdline parameter and delay the
>> feature validations to where it makes more sense.
>>
>>>> +            if (*dev_features & BIT_ULL(b)) {
>>>> +                clear_bit(b, dev_features);
>>>> +                r = false;
>>>> +            }
>>>> +            break;
>>>> +
>>>> +        case VIRTIO_F_VERSION_1:
>>>
>>> I had the same question here.
>>>
>> For VERSION_1 it's easier to assume that the guest is little endian at
>> some points, but we could try harder to support both endiannesses if
>> needed.
>>
> Re-thinking the SVQ feature isolation stuff for this first iteration
> based on your comments.
>
> Maybe it's easier to simply fail if the device does not *match* the
> expected feature set, and add all of the "feature isolation" later.
> While a lot of guest <-> SVQ communication details are already solved
> for free with qemu's VirtQueue (indirect, packed, ...), we can
> simplify this series in particular and add that support later.
>
> For example, at this moment it would be valid for the device to export
> the indirect descriptors feature flag, and SVQ would simply forward that
> feature offering to the guest. So the guest <-> SVQ communication could
> use indirect descriptors (qemu's VirtQueue code handles them for free),
> but SVQ would not acknowledge them to the device. As a side note,
> negotiating it would actually have been harmless, but that's not the
> case for packed vq.
>
> So maybe for v2 we can simply force the device, via the qemu cmdline,
> to export just the strictly needed features and nothing else, and then
> enable the feature negotiation isolation for each side of SVQ?


Yes, that's exactly my point.

Thanks


>
> Thanks!
>
>
>> Thanks!
>>
>>> Thanks
>>>
>>>
>>>> +            /* SVQ trusts that the guest vring is little endian */
>>>> +            if (!(*dev_features & BIT_ULL(b))) {
>>>> +                set_bit(b, dev_features);
>>>> +                r = false;
>>>> +            }
>>>> +            continue;
>>>> +
>>>> +        default:
>>>> +            if (*dev_features & BIT_ULL(b)) {
>>>> +                clear_bit(b, dev_features);
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +
>>>> +    return r;
>>>> +}
>>>> +
>>>>    /* Forward guest notifications */
>>>>    static void vhost_handle_guest_kick(EventNotifier *n)
>>>>    {
>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>> index bdb45c8808..9d801cf907 100644
>>>> --- a/hw/virtio/vhost-vdpa.c
>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>> @@ -855,10 +855,31 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>>>>        size_t n_svqs = v->shadow_vqs_enabled ? hdev->nvqs : 0;
>>>>        g_autoptr(GPtrArray) shadow_vqs = g_ptr_array_new_full(n_svqs,
>>>>                                                               vhost_psvq_free);
>>>> +    uint64_t dev_features;
>>>> +    uint64_t svq_features;
>>>> +    int r;
>>>> +    bool ok;
>>>> +
>>>>        if (!v->shadow_vqs_enabled) {
>>>>            goto out;
>>>>        }
>>>>
>>>> +    r = vhost_vdpa_get_features(hdev, &dev_features);
>>>> +    if (r != 0) {
>>>> +        error_setg(errp, "Can't get vdpa device features, got (%d)", r);
>>>> +        return r;
>>>> +    }
>>>> +
>>>> +    svq_features = dev_features;
>>>> +    ok = vhost_svq_valid_device_features(&svq_features);
>>>> +    if (unlikely(!ok)) {
>>>> +        error_setg(errp,
>>>> +            "SVQ Invalid device feature flags, offer: 0x%"PRIx64", ok: 0x%"PRIx64,
>>>> +            hdev->features, svq_features);
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_psvq_free);
>>>>        for (unsigned n = 0; n < hdev->nvqs; ++n) {
>>>>            VhostShadowVirtqueue *svq = vhost_svq_new();
>>>>
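A rough sketch of the strict-match validation floated above for v2; the
helper name and the exact feature mask are assumptions:

    static bool vhost_svq_features_match(uint64_t dev_features)
    {
        /* SVQ only speaks modern, little-endian split ring for now */
        const uint64_t required = BIT_ULL(VIRTIO_F_VERSION_1);
        /* Features SVQ cannot forward yet */
        const uint64_t rejected = BIT_ULL(VIRTIO_F_RING_PACKED) |
                                  BIT_ULL(VIRTIO_RING_F_INDIRECT_DESC) |
                                  BIT_ULL(VIRTIO_RING_F_EVENT_IDX);

        return (dev_features & required) == required &&
               !(dev_features & rejected);
    }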

* Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq
  2022-01-31 18:58     ` Eugenio Perez Martin
@ 2022-02-08  3:57         ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-08  3:57 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/1 2:58 AM, Eugenio Perez Martin wrote:
> On Sun, Jan 30, 2022 at 5:03 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> First half of the buffers forwarding part, preparing vhost-vdpa
>>> callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
>>> this is effectively dead code for now, but it helps to reduce
>>> patch size.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-shadow-virtqueue.h |   2 +-
>>>    hw/virtio/vhost-shadow-virtqueue.c |  21 ++++-
>>>    hw/virtio/vhost-vdpa.c             | 133 ++++++++++++++++++++++++++---
>>>    3 files changed, 143 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
>>> index 035207a469..39aef5ffdf 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>>> @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
>>>
>>>    void vhost_svq_stop(VhostShadowVirtqueue *svq);
>>>
>>> -VhostShadowVirtqueue *vhost_svq_new(void);
>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
>>>
>>>    void vhost_svq_free(VhostShadowVirtqueue *vq);
>>>
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>> index f129ec8395..7c168075d7 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>> @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>>>    /**
>>>     * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
>>>     * methods and file descriptors.
>>> + *
>>> + * @qsize Shadow VirtQueue size
>>> + *
>>> + * Returns the new virtqueue or NULL.
>>> + *
>>> + * In case of error, reason is reported through error_report.
>>>     */
>>> -VhostShadowVirtqueue *vhost_svq_new(void)
>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
>>>    {
>>> +    size_t desc_size = sizeof(vring_desc_t) * qsize;
>>> +    size_t device_size, driver_size;
>>>        g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
>>>        int r;
>>>
>>> @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
>>>        /* Placeholder descriptor, it should be deleted at set_kick_fd */
>>>        event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
>>>
>>> +    svq->vring.num = qsize;
>>
>> I wonder if this is the best approach. E.g. some hardware can support up
>> to a 32K queue size. So this will probably end up with:
>>
>> 1) SVQ uses a 32K queue size
>> 2) the hardware queue uses 256
>>
> In that case the SVQ vring queue size will be 32K and the guest's vring can
> negotiate any number with SVQ equal to or less than 32K,


Sorry for being unclear; what I meant is actually:

1) SVQ uses 32K queue size

2) guest vq uses 256

This looks like a burden that needs extra logic and may hurt
performance.

And this can lead to another interesting situation:

1) SVQ uses 256

2) guest vq uses 1024

Where a lot more SVQ logic is needed.


> including 256.
> Is that what you mean?


I mean, it looks to me that the logic would be much simpler if we just
allocate the shadow virtqueue with the size the guest can see (the guest
vring).

Then we don't need to think about whether the difference in queue size can
have any side effects.


>
> If by hardware queues you mean the guest's vring, I'm not sure why it is
> "probably 256". I'd say that in that case, with the virtio-net kernel
> driver, the ring size will be the same as the one the device exports, for
> example, isn't it?
>
> The implementation should support any combination of sizes, but the
> ring size exposed to the guest is never bigger than hardware one.
>
>> Or SVQ can stick to 256, but will this cause trouble if we want
>> to add event index support?
>>
> I think we should not have any problem with event idx. If you mean
> that the guest could mark more buffers available than the SVQ vring's
> size, that should not happen because there must be fewer entries in the
> guest than in SVQ.
>
> But if I understood you correctly, a similar situation could happen if
> a guest's contiguous buffer is scattered across many of qemu's VA chunks.
> Even if that happened, the situation should be ok too: SVQ knows
> the guest's avail idx and, if SVQ is full, it will continue forwarding
> avail buffers when the device uses more buffers.
>
> Does that make sense to you?


Yes.

Thanks
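In code, that direction would look roughly like the following at SVQ
creation time; virtio_queue_get_num() is the existing accessor, the exact
call site is an assumption:

    for (unsigned n = 0; n < hdev->nvqs; ++n) {
        /* Size the shadow vring by what the guest sees, not by the
         * maximum the hardware supports */
        uint16_t num = virtio_queue_get_num(hdev->vdev, hdev->vq_index + n);
        VhostShadowVirtqueue *svq = vhost_svq_new(num);

        g_ptr_array_add(shadow_vqs, svq);
    }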

* Re: [PATCH 16/31] vhost: pass queue index to vhost_vq_get_addr
  2022-01-31 17:44     ` Eugenio Perez Martin
@ 2022-02-08  6:58         ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-08  6:58 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/1 1:44 AM, Eugenio Perez Martin wrote:
> On Sat, Jan 29, 2022 at 9:20 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> Doing it that way allows the vhost backend to know what address to return.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost.c | 6 +++---
>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>> index 7b03efccec..64b955ba0c 100644
>>> --- a/hw/virtio/vhost.c
>>> +++ b/hw/virtio/vhost.c
>>> @@ -798,9 +798,10 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
>>>                                        struct vhost_virtqueue *vq,
>>>                                        unsigned idx, bool enable_log)
>>>    {
>>> -    struct vhost_vring_addr addr;
>>> +    struct vhost_vring_addr addr = {
>>> +        .index = idx,
>>> +    };
>>>        int r;
>>> -    memset(&addr, 0, sizeof(struct vhost_vring_addr));
>>>
>>>        if (dev->vhost_ops->vhost_vq_get_addr) {
>>>            r = dev->vhost_ops->vhost_vq_get_addr(dev, &addr, vq);
>>> @@ -813,7 +814,6 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
>>>            addr.avail_user_addr = (uint64_t)(unsigned long)vq->avail;
>>>            addr.used_user_addr = (uint64_t)(unsigned long)vq->used;
>>>        }
>>
>> I'm a bit lost in the logic above, any reason we need to call
>> vhost_vq_get_addr() :) ?
>>
> It's the way vhost_virtqueue_set_addr works if the backend has a
> vhost_vq_get_addr operation (currently, only vhost-vdpa). vhost first
> asks the backend for the address and then sets it.


Right, it's because vhost-vdpa doesn't use VA but GPA. I'm not sure
it's worth a dedicated vhost_ops, but considering we introduce the shadow
virtqueue stuff, it should be ok now.

(In the future, we may consider generalizing the non vhost-vdpa specific
stuff into VhostShadowVirtqueue; then we can get rid of this vhost_ops.)


>
> Previously, the index was not needed because all the information was in
> vhost_virtqueue. However, extracting the queue index from vhost_virtqueue
> is tricky, so I think it's easier to simply have that information in the
> request, similar to get_base or get_num when asking the vdpa
> device. We can extract the index from vq - dev->vqs or something
> similar if that's preferred.


It looks odd for the caller to pass the index considering vhost_virtqueue
is already passed. So I think we need to deduce it from vhost_virtqueue,
as you mentioned here.

Thanks


>
> Thanks!
>
>> Thanks
>>
>>
>>> -    addr.index = idx;
>>>        addr.log_guest_addr = vq->used_phys;
>>>        addr.flags = enable_log ? (1 << VHOST_VRING_F_LOG) : 0;
>>>        r = dev->vhost_ops->vhost_set_vring_addr(dev, &addr);
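A sketch of deducing it by pointer arithmetic, since dev->vqs is an array
of vhost_virtqueue; whether dev->vq_index has to be added back depends on
what the backend expects:

    /* Relative index of this virtqueue within the vhost_dev */
    unsigned idx = vq - dev->vqs;

    struct vhost_vring_addr addr = {
        .index = idx,
    };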

* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
  2022-02-01 17:08     ` Eugenio Perez Martin
@ 2022-02-08  8:11         ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-08  8:11 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/2 1:08 AM, Eugenio Perez Martin wrote:
> On Sun, Jan 30, 2022 at 5:43 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> Initial version of shadow virtqueue that actually forwards buffers. There
>>> is no iommu support at the moment, and that will be addressed in future
>>> patches of this series. Since all vhost-vdpa devices use forced IOMMU,
>>> this means that SVQ is not usable at this point of the series on any
>>> device.
>>>
>>> For simplicity it only supports modern devices, which expect the vring
>>> in little endian, with a split ring and no event idx or indirect
>>> descriptors. Support for them will not be added in this series.
>>>
>>> It reuses the VirtQueue code for the device part. The driver part is
>>> based on Linux's virtio_ring driver, but with stripped functionality
>>> and optimizations so it's easier to review.
>>>
>>> However, forwarding buffers has some particular pieces: One of the most
>>> unexpected ones is that a guest's buffer can expand into more than
>>> one descriptor in SVQ. While this is handled gracefully by qemu's
>>> emulated virtio devices, it may cause an unexpected SVQ queue full. This
>>> patch also solves it by checking for this condition at both guest's
>>> kicks and device's calls. The code may be more elegant in the future if
>>> SVQ code runs in its own iocontext.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-shadow-virtqueue.h |   2 +
>>>    hw/virtio/vhost-shadow-virtqueue.c | 365 ++++++++++++++++++++++++++++-
>>>    hw/virtio/vhost-vdpa.c             | 111 ++++++++-
>>>    3 files changed, 462 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
>>> index 39aef5ffdf..19c934af49 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>>> @@ -33,6 +33,8 @@ uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq);
>>>    size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
>>>    size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
>>>
>>> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
>>> +                     VirtQueue *vq);
>>>    void vhost_svq_stop(VhostShadowVirtqueue *svq);
>>>
>>>    VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>> index 7c168075d7..a1a404f68f 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>> @@ -9,6 +9,8 @@
>>>
>>>    #include "qemu/osdep.h"
>>>    #include "hw/virtio/vhost-shadow-virtqueue.h"
>>> +#include "hw/virtio/vhost.h"
>>> +#include "hw/virtio/virtio-access.h"
>>>    #include "standard-headers/linux/vhost_types.h"
>>>
>>>    #include "qemu/error-report.h"
>>> @@ -36,6 +38,33 @@ typedef struct VhostShadowVirtqueue {
>>>
>>>        /* Guest's call notifier, where SVQ calls guest. */
>>>        EventNotifier svq_call;
>>> +
>>> +    /* Virtio queue shadowing */
>>> +    VirtQueue *vq;
>>> +
>>> +    /* Virtio device */
>>> +    VirtIODevice *vdev;
>>> +
>>> +    /* Map for returning guest's descriptors */
>>> +    VirtQueueElement **ring_id_maps;
>>> +
>>> +    /* Next VirtQueue element that guest made available */
>>> +    VirtQueueElement *next_guest_avail_elem;
>>> +
>>> +    /* Next head to expose to device */
>>> +    uint16_t avail_idx_shadow;
>>> +
>>> +    /* Next free descriptor */
>>> +    uint16_t free_head;
>>> +
>>> +    /* Last seen used idx */
>>> +    uint16_t shadow_used_idx;
>>> +
>>> +    /* Next head to consume from device */
>>> +    uint16_t last_used_idx;
>>> +
>>> +    /* Cache for the exposed notification flag */
>>> +    bool notification;
>>>    } VhostShadowVirtqueue;
>>>
>>>    #define INVALID_SVQ_KICK_FD -1
>>> @@ -148,30 +177,294 @@ bool vhost_svq_ack_guest_features(uint64_t dev_features,
>>>        return true;
>>>    }
>>>
>>> -/* Forward guest notifications */
>>> -static void vhost_handle_guest_kick(EventNotifier *n)
>>> +/**
>>> + * Number of descriptors that SVQ can make available from the guest.
>>> + *
>>> + * @svq   The svq
>>> + */
>>> +static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
>>>    {
>>> -    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
>>> -                                             svq_kick);
>>> +    return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
>>> +}
>>> +
>>> +static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
>>> +{
>>> +    uint16_t notification_flag;
>>>
>>> -    if (unlikely(!event_notifier_test_and_clear(n))) {
>>> +    if (svq->notification == enable) {
>>> +        return;
>>> +    }
>>> +
>>> +    notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
>>> +
>>> +    svq->notification = enable;
>>> +    if (enable) {
>>> +        svq->vring.avail->flags &= ~notification_flag;
>>> +    } else {
>>> +        svq->vring.avail->flags |= notification_flag;
>>> +    }
>>> +}
>>> +
>>> +static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
>>> +                                    const struct iovec *iovec,
>>> +                                    size_t num, bool more_descs, bool write)
>>> +{
>>> +    uint16_t i = svq->free_head, last = svq->free_head;
>>> +    unsigned n;
>>> +    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
>>> +    vring_desc_t *descs = svq->vring.desc;
>>> +
>>> +    if (num == 0) {
>>> +        return;
>>> +    }
>>> +
>>> +    for (n = 0; n < num; n++) {
>>> +        if (more_descs || (n + 1 < num)) {
>>> +            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
>>> +        } else {
>>> +            descs[i].flags = flags;
>>> +        }
>>> +        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
>>> +        descs[i].len = cpu_to_le32(iovec[n].iov_len);
>>> +
>>> +        last = i;
>>> +        i = cpu_to_le16(descs[i].next);
>>> +    }
>>> +
>>> +    svq->free_head = le16_to_cpu(descs[last].next);
>>> +}
>>> +
>>> +static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
>>> +                                    VirtQueueElement *elem)
>>> +{
>>> +    int head;
>>> +    unsigned avail_idx;
>>> +    vring_avail_t *avail = svq->vring.avail;
>>> +
>>> +    head = svq->free_head;
>>> +
>>> +    /* We need some descriptors here */
>>> +    assert(elem->out_num || elem->in_num);
>>
>> Looks like this could be triggered by the guest; we need to fail instead
>> of asserting here.
>>
> My understanding was that virtqueue_pop already sanitized that case,
> but I'm not able to find where now. I will recheck and, if it doesn't,
> I will change it to a failure.
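A sketch of what the check could look like at the virtqueue_pop() site, in
case pop turns out not to sanitize it; the detach-and-break error handling
is an assumption:

    elem = virtqueue_pop(svq->vq, sizeof(*elem));
    if (!elem) {
        break;
    }

    /* An empty chain is guest-triggerable: fail gracefully, don't assert */
    if (unlikely(elem->out_num + elem->in_num == 0)) {
        virtio_error(svq->vdev, "Empty descriptor chain from guest");
        virtqueue_detach_element(svq->vq, elem, 0);
        g_free(elem);
        break;
    }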
>
>>> +
>>> +    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
>>> +                            elem->in_num > 0, false);
>>> +    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
>>> +
>>> +    /*
>>> +     * Put entry in available array (but don't update avail->idx until they
>>> +     * do sync).
>>> +     */
>>> +    avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
>>> +    avail->ring[avail_idx] = cpu_to_le16(head);
>>> +    svq->avail_idx_shadow++;
>>> +
>>> +    /* Update avail index after the descriptor is written */
>>> +    smp_wmb();
>>> +    avail->idx = cpu_to_le16(svq->avail_idx_shadow);
>>> +
>>> +    return head;
>>> +}
>>> +
>>> +static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
>>> +{
>>> +    unsigned qemu_head = vhost_svq_add_split(svq, elem);
>>> +
>>> +    svq->ring_id_maps[qemu_head] = elem;
>>> +}
>>> +
>>> +static void vhost_svq_kick(VhostShadowVirtqueue *svq)
>>> +{
>>> +    /* We need to expose available array entries before checking used flags */
>>> +    smp_mb();
>>> +    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
>>>            return;
>>>        }
>>>
>>>        event_notifier_set(&svq->hdev_kick);
>>>    }
>>>
>>> -/* Forward vhost notifications */
>>> +/**
>>> + * Forward available buffers.
>>> + *
>>> + * @svq Shadow VirtQueue
>>> + *
>>> + * Note that this function does not guarantee that all guest's available
>>> + * buffers are available to the device in SVQ avail ring. The guest may have
>>> + * exposed a GPA / GIOVA contiguous buffer, but it may not be contiguous in qemu
>>> + * vaddr.
>>> + *
>>> + * If that happens, guest's kick notifications will be disabled until device
>>> + * makes some buffers used.
>>> + */
>>> +static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
>>> +{
>>> +    /* Clear event notifier */
>>> +    event_notifier_test_and_clear(&svq->svq_kick);
>>> +
>>> +    /* Make available as many buffers as possible */
>>> +    do {
>>> +        if (virtio_queue_get_notification(svq->vq)) {
>>> +            virtio_queue_set_notification(svq->vq, false);
>>
>> This looks like an optimization that should belong to
>> virtio_queue_set_notification() itself.
>>
> Sure we can move.
>
>>> +        }
>>> +
>>> +        while (true) {
>>> +            VirtQueueElement *elem;
>>> +
>>> +            if (svq->next_guest_avail_elem) {
>>> +                elem = g_steal_pointer(&svq->next_guest_avail_elem);
>>> +            } else {
>>> +                elem = virtqueue_pop(svq->vq, sizeof(*elem));
>>> +            }
>>> +
>>> +            if (!elem) {
>>> +                break;
>>> +            }
>>> +
>>> +            if (elem->out_num + elem->in_num >
>>> +                vhost_svq_available_slots(svq)) {
>>> +                /*
>>> +                 * This condition is possible since a contiguous buffer in GPA
>>> +                 * does not imply a contiguous buffer in qemu's VA
>>> +                 * scatter-gather segments. If that happen, the buffer exposed
>>> +                 * to the device needs to be a chain of descriptors at this
>>> +                 * moment.
>>> +                 *
>>> +                 * SVQ cannot hold more available buffers if we are here:
>>> +                 * queue the current guest descriptor and ignore further kicks
>>> +                 * until some elements are used.
>>> +                 */
>>> +                svq->next_guest_avail_elem = elem;
>>> +                return;
>>> +            }
>>> +
>>> +            vhost_svq_add(svq, elem);
>>> +            vhost_svq_kick(svq);
>>> +        }
>>> +
>>> +        virtio_queue_set_notification(svq->vq, true);
>>> +    } while (!virtio_queue_empty(svq->vq));
>>> +}
>>> +
>>> +/**
>>> + * Handle guest's kick.
>>> + *
>>> + * @n guest kick event notifier, the one that guest set to notify svq.
>>> + */
>>> +static void vhost_handle_guest_kick_notifier(EventNotifier *n)
>>> +{
>>> +    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
>>> +                                             svq_kick);
>>> +    vhost_handle_guest_kick(svq);
>>> +}
>>> +
>>> +static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
>>> +{
>>> +    if (svq->last_used_idx != svq->shadow_used_idx) {
>>> +        return true;
>>> +    }
>>> +
>>> +    svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
>>> +
>>> +    return svq->last_used_idx != svq->shadow_used_idx;
>>> +}
>>> +
>>> +static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
>>> +{
>>> +    vring_desc_t *descs = svq->vring.desc;
>>> +    const vring_used_t *used = svq->vring.used;
>>> +    vring_used_elem_t used_elem;
>>> +    uint16_t last_used;
>>> +
>>> +    if (!vhost_svq_more_used(svq)) {
>>> +        return NULL;
>>> +    }
>>> +
>>> +    /* Only get used array entries after they have been exposed by dev */
>>> +    smp_rmb();
>>> +    last_used = svq->last_used_idx & (svq->vring.num - 1);
>>> +    used_elem.id = le32_to_cpu(used->ring[last_used].id);
>>> +    used_elem.len = le32_to_cpu(used->ring[last_used].len);
>>> +
>>> +    svq->last_used_idx++;
>>> +    if (unlikely(used_elem.id >= svq->vring.num)) {
>>> +        error_report("Device %s says index %u is used", svq->vdev->name,
>>> +                     used_elem.id);
>>> +        return NULL;
>>> +    }
>>> +
>>> +    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
>>> +        error_report(
>>> +            "Device %s says index %u is used, but it was not available",
>>> +            svq->vdev->name, used_elem.id);
>>> +        return NULL;
>>> +    }
>>> +
>>> +    descs[used_elem.id].next = svq->free_head;
>>> +    svq->free_head = used_elem.id;
>>> +
>>> +    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
>>> +    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
>>> +}
>>> +
>>> +static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>>> +                            bool check_for_avail_queue)
>>> +{
>>> +    VirtQueue *vq = svq->vq;
>>> +
>>> +    /* Make as many buffers as possible used. */
>>> +    do {
>>> +        unsigned i = 0;
>>> +
>>> +        vhost_svq_set_notification(svq, false);
>>> +        while (true) {
>>> +            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
>>> +            if (!elem) {
>>> +                break;
>>> +            }
>>> +
>>> +            if (unlikely(i >= svq->vring.num)) {
>>> +                virtio_error(svq->vdev,
>>> +                         "More than %u used buffers obtained in a %u size SVQ",
>>> +                         i, svq->vring.num);
>>> +                virtqueue_fill(vq, elem, elem->len, i);
>>> +                virtqueue_flush(vq, i);
>>
>> Let's simply use virtqueue_push() here?
>>
> virtqueue_push supports filling and flushing only one element, instead of
> a batch. I'm fine with either, but I think the fewer updates to the used
> idx, the better.


Fine.
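For reference, virtqueue_push() is a fill of a single element plus an
immediate flush, so the batched form keeps the guest-visible used idx
write to one per burst:

    /* Per element: stage it in the used ring without publishing it */
    virtqueue_fill(vq, elem, elem->len, i++);

    /* Once per burst: update used->idx and publish all staged elements */
    virtqueue_flush(vq, i);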


>
>>> +                i = 0;
>>
>> Do we need to bail out here?
>>
> Yes I guess we can simply return.
>
>>> +            }
>>> +            virtqueue_fill(vq, elem, elem->len, i++);
>>> +        }
>>> +
>>> +        virtqueue_flush(vq, i);
>>> +        event_notifier_set(&svq->svq_call);
>>> +
>>> +        if (check_for_avail_queue && svq->next_guest_avail_elem) {
>>> +            /*
>>> +             * Avail ring was full when vhost_svq_flush was called, so it's a
>>> +             * good moment to make more descriptors available if possible
>>> +             */
>>> +            vhost_handle_guest_kick(svq);
>>
>> Is there better to have a similar check as vhost_handle_guest_kick() did?
>>
>>               if (elem->out_num + elem->in_num >
>>                   vhost_svq_available_slots(svq)) {
>>
> It will be duplicated when we call vhost_handle_guest_kick, won't it?


Right, I mis-read the code.


>
>>> +        }
>>> +
>>> +        vhost_svq_set_notification(svq, true);
>>
>> A mb() is needed here? Otherwise we may lose a call here (where
>> vhost_svq_more_used() is run before vhost_svq_set_notification()).
>>
> I'm confused here then, I thought you said this is just a hint so
> there was no need? [1]. I think the memory barrier is needed too.


Yes, it's a hint but:

1) When we disable the notification, consider that the notification disable
is just a hint: the device can still raise an interrupt, so the ordering is
meaningless and a memory barrier is not necessary (for the
vhost_svq_set_notification(svq, false) case).

2) When we enable the notification, though it's a hint, the device can
choose to implement it by enabling the interrupt. In this case, the
notification enable should be done before checking the used ring.
Otherwise, the check for more used buffers might be done before enabling
the notification:

1) the driver checks for more used buffers
2) the device adds more used buffers but sends no notification
3) the driver enables the notification; we lost a notification here
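So the enable path in vhost_svq_flush would need something like this; the
smp_mb() placement is the point, the rest is as in the patch:

        vhost_svq_set_notification(svq, true);
        /* Order the flag write before re-reading the used ring, so a
         * used entry added in between cannot be missed */
        smp_mb();
    } while (vhost_svq_more_used(svq));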


>>> +    } while (vhost_svq_more_used(svq));
>>> +}
>>> +
>>> +/**
>>> + * Forward used buffers.
>>> + *
>>> + * @n hdev call event notifier, the one that device set to notify svq.
>>> + *
>>> + * Note that we are not making any buffers available in the loop, there is no
>>> + * way that it runs more than virtqueue size times.
>>> + */
>>>    static void vhost_svq_handle_call(EventNotifier *n)
>>>    {
>>>        VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
>>>                                                 hdev_call);
>>>
>>> -    if (unlikely(!event_notifier_test_and_clear(n))) {
>>> -        return;
>>> -    }
>>> +    /* Clear event notifier */
>>> +    event_notifier_test_and_clear(n);
>>
>> Any reason that we remove the above check?
>>
> This comes from the previous versions, where this made sure we missed
> no used buffers in the process of switching to SVQ mode.


I'm not sure I get it here. Even for the switching case, isn't it safer
to handle the flush unconditionally?

Thanks


>
> If we enable SVQ from the beginning I think we can rely on getting all
> the device's used buffer notifications, so let me think a little bit
> and I can move to check the eventfd.
>
>>> -    event_notifier_set(&svq->svq_call);
>>> +    vhost_svq_flush(svq, true);
>>>    }
>>>
>>>    /**
>>> @@ -258,13 +551,38 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>>>         * need to explicitly check for them.
>>>         */
>>>        event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
>>> -    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
>>> +    event_notifier_set_handler(&svq->svq_kick,
>>> +                               vhost_handle_guest_kick_notifier);
>>>
>>>        if (!check_old || event_notifier_test_and_clear(&tmp)) {
>>>            event_notifier_set(&svq->hdev_kick);
>>>        }
>>>    }
>>>
>>> +/**
>>> + * Start shadow virtqueue operation.
>>> + *
>>> + * @svq Shadow Virtqueue
>>> + * @vdev        VirtIO device
>>> + * @vq          Virtqueue to shadow
>>> + */
>>> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
>>> +                     VirtQueue *vq)
>>> +{
>>> +    svq->next_guest_avail_elem = NULL;
>>> +    svq->avail_idx_shadow = 0;
>>> +    svq->shadow_used_idx = 0;
>>> +    svq->last_used_idx = 0;
>>> +    svq->vdev = vdev;
>>> +    svq->vq = vq;
>>> +
>>> +    memset(svq->vring.avail, 0, sizeof(*svq->vring.avail));
>>> +    memset(svq->vring.used, 0, sizeof(*svq->vring.used));
>>> +    for (unsigned i = 0; i < svq->vring.num - 1; i++) {
>>> +        svq->vring.desc[i].next = cpu_to_le16(i + 1);
>>> +    }
>>> +}
>>> +
>>>    /**
>>>     * Stop shadow virtqueue operation.
>>>     * @svq Shadow Virtqueue
>>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>>>    void vhost_svq_stop(VhostShadowVirtqueue *svq)
>>>    {
>>>        event_notifier_set_handler(&svq->svq_kick, NULL);
>>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
>>> +
>>> +    if (!svq->vq) {
>>> +        return;
>>> +    }
>>> +
>>> +    /* Send all pending used descriptors to guest */
>>> +    vhost_svq_flush(svq, false);
>>> +
>>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
>>> +        g_autofree VirtQueueElement *elem = NULL;
>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
>>> +        if (elem) {
>>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
>>> +        }
>>> +    }
>>> +
>>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
>>> +    if (next_avail_elem) {
>>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
>>> +                                 next_avail_elem->len);
>>> +    }
>>>    }
>>>
>>>    /**
>>> @@ -316,7 +656,7 @@ VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
>>>        memset(svq->vring.desc, 0, driver_size);
>>>        svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
>>>        memset(svq->vring.used, 0, device_size);
>>> -
>>> +    svq->ring_id_maps = g_new0(VirtQueueElement *, qsize);
>>>        event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
>>>        return g_steal_pointer(&svq);
>>>
>>> @@ -335,6 +675,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
>>>        event_notifier_cleanup(&vq->hdev_kick);
>>>        event_notifier_set_handler(&vq->hdev_call, NULL);
>>>        event_notifier_cleanup(&vq->hdev_call);
>>> +    g_free(vq->ring_id_maps);
>>>        qemu_vfree(vq->vring.desc);
>>>        qemu_vfree(vq->vring.used);
>>>        g_free(vq);
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index 53e14bafa0..0e5c00ed7e 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -752,9 +752,9 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>     * Note that this function does not rewind the kick file descriptor if it
>>>     * cannot set the call one.
>>>     */
>>> -static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
>>> -                                VhostShadowVirtqueue *svq,
>>> -                                unsigned idx)
>>> +static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
>>> +                                  VhostShadowVirtqueue *svq,
>>> +                                  unsigned idx)
>>>    {
>>>        struct vhost_vring_file file = {
>>>            .index = dev->vq_index + idx,
>>> @@ -767,7 +767,7 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
>>>        r = vhost_vdpa_set_vring_dev_kick(dev, &file);
>>>        if (unlikely(r != 0)) {
>>>            error_report("Can't set device kick fd (%d)", -r);
>>> -        return false;
>>> +        return r;
>>>        }
>>>
>>>        event_notifier = vhost_svq_get_svq_call_notifier(svq);
>>> @@ -777,6 +777,99 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
>>>            error_report("Can't set device call fd (%d)", -r);
>>>        }
>>>
>>> +    return r;
>>> +}
>>> +
>>> +/**
>>> + * Unmap SVQ area in the device
>>> + */
>>> +static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
>>> +                                      hwaddr size)
>>> +{
>>> +    int r;
>>> +
>>> +    size = ROUND_UP(size, qemu_real_host_page_size);
>>> +    r = vhost_vdpa_dma_unmap(v, iova, size);
>>> +    return r == 0;
>>> +}
>>> +
>>> +static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
>>> +                                       const VhostShadowVirtqueue *svq)
>>> +{
>>> +    struct vhost_vdpa *v = dev->opaque;
>>> +    struct vhost_vring_addr svq_addr;
>>> +    size_t device_size = vhost_svq_device_area_size(svq);
>>> +    size_t driver_size = vhost_svq_driver_area_size(svq);
>>> +    bool ok;
>>> +
>>> +    vhost_svq_get_vring_addr(svq, &svq_addr);
>>> +
>>> +    ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
>>> +    if (unlikely(!ok)) {
>>> +        return false;
>>> +    }
>>> +
>>> +    return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
>>> +}
>>> +
>>> +/**
>>> + * Map shadow virtqueue rings in device
>>> + *
>>> + * @dev   The vhost device
>>> + * @svq   The shadow virtqueue
>>> + */
>>> +static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
>>> +                                     const VhostShadowVirtqueue *svq)
>>> +{
>>> +    struct vhost_vdpa *v = dev->opaque;
>>> +    struct vhost_vring_addr svq_addr;
>>> +    size_t device_size = vhost_svq_device_area_size(svq);
>>> +    size_t driver_size = vhost_svq_driver_area_size(svq);
>>> +    int r;
>>> +
>>> +    vhost_svq_get_vring_addr(svq, &svq_addr);
>>> +
>>> +    r = vhost_vdpa_dma_map(v, svq_addr.desc_user_addr, driver_size,
>>> +                           (void *)svq_addr.desc_user_addr, true);
>>> +    if (unlikely(r != 0)) {
>>> +        return false;
>>> +    }
>>> +
>>> +    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
>>> +                           (void *)svq_addr.used_user_addr, false);
>>
>> Do we need to unmap the driver area if we fail here?
>>
> Yes, this used to rely on the rings being unmapped when SVQ is disabled.
> Now I think we need to unmap here, as you say.
>
> Thanks!
>
> [1] https://lists.linuxfoundation.org/pipermail/virtualization/2021-March/053322.html
>
>> Thanks
>>
>>
>>> +    return r == 0;
>>> +}
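
A minimal sketch of the rollback discussed above, using
vhost_vdpa_svq_unmap_ring() as introduced earlier in this patch (its exact
placement inside vhost_vdpa_svq_map_rings() is an assumption, not code from
the posted series):

    static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
                                         const VhostShadowVirtqueue *svq)
    {
        ...                       /* driver area mapped as in the patch */

        r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
                               (void *)svq_addr.used_user_addr, false);
        if (unlikely(r != 0)) {
            /* Roll back the driver area so no stale mapping is left */
            vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
            return false;
        }

        return true;
    }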
>>> +
>>> +static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
>>> +                                VhostShadowVirtqueue *svq,
>>> +                                unsigned idx)
>>> +{
>>> +    uint16_t vq_index = dev->vq_index + idx;
>>> +    struct vhost_vring_state s = {
>>> +        .index = vq_index,
>>> +    };
>>> +    int r;
>>> +    bool ok;
>>> +
>>> +    r = vhost_vdpa_set_dev_vring_base(dev, &s);
>>> +    if (unlikely(r)) {
>>> +        error_report("Can't set vring base (%d)", r);
>>> +        return false;
>>> +    }
>>> +
>>> +    s.num = vhost_svq_get_num(svq);
>>> +    r = vhost_vdpa_set_dev_vring_num(dev, &s);
>>> +    if (unlikely(r)) {
>>> +        error_report("Can't set vring num (%d)", r);
>>> +        return false;
>>> +    }
>>> +
>>> +    ok = vhost_vdpa_svq_map_rings(dev, svq);
>>> +    if (unlikely(!ok)) {
>>> +        return false;
>>> +    }
>>> +
>>> +    r = vhost_vdpa_svq_set_fds(dev, svq, idx);
>>>        return r == 0;
>>>    }
>>>
>>> @@ -788,14 +881,24 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>        if (started) {
>>>            vhost_vdpa_host_notifiers_init(dev);
>>>            for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
>>> +            VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
>>>                VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
>>>                bool ok = vhost_vdpa_svq_setup(dev, svq, i);
>>>                if (unlikely(!ok)) {
>>>                    return -1;
>>>                }
>>> +            vhost_svq_start(svq, dev->vdev, vq);
>>>            }
>>>            vhost_vdpa_set_vring_ready(dev);
>>>        } else {
>>> +        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
>>> +            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
>>> +                                                          i);
>>> +            bool ok = vhost_vdpa_svq_unmap_rings(dev, svq);
>>> +            if (unlikely(!ok)) {
>>> +                return -1;
>>> +            }
>>> +        }
>>>            vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>>        }
>>>
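
Putting the hunks above together, the per-queue bring-up order in
vhost_vdpa_dev_start() is roughly the following (a summary sketch of the
quoted code, not an addition to it):

    /* For each shadow virtqueue, when started == true: */
    bool ok = vhost_vdpa_svq_setup(dev, svq, i);  /* base, num, ring maps, fds */
    if (unlikely(!ok)) {
        return -1;
    }
    vhost_svq_start(svq, dev->vdev, vq);          /* reset SVQ state, link guest vq */

    /* Then, once for the whole device: */
    vhost_vdpa_set_vring_ready(dev);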


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
  2022-02-01 11:25     ` Eugenio Perez Martin
@ 2022-02-08  8:15         ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-08  8:15 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/1 7:25 PM, Eugenio Perez Martin wrote:
> On Sun, Jan 30, 2022 at 7:47 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>>>    void vhost_svq_stop(VhostShadowVirtqueue *svq)
>>>    {
>>>        event_notifier_set_handler(&svq->svq_kick, NULL);
>>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
>>> +
>>> +    if (!svq->vq) {
>>> +        return;
>>> +    }
>>> +
>>> +    /* Send all pending used descriptors to guest */
>>> +    vhost_svq_flush(svq, false);
>>
>> Do we need to wait for all the pending descriptors to be completed here?
>>
> No, this function does not wait; it only completes the forwarding of
> the *used* descriptors.
>
> The best example is the net rx queue in my opinion. This call will
> check SVQ's vring used_idx and will forward the last used descriptors
> if any, but all available descriptors will remain as available for
> qemu's VQ code.
>
> To skip it would miss those last rx descriptors in migration.
>
> Thanks!


So this is probably not the best place to ask. It's more about the
in-flight descriptors, so it should be TX instead of RX.

In the last phase of migration, we should stop the vhost-vDPA device
before calling vhost_svq_stop(). Then we should be fine regardless of
in-flight descriptors.

Thanks


>
>> Thanks
>>
>>
>>> +
>>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
>>> +        g_autofree VirtQueueElement *elem = NULL;
>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
>>> +        if (elem) {
>>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
>>> +        }
>>> +    }
>>> +
>>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
>>> +    if (next_avail_elem) {
>>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
>>> +                                 next_avail_elem->len);
>>> +    }
>>>    }


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 22/31] vhost: Add VhostIOVATree
  2022-02-01 17:27     ` Eugenio Perez Martin
@ 2022-02-08  8:17         ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-08  8:17 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/2 1:27 AM, Eugenio Perez Martin wrote:
> On Sun, Jan 30, 2022 at 6:21 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> This tree is able to look for a translated address from an IOVA address.
>>>
>>> At first glance it is similar to util/iova-tree. However, SVQ working on
>>> devices with limited IOVA space needs more capabilities,
>>
>> So did the IOVA tree (e.g., l2 vtd can only work in the range of GAW and
>> without RMRRs).
>>
>>
>>>    like allocating
>>> IOVA chunks or performing reverse translations (qemu addresses to iova).
>>
>> This looks like a general request as well. So I wonder if we can simply
>> extend iova tree instead.
>>
> While both are true, I don't see code that performs allocations or
> qemu vaddr to iova translations. But if the changes can be integrated
> into iova-tree that would be great for sure.
>
> The main drawback I see is the need to maintain two trees instead of
> one for users of iova-tree. While complexity does not grow, it
> doubles the amount of work needed.


If you care about performance, we can disable the reverse mapping
during allocation. vIOMMU users won't notice any performance
penalty.

Thanks


>
> Thanks!
>
>> Thanks
>>
>>
>>> The allocation capability, as "assign a free IOVA address to this chunk
>>> of memory in qemu's address space" allows shadow virtqueue to create a
>>> new address space that is not restricted by guest's addressable one, so
>>> we can allocate shadow vqs vrings outside of it.
>>>
>>> It duplicates the tree so it can search efficiently both directions,
>>> and it will signal overlap if iova or the translated address is
>>> present in any tree.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-iova-tree.h |  27 +++++++
>>>    hw/virtio/vhost-iova-tree.c | 157 ++++++++++++++++++++++++++++++++++++
>>>    hw/virtio/meson.build       |   2 +-
>>>    3 files changed, 185 insertions(+), 1 deletion(-)
>>>    create mode 100644 hw/virtio/vhost-iova-tree.h
>>>    create mode 100644 hw/virtio/vhost-iova-tree.c
>>>
>>> diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
>>> new file mode 100644
>>> index 0000000000..610394eaf1
>>> --- /dev/null
>>> +++ b/hw/virtio/vhost-iova-tree.h
>>> @@ -0,0 +1,27 @@
>>> +/*
>>> + * vhost software live migration ring
>>> + *
>>> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
>>> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
>>> + *
>>> + * SPDX-License-Identifier: GPL-2.0-or-later
>>> + */
>>> +
>>> +#ifndef HW_VIRTIO_VHOST_IOVA_TREE_H
>>> +#define HW_VIRTIO_VHOST_IOVA_TREE_H
>>> +
>>> +#include "qemu/iova-tree.h"
>>> +#include "exec/memory.h"
>>> +
>>> +typedef struct VhostIOVATree VhostIOVATree;
>>> +
>>> +VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
>>> +void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
>>> +G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
>>> +
>>> +const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
>>> +                                        const DMAMap *map);
>>> +int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
>>> +void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map);
>>> +
>>> +#endif
>>> diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
>>> new file mode 100644
>>> index 0000000000..0021dbaf54
>>> --- /dev/null
>>> +++ b/hw/virtio/vhost-iova-tree.c
>>> @@ -0,0 +1,157 @@
>>> +/*
>>> + * vhost software live migration ring
>>> + *
>>> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
>>> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
>>> + *
>>> + * SPDX-License-Identifier: GPL-2.0-or-later
>>> + */
>>> +
>>> +#include "qemu/osdep.h"
>>> +#include "qemu/iova-tree.h"
>>> +#include "vhost-iova-tree.h"
>>> +
>>> +#define iova_min_addr qemu_real_host_page_size
>>> +
>>> +/**
>>> + * VhostIOVATree, able to:
>>> + * - Translate iova address
>>> + * - Reverse translate iova address (from translated to iova)
>>> + * - Allocate IOVA regions for translated range (potentially slow operation)
>>> + *
>>> + * Note that it cannot remove nodes.
>>> + */
>>> +struct VhostIOVATree {
>>> +    /* First addressable iova address in the device */
>>> +    uint64_t iova_first;
>>> +
>>> +    /* Last addressable iova address in the device */
>>> +    uint64_t iova_last;
>>> +
>>> +    /* IOVA address to qemu memory maps. */
>>> +    IOVATree *iova_taddr_map;
>>> +
>>> +    /* QEMU virtual memory address to iova maps */
>>> +    GTree *taddr_iova_map;
>>> +};
>>> +
>>> +static gint vhost_iova_tree_cmp_taddr(gconstpointer a, gconstpointer b,
>>> +                                      gpointer data)
>>> +{
>>> +    const DMAMap *m1 = a, *m2 = b;
>>> +
>>> +    if (m1->translated_addr > m2->translated_addr + m2->size) {
>>> +        return 1;
>>> +    }
>>> +
>>> +    if (m1->translated_addr + m1->size < m2->translated_addr) {
>>> +        return -1;
>>> +    }
>>> +
>>> +    /* Overlapped */
>>> +    return 0;
>>> +}
>>> +
>>> +/**
>>> + * Create a new IOVA tree
>>> + *
>>> + * Returns the new IOVA tree
>>> + */
>>> +VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
>>> +{
>>> +    VhostIOVATree *tree = g_new(VhostIOVATree, 1);
>>> +
>>> +    /* Some devices do not like 0 addresses */
>>> +    tree->iova_first = MAX(iova_first, iova_min_addr);
>>> +    tree->iova_last = iova_last;
>>> +
>>> +    tree->iova_taddr_map = iova_tree_new();
>>> +    tree->taddr_iova_map = g_tree_new_full(vhost_iova_tree_cmp_taddr, NULL,
>>> +                                           NULL, g_free);
>>> +    return tree;
>>> +}
>>> +
>>> +/**
>>> + * Delete an iova tree
>>> + */
>>> +void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
>>> +{
>>> +    iova_tree_destroy(iova_tree->iova_taddr_map);
>>> +    g_tree_unref(iova_tree->taddr_iova_map);
>>> +    g_free(iova_tree);
>>> +}
>>> +
>>> +/**
>>> + * Find the IOVA address stored from a memory address
>>> + *
>>> + * @tree     The iova tree
>>> + * @map      The map with the memory address
>>> + *
>>> + * Return the stored mapping, or NULL if not found.
>>> + */
>>> +const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
>>> +                                        const DMAMap *map)
>>> +{
>>> +    return g_tree_lookup(tree->taddr_iova_map, map);
>>> +}
>>> +
>>> +/**
>>> + * Allocate a new mapping
>>> + *
>>> + * @tree  The iova tree
>>> + * @map   The iova map
>>> + *
>>> + * Returns:
>>> + * - IOVA_OK if the map fits in the container
>>> + * - IOVA_ERR_INVALID if the map does not make sense (like size overflow)
>>> + * - IOVA_ERR_OVERLAP if the tree already contains that map
>>> + * - IOVA_ERR_NOMEM if tree cannot allocate more space.
>>> + *
>>> + * It returns the assigned iova in map->iova if the return value is IOVA_OK.
>>> + */
>>> +int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map)
>>> +{
>>> +    /* Some vhost devices do not like addr 0. Skip the first page */
>>> +    hwaddr iova_first = tree->iova_first ?: qemu_real_host_page_size;
>>> +    DMAMap *new;
>>> +    int r;
>>> +
>>> +    if (map->translated_addr + map->size < map->translated_addr ||
>>> +        map->perm == IOMMU_NONE) {
>>> +        return IOVA_ERR_INVALID;
>>> +    }
>>> +
>>> +    /* Check for collisions in translated addresses */
>>> +    if (vhost_iova_tree_find_iova(tree, map)) {
>>> +        return IOVA_ERR_OVERLAP;
>>> +    }
>>> +
>>> +    /* Allocate a node in IOVA address */
>>> +    r = iova_tree_alloc(tree->iova_taddr_map, map, iova_first,
>>> +                        tree->iova_last);
>>> +    if (r != IOVA_OK) {
>>> +        return r;
>>> +    }
>>> +
>>> +    /* Allocate node in qemu -> iova translations */
>>> +    new = g_malloc(sizeof(*new));
>>> +    memcpy(new, map, sizeof(*new));
>>> +    g_tree_insert(tree->taddr_iova_map, new, new);
>>> +    return IOVA_OK;
>>> +}
>>> +
>>> +/**
>>> + * Remove existing mappings from iova tree
>>> + *
>>> + * @param  iova_tree  The vhost iova tree
>>> + * @param  map        The map to remove
>>> + */
>>> +void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map)
>>> +{
>>> +    const DMAMap *overlap;
>>> +
>>> +    iova_tree_remove(iova_tree->iova_taddr_map, map);
>>> +    while ((overlap = vhost_iova_tree_find_iova(iova_tree, map))) {
>>> +        g_tree_remove(iova_tree->taddr_iova_map, overlap);
>>> +    }
>>> +}
>>> diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
>>> index 2dc87613bc..6047670804 100644
>>> --- a/hw/virtio/meson.build
>>> +++ b/hw/virtio/meson.build
>>> @@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
>>>
>>>    virtio_ss = ss.source_set()
>>>    virtio_ss.add(files('virtio.c'))
>>> -virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
>>> +virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c', 'vhost-iova-tree.c'))
>>>    virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
>>>    virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
>>>    virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
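
For orientation, a hypothetical usage sketch of the API quoted above (the
buffer name and length are made up; DMAMap.size is assumed to follow
util/iova-tree's inclusive convention):

    DMAMap map = {
        .translated_addr = (hwaddr)(uintptr_t)qemu_buf, /* qemu vaddr, assumed */
        .size = buf_len - 1,                            /* inclusive size */
        .perm = IOMMU_RW,
    };

    if (vhost_iova_tree_map_alloc(tree, &map) == IOVA_OK) {
        /* map.iova now holds the IOVA assigned by the allocator ... */
        const DMAMap *rev = vhost_iova_tree_find_iova(tree, &map);
        /* ... and the same map can be found again by its qemu vaddr */
    }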


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 23/31] vdpa: Add custom IOTLB translations to SVQ
  2022-01-31 19:11     ` Eugenio Perez Martin
@ 2022-02-08  8:19         ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-08  8:19 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/1 3:11 AM, Eugenio Perez Martin wrote:
>>> +            return false;
>>> +        }
>>> +
>>> +        /*
>>> +         * Map->iova chunk size is ignored. What to do if descriptor
>>> +         * (addr, size) does not fit is delegated to the device.
>>> +         */
>> I think we need to at least check the size and fail if the size doesn't
>> match here. Or is it possible that we have a buffer that may cross two
>> memory regions?
>>
> It should be impossible, since both iova_tree and VirtQueue should be
> in sync regarding memory region updates. If a VirtQueue buffer
> crosses many memory regions, iovec has more entries.
>
> I can add a return false, but I'm not able to trigger that situation
> even with a malformed driver.
>

OK, but it won't hurt to add a warning here.

Thanks
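
For illustration, the warning could be as small as the following sketch;
the needle/map names are assumed from the patch context and may differ:

    if (unlikely(needle.size > map->size)) {
        /*
         * Should be impossible: it would mean the iova tree and the
         * VirtQueue disagree about memory region boundaries.
         */
        error_report("SVQ: descriptor crosses memory regions");
        return false;
    }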


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
  2022-02-01 11:45     ` Eugenio Perez Martin
@ 2022-02-08  8:25         ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-08  8:25 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/1 7:45 PM, Eugenio Perez Martin wrote:
> On Sun, Jan 30, 2022 at 7:50 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> SVQ is able to log the dirty bits by itself, so let's use it to not
>>> block migration.
>>>
>>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
>>> enabled. Even if the device supports it, the reports would be nonsense
>>> because SVQ memory is in the qemu region.
>>>
>>> The log region is still allocated. Future changes might skip that, but
>>> this series is already long enough.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
>>>    1 file changed, 20 insertions(+)
>>>
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index fb0a338baa..75090d65e8 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
>>>        if (ret == 0 && v->shadow_vqs_enabled) {
>>>            /* Filter only features that SVQ can offer to guest */
>>>            vhost_svq_valid_guest_features(features);
>>> +
>>> +        /* Add SVQ logging capabilities */
>>> +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
>>>        }
>>>
>>>        return ret;
>>> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
>>>
>>>        if (v->shadow_vqs_enabled) {
>>>            uint64_t dev_features, svq_features, acked_features;
>>> +        uint8_t status = 0;
>>>            bool ok;
>>>
>>> +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
>>> +        if (unlikely(ret)) {
>>> +            return ret;
>>> +        }
>>> +
>>> +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
>>> +            /*
>>> +             * vhost is trying to enable or disable _F_LOG, and the device
>>> +             * would report wrong dirty pages. SVQ handles it.
>>> +             */
>>
>> I fail to understand this comment; I'd think there's no way to disable
>> dirty page tracking for SVQ.
>>
> vhost_log_global_{start,stop} are called at the beginning and end of
> migration. To inform the device that it should start logging, they set
> or clear VHOST_F_LOG_ALL at vhost_dev_set_log.


Yes, but for SVQ, we can't disable dirty page tracking, can we? The
only thing we can do is ignore or filter out F_LOG_ALL and pretend it is
enabled or disabled.


>
> While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
> vhost does not block migration. Maybe we need to look for another way
> to do this?


I'm fine with filtering since it's much simpler, but I fail to
understand why we need to check DRIVER_OK.

Thanks


>
> Thanks!
>
>> Thanks
>>
>>
>>> +            return 0;
>>> +        }
>>> +
>>> +        /* We must not ack _F_LOG if SVQ is enabled */
>>> +        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
>>> +
>>>            ret = vhost_vdpa_get_dev_features(dev, &dev_features);
>>>            if (ret != 0) {
>>>                error_report("Can't get vdpa device features, got (%d)", ret);


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 00/31] vDPA shadow virtqueue
  2022-01-31  9:15   ` Eugenio Perez Martin
@ 2022-02-08  8:27       ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-08  8:27 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/1/31 5:15 PM, Eugenio Perez Martin wrote:
> On Fri, Jan 28, 2022 at 7:02 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> This series enables shadow virtqueue (SVQ) for vhost-vdpa devices. This
>>> is intended as a new method of tracking the memory the devices touch
>>> during a migration process: Instead of relay on vhost device's dirty
>>> logging capability, SVQ intercepts the VQ dataplane forwarding the
>>> descriptors between VM and device. This way qemu is the effective
>>> writer of guests memory, like in qemu's emulated virtio device
>>> operation.
>>>
>>> When SVQ is enabled qemu offers a new virtual address space to the
>>> device to read and write into, and it maps new vrings and the guest
>>> memory in it. SVQ also intercepts kicks and calls between the device
>>> and the guest. Used buffers relay would cause dirty memory being
>>> tracked, but at this RFC SVQ is not enabled on migration automatically.
>>>
>>> Thanks of being a buffers relay system, SVQ can be used also to
>>> communicate devices and drivers with different capabilities, like
>>> devices that only support packed vring and not split and old guests with
>>> no driver packed support.
>>>
>>> It is based on the ideas of DPDK SW assisted LM, in the series of
>>> DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
>>> not map the shadow vq in guest's VA, but in qemu's.
>>>
>>> This version of SVQ is limited in the amount of features it can use with
>>> guest and device, because this series is already very big otherwise.
>>> Features like indirect or event_idx will be addressed in future series.
>>>
>>> SVQ needs to be enabled with cmdline parameter x-svq, like:
>>>
>>> -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-svq=true
>>>
>>> In this version it cannot be enabled or disabled in runtime. Further
>>> series will remove this limitation and will enable it only for migration
>>> time.
>>>
>>> Some patches are intentionally very small to ease review, but they can
>>> be squashed if preferred.
>>>
>>> Patches 1-10 prepares the SVQ and QEMU to support both guest to device
>>> and device to guest notifications forwarding, with the extra qemu hop.
>>> That part can be tested in isolation if cmdline change is reproduced.
>>>
>>> Patches from 11 to 18 implement the actual buffer forwarding, but with
>>> no IOMMU support. It requires a vdpa device capable of addressing all
>>> qemu vaddr.
>>>
>>> Patches 19 to 23 adds the iommu support, so the device with address
>>> range limitations can access SVQ through this new virtual address space
>>> created.
>>>
>>> The rest of the series add the last pieces needed for migration.
>>>
>>> Comments are welcome.
>>
>> I wonder about the performance impact. So performance numbers are more
>> than welcome.
>>
> Sure, I'll do it for the next revision. Since this one brings a decent
> amount of changes, I chose to collect the feedback first.


A simple single TCP_STREAM netperf test should be sufficient to give
some basic understanding of the performance impact.

Thanks
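
For reference, the kind of run meant here is a single netperf stream
against a netserver on the peer (address hypothetical):

    netperf -H 192.168.100.1 -t TCP_STREAM -l 30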


>
> Thanks!
>
>> Thanks
>>
>>
>>> TODO:
>>> * Event, indirect, packed, and other features of virtio.
>>> * To separate buffers forwarding in its own AIO context, so we can
>>>     throw more threads to that task and we don't need to stop the main
>>>     event loop.
>>> * Support virtio-net control vq.
>>> * Proper documentation.
>>>
>>> Changes from v5 RFC:
>>> * Remove dynamic enablement of SVQ, making less dependent of the device.
>>> * Enable live migration if SVQ is enabled.
>>> * Fix SVQ when driver reset.
>>> * Comments addressed, specially in the iova area.
>>> * Rebase on latest master, adding multiqueue support (but no networking
>>>     control vq processing).
>>> v5 link:
>>> https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg07250.html
>>>
>>> Changes from v4 RFC:
>>> * Support of allocating / freeing iova ranges in IOVA tree. Extending
>>>     already present iova-tree for that.
>>> * Proper validation of guest features. Now SVQ can negotiate a
>>>     different set of features with the device when enabled.
>>> * Support of host notifiers memory regions
>>> * Handling of SVQ full queue in case guest's descriptors span to
>>>     different memory regions (qemu's VA chunks).
>>> * Flush pending used buffers at end of SVQ operation.
>>> * QMP command now looks by NetClientState name. Other devices will need
>>>     to implement it's way to enable vdpa.
>>> * Rename QMP command to set, so it looks more like a way of working
>>> * Better use of qemu error system
>>> * Make a few assertions proper error-handling paths.
>>> * Add more documentation
>>> * Less coupling of virtio / vhost, that could cause friction on changes
>>> * Addressed many other small comments and small fixes.
>>>
>>> Changes from v3 RFC:
>>>     * Move everything to vhost-vdpa backend. A big change, this allowed
>>>       some cleanup but more code has been added in other places.
>>>     * More use of glib utilities, especially to manage memory.
>>> v3 link:
>>> https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
>>>
>>> Changes from v2 RFC:
>>>     * Adding vhost-vdpa devices support
>>>     * Fixed some memory leaks pointed by different comments
>>> v2 link:
>>> https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
>>>
>>> Changes from v1 RFC:
>>>     * Use QMP instead of migration to start SVQ mode.
>>>     * Only accepting IOMMU devices, closer behavior with target devices
>>>       (vDPA)
>>>     * Fix invalid masking/unmasking of vhost call fd.
>>>     * Use of proper methods for synchronization.
>>>     * No need to modify VirtIO device code, all of the changes are
>>>       contained in vhost code.
>>>     * Delete superfluous code.
>>>     * An intermediate RFC was sent with only the notifications forwarding
>>>       changes. It can be seen in
>>>       https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
>>> v1 link:
>>> https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
>>>
>>> Eugenio Pérez (20):
>>>         virtio: Add VIRTIO_F_QUEUE_STATE
>>>         virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
>>>         virtio: Add virtio_queue_is_host_notifier_enabled
>>>         vhost: Make vhost_virtqueue_{start,stop} public
>>>         vhost: Add x-vhost-enable-shadow-vq qmp
>>>         vhost: Add VhostShadowVirtqueue
>>>         vdpa: Register vdpa devices in a list
>>>         vhost: Route guest->host notification through shadow virtqueue
>>>         Add vhost_svq_get_svq_call_notifier
>>>         Add vhost_svq_set_guest_call_notifier
>>>         vdpa: Save call_fd in vhost-vdpa
>>>         vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
>>>         vhost: Route host->guest notification through shadow virtqueue
>>>         virtio: Add vhost_shadow_vq_get_vring_addr
>>>         vdpa: Save host and guest features
>>>         vhost: Add vhost_svq_valid_device_features to shadow vq
>>>         vhost: Shadow virtqueue buffers forwarding
>>>         vhost: Add VhostIOVATree
>>>         vhost: Use a tree to store memory mappings
>>>         vdpa: Add custom IOTLB translations to SVQ
>>>
>>> Eugenio Pérez (31):
>>>     vdpa: Reorder virtio/vhost-vdpa.c functions
>>>     vhost: Add VhostShadowVirtqueue
>>>     vdpa: Add vhost_svq_get_dev_kick_notifier
>>>     vdpa: Add vhost_svq_set_svq_kick_fd
>>>     vhost: Add Shadow VirtQueue kick forwarding capabilities
>>>     vhost: Route guest->host notification through shadow virtqueue
>>>     vhost: dd vhost_svq_get_svq_call_notifier
>>>     vhost: Add vhost_svq_set_guest_call_notifier
>>>     vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
>>>     vhost: Route host->guest notification through shadow virtqueue
>>>     vhost: Add vhost_svq_valid_device_features to shadow vq
>>>     vhost: Add vhost_svq_valid_guest_features to shadow vq
>>>     vhost: Add vhost_svq_ack_guest_features to shadow vq
>>>     virtio: Add vhost_shadow_vq_get_vring_addr
>>>     vdpa: Add vhost_svq_get_num
>>>     vhost: pass queue index to vhost_vq_get_addr
>>>     vdpa: adapt vhost_ops callbacks to svq
>>>     vhost: Shadow virtqueue buffers forwarding
>>>     utils: Add internal DMAMap to iova-tree
>>>     util: Store DMA entries in a list
>>>     util: Add iova_tree_alloc
>>>     vhost: Add VhostIOVATree
>>>     vdpa: Add custom IOTLB translations to SVQ
>>>     vhost: Add vhost_svq_get_last_used_idx
>>>     vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
>>>     vdpa: Clear VHOST_VRING_F_LOG at vhost_vdpa_set_vring_addr in SVQ
>>>     vdpa: Never set log_base addr if SVQ is enabled
>>>     vdpa: Expose VHOST_F_LOG_ALL on SVQ
>>>     vdpa: Make ncs autofree
>>>     vdpa: Move vhost_vdpa_get_iova_range to net/vhost-vdpa.c
>>>     vdpa: Add x-svq to NetdevVhostVDPAOptions
>>>
>>>    qapi/net.json                      |   5 +-
>>>    hw/virtio/vhost-iova-tree.h        |  27 +
>>>    hw/virtio/vhost-shadow-virtqueue.h |  46 ++
>>>    include/hw/virtio/vhost-vdpa.h     |   7 +
>>>    include/qemu/iova-tree.h           |  17 +
>>>    hw/virtio/vhost-iova-tree.c        | 157 ++++++
>>>    hw/virtio/vhost-shadow-virtqueue.c | 761 +++++++++++++++++++++++++++++
>>>    hw/virtio/vhost-vdpa.c             | 740 ++++++++++++++++++++++++----
>>>    hw/virtio/vhost.c                  |   6 +-
>>>    net/vhost-vdpa.c                   |  58 ++-
>>>    util/iova-tree.c                   | 161 +++++-
>>>    hw/virtio/meson.build              |   2 +-
>>>    12 files changed, 1852 insertions(+), 135 deletions(-)
>>>    create mode 100644 hw/virtio/vhost-iova-tree.h
>>>    create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
>>>    create mode 100644 hw/virtio/vhost-iova-tree.c
>>>    create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
>>>


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 04/31] vdpa: Add vhost_svq_set_svq_kick_fd
  2022-01-31 10:18     ` Eugenio Perez Martin
@ 2022-02-08  8:47         ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-08  8:47 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/1/31 6:18 PM, Eugenio Perez Martin wrote:
> On Fri, Jan 28, 2022 at 7:29 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> This function allows the vhost-vdpa backend to override kick_fd.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-shadow-virtqueue.h |  1 +
>>>    hw/virtio/vhost-shadow-virtqueue.c | 45 ++++++++++++++++++++++++++++++
>>>    2 files changed, 46 insertions(+)
>>>
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
>>> index 400effd9f2..a56ecfc09d 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>>> @@ -15,6 +15,7 @@
>>>
>>>    typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>>>
>>> +void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>>>    const EventNotifier *vhost_svq_get_dev_kick_notifier(
>>>                                                  const VhostShadowVirtqueue *svq);
>>>
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>> index bd87110073..21534bc94d 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>> @@ -11,6 +11,7 @@
>>>    #include "hw/virtio/vhost-shadow-virtqueue.h"
>>>
>>>    #include "qemu/error-report.h"
>>> +#include "qemu/main-loop.h"
>>>
>>>    /* Shadow virtqueue to relay notifications */
>>>    typedef struct VhostShadowVirtqueue {
>>> @@ -18,8 +19,20 @@ typedef struct VhostShadowVirtqueue {
>>>        EventNotifier hdev_kick;
>>>        /* Shadow call notifier, sent to vhost */
>>>        EventNotifier hdev_call;
>>> +
>>> +    /*
>>> +     * Borrowed virtqueue's guest to host notifier.
>>> +     * To borrow it in this event notifier allows to register on the event
>>> +     * loop and access the associated shadow virtqueue easily. If we use the
>>> +     * VirtQueue, we don't have an easy way to retrieve it.
>>> +     *
>>> +     * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
>>> +     */
>>> +    EventNotifier svq_kick;
>>>    } VhostShadowVirtqueue;
>>>
>>> +#define INVALID_SVQ_KICK_FD -1
>>> +
>>>    /**
>>>     * The notifier that SVQ will use to notify the device.
>>>     */
>>> @@ -29,6 +42,35 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
>>>        return &svq->hdev_kick;
>>>    }
>>>
>>> +/**
>>> + * Set a new file descriptor for the guest to kick SVQ and notify for avail
>>> + *
>>> + * @svq          The svq
>>> + * @svq_kick_fd  The new svq kick fd
>>> + */
>>> +void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>>> +{
>>> +    EventNotifier tmp;
>>> +    bool check_old = INVALID_SVQ_KICK_FD !=
>>> +                     event_notifier_get_fd(&svq->svq_kick);
>>> +
>>> +    if (check_old) {
>>> +        event_notifier_set_handler(&svq->svq_kick, NULL);
>>> +        event_notifier_init_fd(&tmp, event_notifier_get_fd(&svq->svq_kick));
>>> +    }
>>
>> It looks to me we don't do similar things in vhost-net. Any reason for
>> caring about the old svq_kick?
>>
> Do you mean to check for old kick_fd in case we miss notifications,
> and explicitly omit the INVALID_SVQ_KICK_FD?


Yes.


>
> If you mean qemu's vhost-net, I guess it's because the device's kick
> fd is never changed during the whole vhost device lifecycle; it's only
> set at the beginning. The previous RFC also depended on that, but you
> suggested not to depend on it in the v4 feedback, if I understood
> correctly [1]. Or am I missing something?


No, I forgot that. But in this case we should handle the conversion from
a valid fd to -1 better, by disabling the handler.
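
A sketch of that handling, on top of the function quoted earlier in this
thread (only the -1 branch is new; the rest is elided):

    void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
    {
        if (svq_kick_fd == INVALID_SVQ_KICK_FD) {
            /* Valid fd -> -1: stop polling the old descriptor */
            event_notifier_set_handler(&svq->svq_kick, NULL);
            event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
            return;
        }
        /* ... rest as in the patch ... */
    }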


>
> Qemu's vhost-net does not need to use this because it is not polling
> it. For kernel's vhost, I guess the closest is the use of pollstop and
> pollstart at vhost_vring_ioctl.
>
> In my opinion, SVQ code size could benefit from not allowing kick_fd
> to be overridden, only set once at the start of operation; not at
> initialization, but at start. But I can see the benefit of taking the
> change into account from this moment, so it's more resilient for the
> future.
>
>>> +
>>> +    /*
>>> +     * event_notifier_set_handler already checks for guest's notifications if
>>> +     * they arrive to the new file descriptor in the switch, so there is no
>>> +     * need to explicitely check for them.
>>> +     */
>>> +    event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
>>> +
>>> +    if (!check_old || event_notifier_test_and_clear(&tmp)) {
>>> +        event_notifier_set(&svq->hdev_kick);
>>
>> Any reason we need to kick the device directly here?
>>
> At this point of the series only notifications are forwarded, not
> buffers. If kick_fd is set, we need to check the old one, the same way
> as vhost checks the masked notifier in case of change.


I meant we need to kick the svq instead of vhost-vdpa in this case?

Thanks


>
> Thanks!
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg03152.html
> , from "I'd suggest to not depend on this since it:"
>
>
>> Thanks
>>
>>
>>> +    }
>>> +}
>>> +
>>>    /**
>>>     * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
>>>     * methods and file descriptors.
>>> @@ -52,6 +94,9 @@ VhostShadowVirtqueue *vhost_svq_new(void)
>>>            goto err_init_hdev_call;
>>>        }
>>>
>>> +    /* Placeholder descriptor, it should be deleted at set_kick_fd */
>>> +    event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
>>> +
>>>        return g_steal_pointer(&svq);
>>>
>>>    err_init_hdev_call:


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 06/31] vhost: Route guest->host notification through shadow virtqueue
  2022-01-31 11:33     ` Eugenio Perez Martin
@ 2022-02-08  9:02         ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-08  9:02 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/1/31 7:33 PM, Eugenio Perez Martin wrote:
> On Fri, Jan 28, 2022 at 7:57 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> At this moment no buffer forwarding will be performed in SVQ mode: Qemu
>>> just forwards the guest's kicks to the device. This commit also sets up
>>> SVQs in the vhost device.
>>>
>>> Host memory notifier regions are left out for simplicity, and they will
>>> not be addressed in this series.
>>
>> I wonder if it's better to squash this into patch 5, since it gives us
>> full guest->host forwarding.
>>
> I'm fine with that if you think it makes the review easier.


Yes please.


>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    include/hw/virtio/vhost-vdpa.h |   4 ++
>>>    hw/virtio/vhost-vdpa.c         | 122 ++++++++++++++++++++++++++++++++-
>>>    2 files changed, 124 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
>>> index 3ce79a646d..009a9f3b6b 100644
>>> --- a/include/hw/virtio/vhost-vdpa.h
>>> +++ b/include/hw/virtio/vhost-vdpa.h
>>> @@ -12,6 +12,8 @@
>>>    #ifndef HW_VIRTIO_VHOST_VDPA_H
>>>    #define HW_VIRTIO_VHOST_VDPA_H
>>>
>>> +#include <gmodule.h>
>>> +
>>>    #include "hw/virtio/virtio.h"
>>>    #include "standard-headers/linux/vhost_types.h"
>>>
>>> @@ -27,6 +29,8 @@ typedef struct vhost_vdpa {
>>>        bool iotlb_batch_begin_sent;
>>>        MemoryListener listener;
>>>        struct vhost_vdpa_iova_range iova_range;
>>> +    bool shadow_vqs_enabled;
>>> +    GPtrArray *shadow_vqs;
>>>        struct vhost_dev *dev;
>>>        VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
>>>    } VhostVDPA;
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index 6c10a7f05f..18de14f0fb 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -17,12 +17,14 @@
>>>    #include "hw/virtio/vhost.h"
>>>    #include "hw/virtio/vhost-backend.h"
>>>    #include "hw/virtio/virtio-net.h"
>>> +#include "hw/virtio/vhost-shadow-virtqueue.h"
>>>    #include "hw/virtio/vhost-vdpa.h"
>>>    #include "exec/address-spaces.h"
>>>    #include "qemu/main-loop.h"
>>>    #include "cpu.h"
>>>    #include "trace.h"
>>>    #include "qemu-common.h"
>>> +#include "qapi/error.h"
>>>
>>>    /*
>>>     * Return one past the end of the end of section. Be careful with uint64_t
>>> @@ -409,8 +411,14 @@ err:
>>>
>>>    static void vhost_vdpa_host_notifiers_init(struct vhost_dev *dev)
>>>    {
>>> +    struct vhost_vdpa *v = dev->opaque;
>>>        int i;
>>>
>>> +    if (v->shadow_vqs_enabled) {
>>> +        /* SVQ is not compatible with host notifiers mr */
>>
>> I guess there should be a TODO or FIXME here.
>>
> Sure I can add it.
>
>>> +        return;
>>> +    }
>>> +
>>>        for (i = dev->vq_index; i < dev->vq_index + dev->nvqs; i++) {
>>>            if (vhost_vdpa_host_notifier_init(dev, i)) {
>>>                goto err;
>>> @@ -424,6 +432,17 @@ err:
>>>        return;
>>>    }
>>>
>>> +static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
>>> +{
>>> +    struct vhost_vdpa *v = dev->opaque;
>>> +    size_t idx;
>>> +
>>> +    for (idx = 0; idx < v->shadow_vqs->len; ++idx) {
>>> +        vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, idx));
>>> +    }
>>> +    g_ptr_array_free(v->shadow_vqs, true);
>>> +}
>>> +
>>>    static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>>>    {
>>>        struct vhost_vdpa *v;
>>> @@ -432,6 +451,7 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>>>        trace_vhost_vdpa_cleanup(dev, v);
>>>        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>>        memory_listener_unregister(&v->listener);
>>> +    vhost_vdpa_svq_cleanup(dev);
>>>
>>>        dev->opaque = NULL;
>>>        ram_block_discard_disable(false);
>>> @@ -507,9 +527,15 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
>>>
>>>    static int vhost_vdpa_reset_device(struct vhost_dev *dev)
>>>    {
>>> +    struct vhost_vdpa *v = dev->opaque;
>>>        int ret;
>>>        uint8_t status = 0;
>>>
>>> +    for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
>>> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
>>> +        vhost_svq_stop(svq);
>>> +    }
>>> +
>>>        ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>>>        trace_vhost_vdpa_reset_device(dev, status);
>>>        return ret;
>>> @@ -639,13 +665,28 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>>>        return ret;
>>>    }
>>>
>>> -static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
>>> -                                       struct vhost_vring_file *file)
>>> +static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
>>> +                                         struct vhost_vring_file *file)
>>>    {
>>>        trace_vhost_vdpa_set_vring_kick(dev, file->index, file->fd);
>>>        return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
>>>    }
>>>
>>> +static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
>>> +                                       struct vhost_vring_file *file)
>>> +{
>>> +    struct vhost_vdpa *v = dev->opaque;
>>> +    int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
>>> +
>>> +    if (v->shadow_vqs_enabled) {
>>> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
>>> +        vhost_svq_set_svq_kick_fd(svq, file->fd);
>>> +        return 0;
>>> +    } else {
>>> +        return vhost_vdpa_set_vring_dev_kick(dev, file);
>>> +    }
>>> +}
>>> +
>>>    static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>                                           struct vhost_vring_file *file)
>>>    {
>>> @@ -653,6 +694,33 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>        return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>>>    }
>>>
>>> +/**
>>> + * Set shadow virtqueue descriptors to the device
>>> + *
>>> + * @dev   The vhost device model
>>> + * @svq   The shadow virtqueue
>>> + * @idx   The index of the virtqueue in the vhost device
>>> + */
>>> +static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
>>> +                                VhostShadowVirtqueue *svq,
>>> +                                unsigned idx)
>>> +{
>>> +    struct vhost_vring_file file = {
>>> +        .index = dev->vq_index + idx,
>>> +    };
>>> +    const EventNotifier *event_notifier;
>>> +    int r;
>>> +
>>> +    event_notifier = vhost_svq_get_dev_kick_notifier(svq);
>>
>> A question: any reason for making VhostShadowVirtqueue private? If we
>> export it in the .h we don't need helpers to access its members, like
>> vhost_svq_get_dev_kick_notifier().
>>
> Exporting it is always a possibility, of course, but that direct
> access would not be thread safe if we decide to move SVQ to its own
> iothread, for example.


I don't get this, maybe you can give me an example.


>
> I feel it will be easier to work with it this way, but it might be that
> I'm just used to making as much as possible private. The helpers are not
> needed in the hot paths anyway, only in setup and teardown.
>
>> Note that vhost_dev is a public structure.
>>
> Sure, we could embed it in vhost_virtqueue if we choose to do it that
> way, for example.
>
>>> +    file.fd = event_notifier_get_fd(event_notifier);
>>> +    r = vhost_vdpa_set_vring_dev_kick(dev, &file);
>>> +    if (unlikely(r != 0)) {
>>> +        error_report("Can't set device kick fd (%d)", -r);
>>> +    }
>>
>> I wonder whether or not we can generalize the logic here and in
>> vhost_vdpa_set_vring_kick(). There's nothing vdpa-specific except the
>> vhost_ops->set_vring_kick() call.
>>
> If we call vhost_ops->set_vring_kick we are setting the guest->SVQ kick
> notifier, not the SVQ->vDPA device one, because of the
> if (v->shadow_vqs_enabled) check. All of the modified ops callbacks
> hide the actual device from the vhost subsystem, so we need to
> explicitly use the newly created _dev_ ones.


OK, I'm fine with starting with vhost_vdpa-specific code.


>
>>> +
>>> +    return r == 0;
>>> +}
>>> +
>>>    static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>    {
>>>        struct vhost_vdpa *v = dev->opaque;
>>> @@ -660,6 +728,13 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>
>>>        if (started) {
>>>            vhost_vdpa_host_notifiers_init(dev);
>>> +        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
>>> +            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
>>> +            bool ok = vhost_vdpa_svq_setup(dev, svq, i);
>>> +            if (unlikely(!ok)) {
>>> +                return -1;
>>> +            }
>>> +        }
>>>            vhost_vdpa_set_vring_ready(dev);
>>>        } else {
>>>            vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>> @@ -737,6 +812,41 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
>>>        return true;
>>>    }
>>>
>>> +/**
>>> + * Adaptor function to free shadow virtqueue through gpointer
>>> + *
>>> + * @svq   The Shadow Virtqueue
>>> + */
>>> +static void vhost_psvq_free(gpointer svq)
>>> +{
>>> +    vhost_svq_free(svq);
>>> +}
>>
>> Any reason for such indirection? Can we simply use vhost_svq_free()?
>>
> GCC complains about different types. I think we could do a function
> type cast, and it's valid for every architecture qemu supports, but the
> indirection seems cleaner to me, and I would be surprised if the
> compiler did not optimize it away in the cases where the cast is
> valid.
>
> ../hw/virtio/vhost-vdpa.c:1186:60: error: incompatible function
> pointer types passing 'void (VhostShadowVirtqueue *)' (aka 'void
> (struct VhostShadowVirtqueue *)') to parameter of type
> 'GDestroyNotify' (aka 'void (*)(void *)')


Or just change vhost_svq_free() to take gpointer instead? Then we don't 
need a cast.

Thanks
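
i.e., a sketch of that suggestion (most of the body elided; only the
signature change matters):

    /* Takes gpointer so it matches GDestroyNotify without a wrapper */
    void vhost_svq_free(gpointer pvq)
    {
        VhostShadowVirtqueue *vq = pvq;
        vhost_svq_stop(vq);
        /* ... rest of the teardown ... */
        g_free(vq);
    }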

>
> Thanks!
>
>> Thanks
>>
>>
>>> +
>>> +static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>>> +                               Error **errp)
>>> +{
>>> +    size_t n_svqs = v->shadow_vqs_enabled ? hdev->nvqs : 0;
>>> +    g_autoptr(GPtrArray) shadow_vqs = g_ptr_array_new_full(n_svqs,
>>> +                                                           vhost_psvq_free);
>>> +    if (!v->shadow_vqs_enabled) {
>>> +        goto out;
>>> +    }
>>> +
>>> +    for (unsigned n = 0; n < hdev->nvqs; ++n) {
>>> +        VhostShadowVirtqueue *svq = vhost_svq_new();
>>> +
>>> +        if (unlikely(!svq)) {
>>> +            error_setg(errp, "Cannot create svq %u", n);
>>> +            return -1;
>>> +        }
>>> +        g_ptr_array_add(v->shadow_vqs, svq);
>>> +    }
>>> +
>>> +out:
>>> +    v->shadow_vqs = g_steal_pointer(&shadow_vqs);
>>> +    return 0;
>>> +}
>>> +
>>>    static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>>    {
>>>        struct vhost_vdpa *v;
>>> @@ -759,6 +869,10 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>>        dev->opaque =  opaque ;
>>>        v->listener = vhost_vdpa_memory_listener;
>>>        v->msg_type = VHOST_IOTLB_MSG_V2;
>>> +    ret = vhost_vdpa_init_svq(dev, v, errp);
>>> +    if (ret) {
>>> +        goto err;
>>> +    }
>>>
>>>        vhost_vdpa_get_iova_range(v);
>>>
>>> @@ -770,6 +884,10 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>>                                   VIRTIO_CONFIG_S_DRIVER);
>>>
>>>        return 0;
>>> +
>>> +err:
>>> +    ram_block_discard_disable(false);
>>> +    return ret;
>>>    }
>>>
>>>    const VhostOps vdpa_ops = {


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 06/31] vhost: Route guest->host notification through shadow virtqueue
@ 2022-02-08  9:02         ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-08  9:02 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella


在 2022/1/31 下午7:33, Eugenio Perez Martin 写道:
> On Fri, Jan 28, 2022 at 7:57 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
>>> At this moment no buffer forwarding will be performed in SVQ mode: QEMU
>>> just forwards the guest's kicks to the device. This commit also sets up
>>> SVQs in the vhost device.
>>>
>>> Host memory notifier regions are left out for simplicity, and they will
>>> not be addressed in this series.
>>
>> I wonder if it's better to squash this into patch 5 since it gives us a
>> full guest->host forwarding.
>>
> I'm fine with that if you think it makes the review easier.


Yes please.


>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    include/hw/virtio/vhost-vdpa.h |   4 ++
>>>    hw/virtio/vhost-vdpa.c         | 122 ++++++++++++++++++++++++++++++++-
>>>    2 files changed, 124 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
>>> index 3ce79a646d..009a9f3b6b 100644
>>> --- a/include/hw/virtio/vhost-vdpa.h
>>> +++ b/include/hw/virtio/vhost-vdpa.h
>>> @@ -12,6 +12,8 @@
>>>    #ifndef HW_VIRTIO_VHOST_VDPA_H
>>>    #define HW_VIRTIO_VHOST_VDPA_H
>>>
>>> +#include <gmodule.h>
>>> +
>>>    #include "hw/virtio/virtio.h"
>>>    #include "standard-headers/linux/vhost_types.h"
>>>
>>> @@ -27,6 +29,8 @@ typedef struct vhost_vdpa {
>>>        bool iotlb_batch_begin_sent;
>>>        MemoryListener listener;
>>>        struct vhost_vdpa_iova_range iova_range;
>>> +    bool shadow_vqs_enabled;
>>> +    GPtrArray *shadow_vqs;
>>>        struct vhost_dev *dev;
>>>        VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
>>>    } VhostVDPA;
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index 6c10a7f05f..18de14f0fb 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -17,12 +17,14 @@
>>>    #include "hw/virtio/vhost.h"
>>>    #include "hw/virtio/vhost-backend.h"
>>>    #include "hw/virtio/virtio-net.h"
>>> +#include "hw/virtio/vhost-shadow-virtqueue.h"
>>>    #include "hw/virtio/vhost-vdpa.h"
>>>    #include "exec/address-spaces.h"
>>>    #include "qemu/main-loop.h"
>>>    #include "cpu.h"
>>>    #include "trace.h"
>>>    #include "qemu-common.h"
>>> +#include "qapi/error.h"
>>>
>>>    /*
>>>     * Return one past the end of the end of section. Be careful with uint64_t
>>> @@ -409,8 +411,14 @@ err:
>>>
>>>    static void vhost_vdpa_host_notifiers_init(struct vhost_dev *dev)
>>>    {
>>> +    struct vhost_vdpa *v = dev->opaque;
>>>        int i;
>>>
>>> +    if (v->shadow_vqs_enabled) {
>>> +        /* SVQ is not compatible with host notifiers mr */
>>
>> I guess there should be a TODO or FIXME here.
>>
> Sure I can add it.
>
>>> +        return;
>>> +    }
>>> +
>>>        for (i = dev->vq_index; i < dev->vq_index + dev->nvqs; i++) {
>>>            if (vhost_vdpa_host_notifier_init(dev, i)) {
>>>                goto err;
>>> @@ -424,6 +432,17 @@ err:
>>>        return;
>>>    }
>>>
>>> +static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
>>> +{
>>> +    struct vhost_vdpa *v = dev->opaque;
>>> +    size_t idx;
>>> +
>>> +    for (idx = 0; idx < v->shadow_vqs->len; ++idx) {
>>> +        vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, idx));
>>> +    }
>>> +    g_ptr_array_free(v->shadow_vqs, true);
>>> +}
>>> +
>>>    static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>>>    {
>>>        struct vhost_vdpa *v;
>>> @@ -432,6 +451,7 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>>>        trace_vhost_vdpa_cleanup(dev, v);
>>>        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>>        memory_listener_unregister(&v->listener);
>>> +    vhost_vdpa_svq_cleanup(dev);
>>>
>>>        dev->opaque = NULL;
>>>        ram_block_discard_disable(false);
>>> @@ -507,9 +527,15 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
>>>
>>>    static int vhost_vdpa_reset_device(struct vhost_dev *dev)
>>>    {
>>> +    struct vhost_vdpa *v = dev->opaque;
>>>        int ret;
>>>        uint8_t status = 0;
>>>
>>> +    for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
>>> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
>>> +        vhost_svq_stop(svq);
>>> +    }
>>> +
>>>        ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>>>        trace_vhost_vdpa_reset_device(dev, status);
>>>        return ret;
>>> @@ -639,13 +665,28 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>>>        return ret;
>>>    }
>>>
>>> -static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
>>> -                                       struct vhost_vring_file *file)
>>> +static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
>>> +                                         struct vhost_vring_file *file)
>>>    {
>>>        trace_vhost_vdpa_set_vring_kick(dev, file->index, file->fd);
>>>        return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
>>>    }
>>>
>>> +static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
>>> +                                       struct vhost_vring_file *file)
>>> +{
>>> +    struct vhost_vdpa *v = dev->opaque;
>>> +    int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
>>> +
>>> +    if (v->shadow_vqs_enabled) {
>>> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
>>> +        vhost_svq_set_svq_kick_fd(svq, file->fd);
>>> +        return 0;
>>> +    } else {
>>> +        return vhost_vdpa_set_vring_dev_kick(dev, file);
>>> +    }
>>> +}
>>> +
>>>    static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>                                           struct vhost_vring_file *file)
>>>    {
>>> @@ -653,6 +694,33 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>        return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>>>    }
>>>
>>> +/**
>>> + * Set shadow virtqueue descriptors to the device
>>> + *
>>> + * @dev   The vhost device model
>>> + * @svq   The shadow virtqueue
>>> + * @idx   The index of the virtqueue in the vhost device
>>> + */
>>> +static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
>>> +                                VhostShadowVirtqueue *svq,
>>> +                                unsigned idx)
>>> +{
>>> +    struct vhost_vring_file file = {
>>> +        .index = dev->vq_index + idx,
>>> +    };
>>> +    const EventNotifier *event_notifier;
>>> +    int r;
>>> +
>>> +    event_notifier = vhost_svq_get_dev_kick_notifier(svq);
>>
>> A question, any reason for making VhostShadowVirtqueue private? If we
>> export it in .h we don't need a helper to access its members like
>> vhost_svq_get_dev_kick_notifier().
>>
> Exporting it is always a possibility of course, but that direct
> access will not be thread safe if we decide to move SVQ to its own
> iothread, for example.


I don't get this, maybe you can give me an example.


>
> I feel it will be easier to work with it this way, but it might be that
> I'm just used to making as much as possible private. It's not like the
> helpers are needed in the hot paths, only in the setup and teardown.
>
>> Note that vhost_dev is a public structure.
>>
> Sure, we could embed it in vhost_virtqueue if we choose to do it that
> way, for example.
>
>>> +    file.fd = event_notifier_get_fd(event_notifier);
>>> +    r = vhost_vdpa_set_vring_dev_kick(dev, &file);
>>> +    if (unlikely(r != 0)) {
>>> +        error_report("Can't set device kick fd (%d)", -r);
>>> +    }
>>
>> I wonder whether or not we can generalize the logic here and
>> vhost_vdpa_set_vring_kick(). There's nothing vdpa-specific except the
>> vhost_ops->set_vring_kick() call.
>>
> If we call vhost_ops->set_vring_kick we are setting the guest->SVQ kick
> notifier, not the SVQ -> vDPA device one, because of the
> if (v->shadow_vqs_enabled) branch. All of the modified ops callbacks are
> hiding the actual device from the vhost subsystem, so we need to
> explicitly use the newly created _dev_ ones.


Ok, I'm fine with starting with vhost_vdpa-specific code.


>
>>> +
>>> +    return r == 0;
>>> +}
>>> +
>>>    static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>    {
>>>        struct vhost_vdpa *v = dev->opaque;
>>> @@ -660,6 +728,13 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>
>>>        if (started) {
>>>            vhost_vdpa_host_notifiers_init(dev);
>>> +        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
>>> +            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
>>> +            bool ok = vhost_vdpa_svq_setup(dev, svq, i);
>>> +            if (unlikely(!ok)) {
>>> +                return -1;
>>> +            }
>>> +        }
>>>            vhost_vdpa_set_vring_ready(dev);
>>>        } else {
>>>            vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>> @@ -737,6 +812,41 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
>>>        return true;
>>>    }
>>>
>>> +/**
>>> + * Adaptor function to free shadow virtqueue through gpointer
>>> + *
>>> + * @svq   The Shadow Virtqueue
>>> + */
>>> +static void vhost_psvq_free(gpointer svq)
>>> +{
>>> +    vhost_svq_free(svq);
>>> +}
>>
>> Any reason for such indirection? Can we simply use vhost_svq_free()?
>>
> GCC complains about different types. I think we could do a function
> type cast and it's valid for every architecture qemu supports, but the
> indirection seems cleaner to me, and I would be surprised if the
> compiler does not optimize it away in the cases where the cast is
> valid.
>
> ../hw/virtio/vhost-vdpa.c:1186:60: error: incompatible function
> pointer types passing 'void (VhostShadowVirtqueue *)' (aka 'void
> (struct VhostShadowVirtqueue *)') to parameter of type
> 'GDestroyNotify' (aka 'void (*)(void *)')


Or just change vhost_svq_free() to take gpointer instead? Then we don't 
need a cast.

Thanks

>
> Thanks!
>
>> Thanks
>>
>>
>>> +
>>> +static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>>> +                               Error **errp)
>>> +{
>>> +    size_t n_svqs = v->shadow_vqs_enabled ? hdev->nvqs : 0;
>>> +    g_autoptr(GPtrArray) shadow_vqs = g_ptr_array_new_full(n_svqs,
>>> +                                                           vhost_psvq_free);
>>> +    if (!v->shadow_vqs_enabled) {
>>> +        goto out;
>>> +    }
>>> +
>>> +    for (unsigned n = 0; n < hdev->nvqs; ++n) {
>>> +        VhostShadowVirtqueue *svq = vhost_svq_new();
>>> +
>>> +        if (unlikely(!svq)) {
>>> +            error_setg(errp, "Cannot create svq %u", n);
>>> +            return -1;
>>> +        }
>>> +        g_ptr_array_add(shadow_vqs, svq);
>>> +    }
>>> +
>>> +out:
>>> +    v->shadow_vqs = g_steal_pointer(&shadow_vqs);
>>> +    return 0;
>>> +}
>>> +
>>>    static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>>    {
>>>        struct vhost_vdpa *v;
>>> @@ -759,6 +869,10 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>>        dev->opaque =  opaque ;
>>>        v->listener = vhost_vdpa_memory_listener;
>>>        v->msg_type = VHOST_IOTLB_MSG_V2;
>>> +    ret = vhost_vdpa_init_svq(dev, v, errp);
>>> +    if (ret) {
>>> +        goto err;
>>> +    }
>>>
>>>        vhost_vdpa_get_iova_range(v);
>>>
>>> @@ -770,6 +884,10 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>>                                   VIRTIO_CONFIG_S_DRIVER);
>>>
>>>        return 0;
>>> +
>>> +err:
>>> +    ram_block_discard_disable(false);
>>> +    return ret;
>>>    }
>>>
>>>    const VhostOps vdpa_ops = {



^ permalink raw reply	[flat|nested] 182+ messages in thread

* (no subject)
  2022-01-28  5:55                   ` Jason Wang
  (?)
  (?)
@ 2022-02-15 19:34                   ` Eugenio Pérez
  -1 siblings, 0 replies; 182+ messages in thread
From: Eugenio Pérez @ 2022-02-15 19:34 UTC (permalink / raw)
  To: Jason Wang, Peter Xu; +Cc: qemu-devel

Please review this new minimal version. It is way shorter, but this
comes with a cost:
* Iteration does not stop at the end of the range (but an out-of-range
  allocation never happens)
* Iteration must start from iova == 0 instead of the first valid entry in
  the hole.

These should not be a big deal though.

Another possible optimization that comes to my mind is to always
allocate at the end of the range. In the case of having to allocate and
deallocate frequently, this should avoid iterating over long-lived
entries. Better to justify this with numbers first, so I left that out.

Thanks!




^ permalink raw reply	[flat|nested] 182+ messages in thread

* [PATCH] util: Add iova_tree_alloc
  2022-01-28  5:55                   ` Jason Wang
                                     ` (2 preceding siblings ...)
  (?)
@ 2022-02-15 19:34                   ` Eugenio Pérez
  2022-02-16  7:25                     ` Peter Xu
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Pérez @ 2022-02-15 19:34 UTC (permalink / raw)
  To: Jason Wang, Peter Xu; +Cc: qemu-devel

This iova tree function looks for a hole in the allocated regions and
returns a totally new translation for a given translated address.

Its main usage is to allow devices to access the qemu address space,
remapping the guest's address space into a new iova space where qemu can
add chunks of addresses.
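
For illustration, a caller could look like the following (hypothetical
names, not part of this patch):

    IOVATree *tree = iova_tree_new();
    DMAMap map = {
        .translated_addr = xlat_addr,
        .size = alloc_size - 1,   /* DMAMap ranges are inclusive */
        .perm = IOMMU_RW,
    };
    int r = iova_tree_alloc_map(tree, &map, iova_first, HWADDR_MAX);
    if (r == IOVA_OK) {
        /* map.iova holds the newly allocated iova */
    }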

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/qemu/iova-tree.h |  17 +++++
 util/iova-tree.c         | 132 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 149 insertions(+)

diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
index 8249edd764..eb6b6175a3 100644
--- a/include/qemu/iova-tree.h
+++ b/include/qemu/iova-tree.h
@@ -29,6 +29,7 @@
 #define  IOVA_OK           (0)
 #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
 #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
+#define  IOVA_ERR_NOMEM    (-3) /* Cannot allocate */
 
 typedef struct IOVATree IOVATree;
 typedef struct DMAMap {
@@ -119,6 +120,22 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
  */
 void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
 
+/**
+ * iova_tree_alloc_map:
+ *
+ * @tree: the iova tree to allocate from
+ * @map: the new map (as translated addr & size) to allocate in the iova region
+ * @iova_begin: the minimum address of the allocation
+ * @iova_last: the last (inclusive) address allowed for the allocation
+ *
+ * Allocates a new region of a given size, between iova_begin and iova_last.
+ *
+ * Return: Same as iova_tree_insert, plus IOVA_ERR_NOMEM if no hole big
+ * enough is found. The caller can get the assigned iova in map->iova.
+ */
+int iova_tree_alloc_map(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
+                        hwaddr iova_last);
+
 /**
  * iova_tree_destroy:
  *
diff --git a/util/iova-tree.c b/util/iova-tree.c
index 23ea35b7a4..1adefeb086 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -16,6 +16,39 @@ struct IOVATree {
     GTree *tree;
 };
 
+/* Args to pass to the iova_tree_alloc_map foreach function. */
+struct IOVATreeAllocArgs {
+    /* Size of the desired allocation */
+    size_t new_size;
+
+    /* The minimum address allowed in the allocation */
+    hwaddr iova_begin;
+
+    /* Map at the left of the hole, can be NULL if "this" is first one */
+    const DMAMap *prev;
+
+    /* Map at the right of the hole, can be NULL if "prev" is the last one */
+    const DMAMap *this;
+
+    /* If found, we fill in the IOVA here */
+    hwaddr iova_result;
+
+    /* Whether have we found a valid IOVA */
+    bool iova_found;
+};
+
+/**
+ * Iterate args to the next hole
+ *
+ * @args  The alloc arguments
+ * @next  The next mapping in the tree. Can be NULL to signal the last one
+ */
+static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
+                                         const DMAMap *next) {
+    args->prev = args->this;
+    args->this = next;
+}
+
 static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
 {
     const DMAMap *m1 = a, *m2 = b;
@@ -107,6 +140,105 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
     return IOVA_OK;
 }
 
+/**
+ * Try to find an unallocated IOVA range between prev and this elements.
+ *
+ * @args Arguments to allocation
+ *
+ * Cases:
+ *
+ * (1) !prev, !this: No entries allocated, always succeed
+ *
+ * (2) !prev, this: We're iterating at the 1st element.
+ *
+ * (3) prev, !this: We're iterating at the last element.
+ *
+ * (4) prev, this: this is the most common case, we'll try to find a hole
+ * between "prev" and "this" mapping.
+ *
+ * Note that this function assumes the last valid iova is HWADDR_MAX, but it
+ * searches linearly so it's easy to discard the result if that's not the case.
+ */
+static void iova_tree_alloc_map_in_hole(struct IOVATreeAllocArgs *args)
+{
+    const DMAMap *prev = args->prev, *this = args->this;
+    uint64_t hole_start, hole_last;
+
+    if (this && this->iova + this->size < args->iova_begin) {
+        return;
+    }
+
+    hole_start = MAX(prev ? prev->iova + prev->size + 1 : 0, args->iova_begin);
+    hole_last = this ? this->iova : HWADDR_MAX;
+
+    if (hole_last - hole_start > args->new_size) {
+        args->iova_result = hole_start;
+        args->iova_found = true;
+    }
+}
+
+/**
+ * For each dma node in the tree, check if there is a hole between its
+ * previous node (or the minimum allowed iova address) and the node itself.
+ *
+ * @key   Node iterating
+ * @value Node iterating
+ * @pargs Struct to communicate with the outside world
+ *
+ * Return: false to keep iterating, true if needs break.
+ */
+static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
+                                         gpointer pargs)
+{
+    struct IOVATreeAllocArgs *args = pargs;
+    DMAMap *node = value;
+
+    assert(key == value);
+
+    iova_tree_alloc_args_iterate(args, node);
+    iova_tree_alloc_map_in_hole(args);
+    return args->iova_found;
+}
+
+int iova_tree_alloc_map(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
+                        hwaddr iova_last)
+{
+    struct IOVATreeAllocArgs args = {
+        .new_size = map->size,
+        .iova_begin = iova_begin,
+    };
+
+    assert(iova_begin < iova_last);
+
+    /*
+     * Find a valid hole for the mapping
+     *
+     * Assuming low iova_begin, so no need to do a binary search to
+     * locate the first node.
+     *
+     * TODO: Replace all this with g_tree_node_first/next/last when available
+     * (from glib since 2.68). To do it with g_tree_foreach complicates the
+     * code a lot.
+     *
+     */
+    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
+    if (!args.iova_found) {
+        /*
+         * The tree can be empty or the last hole may still be unchecked,
+         * because g_tree_foreach does not cover the (last, iova_last] range.
+         */
+        iova_tree_alloc_args_iterate(&args, NULL);
+        iova_tree_alloc_map_in_hole(&args);
+    }
+
+    if (!args.iova_found || args.iova_result + map->size > iova_last) {
+        return IOVA_ERR_NOMEM;
+    }
+
+    map->iova = args.iova_result;
+    return iova_tree_insert(tree, map);
+}
+
 void iova_tree_destroy(IOVATree *tree)
 {
     g_tree_destroy(tree->tree);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 182+ messages in thread

* Re: [PATCH] util: Add iova_tree_alloc
  2022-02-15 19:34                   ` [PATCH] util: Add iova_tree_alloc Eugenio Pérez
@ 2022-02-16  7:25                     ` Peter Xu
  0 siblings, 0 replies; 182+ messages in thread
From: Peter Xu @ 2022-02-16  7:25 UTC (permalink / raw)
  To: Eugenio Pérez; +Cc: Jason Wang, qemu-devel

On Tue, Feb 15, 2022 at 08:34:23PM +0100, Eugenio Pérez wrote:
> This iova tree function looks for a hole in the allocated regions and
> returns a totally new translation for a given translated address.
> 
> Its main usage is to allow devices to access the qemu address space,
> remapping the guest's address space into a new iova space where qemu can
> add chunks of addresses.
> 
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
  2022-02-08  8:25         ` Jason Wang
  (?)
@ 2022-02-16 15:53         ` Eugenio Perez Martin
  2022-02-17  6:02             ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-16 15:53 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Tue, Feb 8, 2022 at 9:25 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/2/1 19:45, Eugenio Perez Martin wrote:
> > On Sun, Jan 30, 2022 at 7:50 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2022/1/22 04:27, Eugenio Pérez wrote:
> >>> SVQ is able to log the dirty bits by itself, so let's use it to not
> >>> block migration.
> >>>
> >>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
> >>> enabled. Even if the device supports it, the reports would be nonsense
> >>> because SVQ memory is in the qemu region.
> >>>
> >>> The log region is still allocated. Future changes might skip that, but
> >>> this series is already long enough.
> >>>
> >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>> ---
> >>>    hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
> >>>    1 file changed, 20 insertions(+)
> >>>
> >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >>> index fb0a338baa..75090d65e8 100644
> >>> --- a/hw/virtio/vhost-vdpa.c
> >>> +++ b/hw/virtio/vhost-vdpa.c
> >>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
> >>>        if (ret == 0 && v->shadow_vqs_enabled) {
> >>>            /* Filter only features that SVQ can offer to guest */
> >>>            vhost_svq_valid_guest_features(features);
> >>> +
> >>> +        /* Add SVQ logging capabilities */
> >>> +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
> >>>        }
> >>>
> >>>        return ret;
> >>> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
> >>>
> >>>        if (v->shadow_vqs_enabled) {
> >>>            uint64_t dev_features, svq_features, acked_features;
> >>> +        uint8_t status = 0;
> >>>            bool ok;
> >>>
> >>> +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> >>> +        if (unlikely(ret)) {
> >>> +            return ret;
> >>> +        }
> >>> +
> >>> +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> >>> +            /*
> >>> +             * vhost is trying to enable or disable _F_LOG, and the device
> >>> +             * would report wrong dirty pages. SVQ handles it.
> >>> +             */
> >>
> >> I fail to understand this comment, I'd think there's no way to disable
> >> dirty page tracking for SVQ.
> >>
> > vhost_log_global_{start,stop} are called at the beginning and end of
> > migration. To inform the device that it should start logging, they set
> > or clear VHOST_F_LOG_ALL at vhost_dev_set_log.
>
>
> Yes, but for SVQ, we can't disable dirty page tracking, can we? The
> only thing to do is ignore or filter out the F_LOG_ALL and pretend it
> is enabled or disabled.
>

Yes, that's what this patch does.

>
> >
> > While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
> > vhost does not block migration. Maybe we need to look for another way
> > to do this?
>
>
> I'm fine with filtering since it's much simpler, but I fail to
> understand why we need to check DRIVER_OK.
>

Ok, maybe I can make that part clearer.

Since both operations use vhost_vdpa_set_features, we must filter only
the one that actually sets or removes VHOST_F_LOG_ALL, without
affecting other features.

In practice, that means not forwarding the set_features call after
DRIVER_OK. The device is not expecting it anymore.

Does that make more sense?

Thanks!

> Thanks
>
>
> >
> > Thanks!
> >
> >> Thanks
> >>
> >>
> >>> +            return 0;
> >>> +        }
> >>> +
> >>> +        /* We must not ack _F_LOG if SVQ is enabled */
> >>> +        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
> >>> +
> >>>            ret = vhost_vdpa_get_dev_features(dev, &dev_features);
> >>>            if (ret != 0) {
> >>>                error_report("Can't get vdpa device features, got (%d)", ret);
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
  2022-02-16 15:53         ` Eugenio Perez Martin
@ 2022-02-17  6:02             ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-17  6:02 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Wed, Feb 16, 2022 at 11:54 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Feb 8, 2022 at 9:25 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2022/2/1 19:45, Eugenio Perez Martin wrote:
> > > On Sun, Jan 30, 2022 at 7:50 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> On 2022/1/22 04:27, Eugenio Pérez wrote:
> > >>> SVQ is able to log the dirty bits by itself, so let's use it to not
> > >>> block migration.
> > >>>
> > >>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
> > >>> enabled. Even if the device supports it, the reports would be nonsense
> > >>> because SVQ memory is in the qemu region.
> > >>>
> > >>> The log region is still allocated. Future changes might skip that, but
> > >>> this series is already long enough.
> > >>>
> > >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>> ---
> > >>>    hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
> > >>>    1 file changed, 20 insertions(+)
> > >>>
> > >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > >>> index fb0a338baa..75090d65e8 100644
> > >>> --- a/hw/virtio/vhost-vdpa.c
> > >>> +++ b/hw/virtio/vhost-vdpa.c
> > >>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
> > >>>        if (ret == 0 && v->shadow_vqs_enabled) {
> > >>>            /* Filter only features that SVQ can offer to guest */
> > >>>            vhost_svq_valid_guest_features(features);
> > >>> +
> > >>> +        /* Add SVQ logging capabilities */
> > >>> +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
> > >>>        }
> > >>>
> > >>>        return ret;
> > >>> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
> > >>>
> > >>>        if (v->shadow_vqs_enabled) {
> > >>>            uint64_t dev_features, svq_features, acked_features;
> > >>> +        uint8_t status = 0;
> > >>>            bool ok;
> > >>>
> > >>> +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> > >>> +        if (unlikely(ret)) {
> > >>> +            return ret;
> > >>> +        }
> > >>> +
> > >>> +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > >>> +            /*
> > >>> +             * vhost is trying to enable or disable _F_LOG, and the device
> > >>> +             * would report wrong dirty pages. SVQ handles it.
> > >>> +             */
> > >>
> > >> I fail to understand this comment, I'd think there's no way to disable
> > >> dirty page tracking for SVQ.
> > >>
> > > vhost_log_global_{start,stop} are called at the beginning and end of
> > > migration. To inform the device that it should start logging, they set
> > > or clear VHOST_F_LOG_ALL at vhost_dev_set_log.
> >
> >
> > Yes, but for SVQ, we can't disable dirty page tracking, can we? The
> > only thing to do is ignore or filter out the F_LOG_ALL and pretend it
> > is enabled or disabled.
> >
>
> Yes, that's what this patch does.
>
> >
> > >
> > > While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
> > > vhost does not block migration. Maybe we need to look for another way
> > > to do this?
> >
> >
> > I'm fine with filtering since it's much simpler, but I fail to
> > understand why we need to check DRIVER_OK.
> >
>
> Ok, maybe I can make that part clearer.
>
> Since both operations use vhost_vdpa_set_features, we must filter only
> the one that actually sets or removes VHOST_F_LOG_ALL, without
> affecting other features.
>
> In practice, that means not forwarding the set_features call after
> DRIVER_OK. The device is not expecting it anymore.

I wonder what happens if we don't do this.

So the kernel has this check:

        /*
         * It's not allowed to change the features after they have
         * been negotiated.
         */
        if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_FEATURES_OK)
                return -EBUSY;

So is it FEATURES_OK actually?

For this patch, I wonder if the thing we need to do is to see whether
it is an enable/disable of F_LOG_ALL and simply return.
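
Something like this, as a rough idea (untested; assuming we can compare
against the features acked so far in dev->acked_features):

    uint64_t diff = features ^ dev->acked_features;
    if (diff == BIT_ULL(VHOST_F_LOG_ALL)) {
        /* Only _F_LOG_ALL toggled; SVQ already logs, nothing to do */
        return 0;
    }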

Thanks

>
> Does that make more sense?
>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > > Thanks!
> > >
> > >> Thanks
> > >>
> > >>
> > >>> +            return 0;
> > >>> +        }
> > >>> +
> > >>> +        /* We must not ack _F_LOG if SVQ is enabled */
> > >>> +        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
> > >>> +
> > >>>            ret = vhost_vdpa_get_dev_features(dev, &dev_features);
> > >>>            if (ret != 0) {
> > >>>                error_report("Can't get vdpa device features, got (%d)", ret);
> >
>

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
  2022-02-17  6:02             ` Jason Wang
  (?)
@ 2022-02-17  8:22             ` Eugenio Perez Martin
  2022-02-22  7:41                 ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-17  8:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Thu, Feb 17, 2022 at 7:02 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Feb 16, 2022 at 11:54 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Tue, Feb 8, 2022 at 9:25 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > On 2022/2/1 19:45, Eugenio Perez Martin wrote:
> > > > On Sun, Jan 30, 2022 at 7:50 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >>
> > > >> On 2022/1/22 04:27, Eugenio Pérez wrote:
> > > >>> SVQ is able to log the dirty bits by itself, so let's use it to not
> > > >>> block migration.
> > > >>>
> > > >>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
> > > >>> enabled. Even if the device supports it, the reports would be nonsense
> > > >>> because SVQ memory is in the qemu region.
> > > >>>
> > > >>> The log region is still allocated. Future changes might skip that, but
> > > >>> this series is already long enough.
> > > >>>
> > > >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > >>> ---
> > > >>>    hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
> > > >>>    1 file changed, 20 insertions(+)
> > > >>>
> > > >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > >>> index fb0a338baa..75090d65e8 100644
> > > >>> --- a/hw/virtio/vhost-vdpa.c
> > > >>> +++ b/hw/virtio/vhost-vdpa.c
> > > >>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
> > > >>>        if (ret == 0 && v->shadow_vqs_enabled) {
> > > >>>            /* Filter only features that SVQ can offer to guest */
> > > >>>            vhost_svq_valid_guest_features(features);
> > > >>> +
> > > >>> +        /* Add SVQ logging capabilities */
> > > >>> +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
> > > >>>        }
> > > >>>
> > > >>>        return ret;
> > > >>> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
> > > >>>
> > > >>>        if (v->shadow_vqs_enabled) {
> > > >>>            uint64_t dev_features, svq_features, acked_features;
> > > >>> +        uint8_t status = 0;
> > > >>>            bool ok;
> > > >>>
> > > >>> +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> > > >>> +        if (unlikely(ret)) {
> > > >>> +            return ret;
> > > >>> +        }
> > > >>> +
> > > >>> +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > > >>> +            /*
> > > >>> +             * vhost is trying to enable or disable _F_LOG, and the device
> > > >>> +             * would report wrong dirty pages. SVQ handles it.
> > > >>> +             */
> > > >>
> > > >> I fail to understand this comment, I'd think there's no way to disable
> > > >> dirty page tracking for SVQ.
> > > >>
> > > > vhost_log_global_{start,stop} are called at the beginning and end of
> > > > migration. To inform the device that it should start logging, they set
> > > > or clear VHOST_F_LOG_ALL at vhost_dev_set_log.
> > >
> > >
> > > Yes, but for SVQ, we can't disable dirty page tracking, can we? The
> > > only thing to do is ignore or filter out the F_LOG_ALL and pretend it
> > > is enabled or disabled.
> > >
> >
> > Yes, that's what this patch does.
> >
> > >
> > > >
> > > > While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
> > > > vhost does not block migration. Maybe we need to look for another way
> > > > to do this?
> > >
> > >
> > > I'm fine with filtering since it's much simpler, but I fail to
> > > understand why we need to check DRIVER_OK.
> > >
> >
> > Ok, maybe I can make that part clearer.
> >
> > Since both operations use vhost_vdpa_set_features, we must filter only
> > the one that actually sets or removes VHOST_F_LOG_ALL, without
> > affecting other features.
> >
> > In practice, that means not forwarding the set_features call after
> > DRIVER_OK. The device is not expecting it anymore.
>
> I wonder what happens if we don't do this.
>

If we simply delete the check, vhost_dev_set_features will return an
error, failing the start of the migration. More on this below.

> So the kernel has this check:
>
>         /*
>          * It's not allowed to change the features after they have
>          * been negotiated.
>          */
>         if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_FEATURES_OK)
>                 return -EBUSY;
>
> So is it FEATURES_OK actually?
>

Yes, FEATURES_OK seems more appropriate actually, so I will switch to
it for the next version.

But it should be functionally equivalent, since
vhost.c:vhost_dev_start sets both and the setting of _F_LOG_ALL cannot
be concurrent with it.

> For this patch, I wonder if the thing we need to do is to see whether
> it is an enable/disable of F_LOG_ALL and simply return.
>

Yes, that's the intention of the patch.

We have 4 cases here:
a) We're being called from vhost_dev_start, with enable_log = false
b) We're being called from vhost_dev_start, with enable_log = true
c) We're being called from vhost_dev_set_log, with enable_log = false
d) We're being called from vhost_dev_set_log, with enable_log = true

The way to tell the difference between a/b and c/d is to check if
{FEATURES,DRIVER}_OK is set. And, as you point out in previous mails,
F_LOG_ALL must be filtered unconditionally since SVQ tracks dirty
memory through the memory unmapping, so we clear the bit
unconditionally if we detect that VHOST_SET_FEATURES will be called
(cases a and b).

Another possibility is to track if features have been set with a bool
in vhost_vdpa or something like that. But it seems cleaner to me to
only store that in the actual device.
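
A rough sketch of the resulting filter (simplified, untested; it assumes
status has already been read with VHOST_VDPA_GET_STATUS and the rest of
vhost_vdpa_set_features stays as is):

    if (v->shadow_vqs_enabled) {
        if (status & VIRTIO_CONFIG_S_FEATURES_OK) {
            /*
             * Cases c) and d): vhost_dev_set_log is toggling _F_LOG.
             * SVQ already tracks dirty memory, so nothing to forward.
             */
            return 0;
        }

        /* Cases a) and b): never ack _F_LOG, SVQ logs by itself */
        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
    }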

> Thanks
>
> >
> > Does that make more sense?
> >
> > Thanks!
> >
> > > Thanks
> > >
> > >
> > > >
> > > > Thanks!
> > > >
> > > >> Thanks
> > > >>
> > > >>
> > > >>> +            return 0;
> > > >>> +        }
> > > >>> +
> > > >>> +        /* We must not ack _F_LOG if SVQ is enabled */
> > > >>> +        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
> > > >>> +
> > > >>>            ret = vhost_vdpa_get_dev_features(dev, &dev_features);
> > > >>>            if (ret != 0) {
> > > >>>                error_report("Can't get vdpa device features, got (%d)", ret);
> > >
> >
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
  2022-02-08  8:15         ` Jason Wang
  (?)
@ 2022-02-17 12:48         ` Eugenio Perez Martin
  2022-02-21  7:43             ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-17 12:48 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Tue, Feb 8, 2022 at 9:16 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/2/1 19:25, Eugenio Perez Martin wrote:
> > On Sun, Jan 30, 2022 at 7:47 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2022/1/22 04:27, Eugenio Pérez wrote:
> >>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> >>>    void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >>>    {
> >>>        event_notifier_set_handler(&svq->svq_kick, NULL);
> >>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> >>> +
> >>> +    if (!svq->vq) {
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    /* Send all pending used descriptors to guest */
> >>> +    vhost_svq_flush(svq, false);
> >>
> >> Do we need to wait for all the pending descriptors to be completed here?
> >>
> > No, this function does not wait, it only completes the forwarding of
> > the *used* descriptors.
> >
> > The best example is the net rx queue in my opinion. This call will
> > check SVQ's vring used_idx and will forward the last used descriptors
> > if any, but all available descriptors will remain as available for
> > qemu's VQ code.
> >
> > To skip it would miss those last rx descriptors in migration.
> >
> > Thanks!
>
>
> So it's probably not the best place to ask. It's more about the
> inflight descriptors, so it should be TX instead of RX.
>
> I can imagine the migration last phase, we should stop the vhost-vDPA
> before calling vhost_svq_stop(). Then we should be fine regardless of
> inflight descriptors.
>

I think I'm still missing something here.

To be on the same page: regarding tx, this could cause repeated tx
frames (one at the source and another at the destination), but never a
missed buffer left untransmitted. The "stop before" could be interpreted
as "SVQ is not forwarding available buffers anymore". Would that work?

Thanks!

> Thanks
>
>
> >
> >> Thanks
> >>
> >>
> >>> +
> >>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> >>> +        g_autofree VirtQueueElement *elem = NULL;
> >>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> >>> +        if (elem) {
> >>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> >>> +        }
> >>> +    }
> >>> +
> >>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> >>> +    if (next_avail_elem) {
> >>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> >>> +                                 next_avail_elem->len);
> >>> +    }
> >>>    }
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq
  2022-02-08  3:57         ` Jason Wang
  (?)
@ 2022-02-17 17:13         ` Eugenio Perez Martin
  2022-02-21  7:15             ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-17 17:13 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Tue, Feb 8, 2022 at 4:58 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/2/1 02:58, Eugenio Perez Martin wrote:
> > On Sun, Jan 30, 2022 at 5:03 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2022/1/22 04:27, Eugenio Pérez wrote:
> >>> First half of the buffers forwarding part, preparing vhost-vdpa
> >>> callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
> >>> this is effectively dead code at the moment, but it helps to reduce
> >>> patch size.
> >>>
> >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>> ---
> >>>    hw/virtio/vhost-shadow-virtqueue.h |   2 +-
> >>>    hw/virtio/vhost-shadow-virtqueue.c |  21 ++++-
> >>>    hw/virtio/vhost-vdpa.c             | 133 ++++++++++++++++++++++++++---
> >>>    3 files changed, 143 insertions(+), 13 deletions(-)
> >>>
> >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> >>> index 035207a469..39aef5ffdf 100644
> >>> --- a/hw/virtio/vhost-shadow-virtqueue.h
> >>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> >>> @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> >>>
> >>>    void vhost_svq_stop(VhostShadowVirtqueue *svq);
> >>>
> >>> -VhostShadowVirtqueue *vhost_svq_new(void);
> >>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> >>>
> >>>    void vhost_svq_free(VhostShadowVirtqueue *vq);
> >>>
> >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> >>> index f129ec8395..7c168075d7 100644
> >>> --- a/hw/virtio/vhost-shadow-virtqueue.c
> >>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> >>> @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >>>    /**
> >>>     * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> >>>     * methods and file descriptors.
> >>> + *
> >>> + * @qsize Shadow VirtQueue size
> >>> + *
> >>> + * Returns the new virtqueue or NULL.
> >>> + *
> >>> + * In case of error, reason is reported through error_report.
> >>>     */
> >>> -VhostShadowVirtqueue *vhost_svq_new(void)
> >>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> >>>    {
> >>> +    size_t desc_size = sizeof(vring_desc_t) * qsize;
> >>> +    size_t device_size, driver_size;
> >>>        g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> >>>        int r;
> >>>
> >>> @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
> >>>        /* Placeholder descriptor, it should be deleted at set_kick_fd */
> >>>        event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
> >>>
> >>> +    svq->vring.num = qsize;
> >>
> >> I wonder if this is the best. E.g some hardware can support up to 32K
> >> queue size. So this will probably end up with:
> >>
> >> 1) SVQ use 32K queue size
> >> 2) hardware queue uses 256
> >>
> > In that case SVQ vring queue size will be 32K and guest's vring can
> > negotiate any number with SVQ equal or less than 32K,
>
>
> Sorry for being unclear, what I meant is actually:
>
> 1) SVQ uses 32K queue size
>
> 2) guest vq uses 256
>
> This looks like a burden that needs extra logic and may damage the
> performance.
>

Still not getting this point.

An available guest buffer, although contiguous in GPA/GVA, can expand
into multiple descriptors if it's not contiguous in qemu's VA (by the
while loop in virtqueue_map_desc [1]). In that scenario it is better to
have "plenty" of SVQ descriptors.

I'm ok if we decide to put an upper limit though, or if we decide not
to handle this situation. But we would leave out valid virtio drivers.
Maybe set a fixed upper limit (1024?)? Or add another parameter
(x-svq-size-n=N)?

If you mean we lose performance because memory gets more sparse, I
think the only possibility is to limit it that way.

> And this can lead other interesting situation:
>
> 1) SVQ uses 256
>
> 2) guest vq uses 1024
>
> Where a lot of more SVQ logic is needed.
>

If we agree that a guest descriptor can expand into multiple SVQ
descriptors, this should already be handled by the previous logic too.

But this should only happen if qemu is launched with a "bad" cmdline,
shouldn't it?

If I run that example with vp_vdpa, L0 qemu will happily accept 1024
as a queue size [2]. But if the vdpa device's maximum queue size is
effectively 256, this will result in an error: we're not exposing it
to the guest at any moment, except through qemu's cmdline.

>
> > including 256.
> > Is that what you mean?
>
>
> I mean, it looks to me the logic will be much simpler if we just
> allocate the shadow virtqueue with the size the guest can see (the
> guest vring).
>
> Then we don't need to think if the difference of the queue size can have
> any side effects.
>

I think that we cannot avoid that extra logic unless we force GPA to
be contiguous in IOVA. If we are sure the guest's buffers cannot span
more than one descriptor in SVQ, then yes, we can simplify things. If
not, I think we are forced to carry all of it.

But if we prove it I'm not opposed to simplifying things and making
head at SVQ == head at guest.

Thanks!

[1] https://gitlab.com/qemu-project/qemu/-/blob/17e31340/hw/virtio/virtio.c#L1297
[2] But that's not the whole story: I've been running with a limited
number of tx descriptors because of virtio_net_max_tx_queue_size, which
predates vdpa. I'll send a patch to also un-limit it.

>
> >
> > If with hardware queues you mean guest's vring, not sure why it is
> > "probably 256". I'd say that in that case with the virtio-net kernel
> > driver the ring size will be the same as the device export, for
> > example, isn't it?
> >
> > The implementation should support any combination of sizes, but the
> > ring size exposed to the guest is never bigger than hardware one.
> >
> >> ? Or we SVQ can stick to 256 but this will this cause trouble if we want
> >> to add event index support?
> >>
> > I think we should not have any problem with event idx. If you mean
> > that the guest could mark more buffers available than SVQ vring's
> > size, that should not happen because there must be fewer entries in
> > the guest than in SVQ.
> >
> > But if I understood you correctly, a similar situation could happen if
> > a guest's contiguous buffer is scattered across many qemu's VA chunks.
> > Even if that would happen, the situation should be ok too: SVQ knows
> > the guest's avail idx and, if SVQ is full, it will continue forwarding
> > avail buffers when the device uses more buffers.
> >
> > Does that make sense to you?
>
>
> Yes.
>
> Thanks
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
  2022-02-08  3:23         ` Jason Wang
  (?)
@ 2022-02-18 12:35         ` Eugenio Perez Martin
  2022-02-21  7:39             ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-18 12:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Tue, Feb 8, 2022 at 4:23 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/1/31 23:34, Eugenio Perez Martin wrote:
> > On Sat, Jan 29, 2022 at 9:06 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2022/1/22 04:27, Eugenio Pérez wrote:
> >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>> ---
> >>>    hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++--
> >>>    1 file changed, 18 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >>> index 18de14f0fb..029f98feee 100644
> >>> --- a/hw/virtio/vhost-vdpa.c
> >>> +++ b/hw/virtio/vhost-vdpa.c
> >>> @@ -687,13 +687,29 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
> >>>        }
> >>>    }
> >>>
> >>> -static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> >>> -                                       struct vhost_vring_file *file)
> >>> +static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
> >>> +                                         struct vhost_vring_file *file)
> >>>    {
> >>>        trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
> >>>        return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
> >>>    }
> >>>
> >>> +static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> >>> +                                     struct vhost_vring_file *file)
> >>> +{
> >>> +    struct vhost_vdpa *v = dev->opaque;
> >>> +
> >>> +    if (v->shadow_vqs_enabled) {
> >>> +        int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
> >>> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
> >>> +
> >>> +        vhost_svq_set_guest_call_notifier(svq, file->fd);
> >>
> >> Two questions here (had similar questions for vring kick):
> >>
> >> 1) Any reason that we setup the eventfd for vhost-vdpa in
> >> vhost_vdpa_svq_setup() not here?
> >>
> > I'm not sure what you mean.
> >
> > The guest->SVQ call and kick fds are set here and at
> > vhost_vdpa_set_vring_kick. The event notifier handler of the guest ->
> > SVQ kick_fd is set at vhost_vdpa_set_vring_kick /
> > vhost_svq_set_svq_kick_fd. The guest -> SVQ call fd has no event
> > notifier handler since we don't poll it.
> >
> > On the other hand, the connection SVQ <-> device uses the same fds
> > from the beginning to the end, and they will not change with, for
> > example, call fd masking. That's why it's setup from
> > vhost_vdpa_svq_setup. Delaying to vhost_vdpa_set_vring_call would make
> > us add way more logic there.
>
>
> More logic in the general shadow vq code, but less code in the
> vhost-vdpa specific part, I think.
>
> E.g for we can move the kick set logic from vhost_vdpa_svq_set_fds() to
> here.
>

But they are different fds. vhost_vdpa_svq_set_fds sets the
SVQ<->device fds. This function sets the SVQ->guest call file
descriptor.

To move the logic of vhost_vdpa_svq_set_fds here would imply either:
a) Logic to know if we are receiving the first call fd or not. That
code is not in the series at the moment, because setting it at
vhost_vdpa_dev_start tells the difference for free. This is just adding
code, not moving it.
b) Logic to set *the same* file descriptor on the device again, with
logic to tell if we have missed calls. That logic is not implemented
for the device->SVQ call file descriptor, because we assume it never
changes after vhost_vdpa_svq_set_fds. So this is again adding code.

At this moment, we have:
vhost_vdpa_svq_set_fds:
  set SVQ<->device fds

vhost_vdpa_set_vring_call:
  set guest<-SVQ call

vhost_vdpa_set_vring_kick:
  set guest->SVQ kick.

If I understood correctly, the alternative would be something like:
vhost_vdpa_set_vring_call:
  set guest<-SVQ call
  if(!vq->dev_call_set) {
    - set SVQ<-device call.
    - vq->dev_call_set = true
  }

vhost_vdpa_set_vring_kick:
  set guest->SVQ kick
  if(!vq->dev_kick_set) {
    - set SVQ->device kick.
    - vq->dev_kick_set = true
  }

dev_reset / dev_stop:
for vq in vqs:
  vq->dev_kick_set = vq->dev_call_set = false
...

Or have I misunderstood something?

Thanks!

> Thanks
>
>
> >
> >> 2) The call could be disabled by using -1 as the fd, I don't see any
> >> code to deal with that.
> >>
> Right, I didn't take that into account. vhost-kernel also takes -1 as
> the kick_fd to unbind, so SVQ can be reworked to take that into account
> for sure.
> >
> > Thanks!
> >
> >> Thanks
> >>
> >>
> >>> +        return 0;
> >>> +    } else {
> >>> +        return vhost_vdpa_set_vring_dev_call(dev, file);
> >>> +    }
> >>> +}
> >>> +
> >>>    /**
> >>>     * Set shadow virtqueue descriptors to the device
> >>>     *
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 04/31] vdpa: Add vhost_svq_set_svq_kick_fd
  2022-02-08  8:47         ` Jason Wang
@ 2022-02-18 18:22         ` Eugenio Perez Martin
  -1 siblings, 0 replies; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-18 18:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Tue, Feb 8, 2022 at 9:48 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/1/31 6:18 PM, Eugenio Perez Martin wrote:
> > On Fri, Jan 28, 2022 at 7:29 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> >>> This function allows the vhost-vdpa backend to override kick_fd.
> >>>
> >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>> ---
> >>>    hw/virtio/vhost-shadow-virtqueue.h |  1 +
> >>>    hw/virtio/vhost-shadow-virtqueue.c | 45 ++++++++++++++++++++++++++++++
> >>>    2 files changed, 46 insertions(+)
> >>>
> >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> >>> index 400effd9f2..a56ecfc09d 100644
> >>> --- a/hw/virtio/vhost-shadow-virtqueue.h
> >>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> >>> @@ -15,6 +15,7 @@
> >>>
> >>>    typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> >>>
> >>> +void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> >>>    const EventNotifier *vhost_svq_get_dev_kick_notifier(
> >>>                                                  const VhostShadowVirtqueue *svq);
> >>>
> >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> >>> index bd87110073..21534bc94d 100644
> >>> --- a/hw/virtio/vhost-shadow-virtqueue.c
> >>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> >>> @@ -11,6 +11,7 @@
> >>>    #include "hw/virtio/vhost-shadow-virtqueue.h"
> >>>
> >>>    #include "qemu/error-report.h"
> >>> +#include "qemu/main-loop.h"
> >>>
> >>>    /* Shadow virtqueue to relay notifications */
> >>>    typedef struct VhostShadowVirtqueue {
> >>> @@ -18,8 +19,20 @@ typedef struct VhostShadowVirtqueue {
> >>>        EventNotifier hdev_kick;
> >>>        /* Shadow call notifier, sent to vhost */
> >>>        EventNotifier hdev_call;
> >>> +
> >>> +    /*
> >>> +     * Borrowed virtqueue's guest to host notifier.
> >>> +     * To borrow it in this event notifier allows to register on the event
> >>> +     * loop and access the associated shadow virtqueue easily. If we use the
> >>> +     * VirtQueue, we don't have an easy way to retrieve it.
> >>> +     *
> >>> +     * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
> >>> +     */
> >>> +    EventNotifier svq_kick;
> >>>    } VhostShadowVirtqueue;
> >>>
> >>> +#define INVALID_SVQ_KICK_FD -1
> >>> +
> >>>    /**
> >>>     * The notifier that SVQ will use to notify the device.
> >>>     */
> >>> @@ -29,6 +42,35 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
> >>>        return &svq->hdev_kick;
> >>>    }
> >>>
> >>> +/**
> >>> + * Set a new file descriptor for the guest to kick SVQ and notify for avail
> >>> + *
> >>> + * @svq          The svq
> >>> + * @svq_kick_fd  The new svq kick fd
> >>> + */
> >>> +void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> >>> +{
> >>> +    EventNotifier tmp;
> >>> +    bool check_old = INVALID_SVQ_KICK_FD !=
> >>> +                     event_notifier_get_fd(&svq->svq_kick);
> >>> +
> >>> +    if (check_old) {
> >>> +        event_notifier_set_handler(&svq->svq_kick, NULL);
> >>> +        event_notifier_init_fd(&tmp, event_notifier_get_fd(&svq->svq_kick));
> >>> +    }
> >>
> >> It looks to me we don't do similar things in vhost-net. Any reason for
> >> caring about the old svq_kick?
> >>
> > Do you mean to check for old kick_fd in case we miss notifications,
> > and explicitly omit the INVALID_SVQ_KICK_FD?
>
>
> Yes.
>
>
> >
> > If you mean qemu's vhost-net, I guess it's because the device's kick
> > fd is never changed in all the vhost device lifecycle, it's only set
> > at the beginning. Previous RFC also depended on that, but you
> > suggested better vhost and SVQ in v4 feedback if I understood
> > correctly [1]. Or am I missing something?
>
>
> No, I forgot that. But in this case we should deal better with the
> conversion from a valid fd to -1 by disabling the handler.
>

Sure, I will do it that way for the next version.
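
A minimal sketch of that rework, reusing the kernel's VHOST_FILE_UNBIND
(-1) convention for the SVQ kick fd (vhost_handle_guest_kick stands in
for the series' kick handler; names assumed, not from this patch):

void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
{
    EventNotifier *svq_kick = &svq->svq_kick;
    bool poll_stop = VHOST_FILE_UNBIND != event_notifier_get_fd(svq_kick);
    bool poll_start = svq_kick_fd != VHOST_FILE_UNBIND;

    if (poll_stop) {
        /* Stop polling the old fd before switching or unbinding */
        event_notifier_set_handler(svq_kick, NULL);
    }

    event_notifier_init_fd(svq_kick, svq_kick_fd);

    if (poll_start) {
        /* Kick once to catch a notification that raced with the
         * switch, then resume polling on the new fd. */
        event_notifier_set(&svq->hdev_kick);
        event_notifier_set_handler(svq_kick, vhost_handle_guest_kick);
    }
}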

>
> >
> > Qemu's vhost-net does not need to use this because it is not polling
> > it. For kernel's vhost, I guess the closest is the use of pollstop and
> > pollstart at vhost_vring_ioctl.
> >
> > In my opinion, SVQ code size can benefit from only allowing kick_fd to
> > be overridden at the start of the operation. Not at initialization, but
> > at start. But I can see the benefit of taking the change into account
> > from this moment so it's more resilient for the future.
> >
> >>> +
> >>> +    /*
> >>> +     * event_notifier_set_handler already checks for guest's notifications if
> >>> +     * they arrive to the new file descriptor in the switch, so there is no
> >>> +     * need to explicitely check for them.
> >>> +     */
> >>> +    event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> >>> +
> >>> +    if (!check_old || event_notifier_test_and_clear(&tmp)) {
> >>> +        event_notifier_set(&svq->hdev_kick);
> >>
> >> Any reason we need to kick the device directly here?
> >>
> > At this point of the series only notifications are forwarded, not
> > buffers. If kick_fd is set, we need to check the old one, the same way
> > as vhost checks the masked notifier in case of change.
>
>
> I meant we need to kick the svq instead of vhost-vdpa in this case?
>

Actually, yes, you're right.

At this moment of the series it is not needed, since SVQ only relays
the kick to the device. But when SVQ starts to forward buffers it will
be needed, so thanks for the catch!
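
A sketch of that future change, assuming a vhost_handle_guest_kick entry
point once buffer forwarding is in place (not part of this patch):

    if (!check_old || event_notifier_test_and_clear(&tmp)) {
        /* Process the pending avail descriptors in SVQ itself instead
         * of merely relaying the notification to the device. */
        vhost_handle_guest_kick(svq);
    }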

> Thanks
>
>
> >
> > Thanks!
> >
> > [1] https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg03152.html
> > , from "I'd suggest to not depend on this since it:"
> >
> >
> >> Thanks
> >>
> >>
> >>> +    }
> >>> +}
> >>> +
> >>>    /**
> >>>     * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> >>>     * methods and file descriptors.
> >>> @@ -52,6 +94,9 @@ VhostShadowVirtqueue *vhost_svq_new(void)
> >>>            goto err_init_hdev_call;
> >>>        }
> >>>
> >>> +    /* Placeholder descriptor, it should be deleted at set_kick_fd */
> >>> +    event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
> >>> +
> >>>        return g_steal_pointer(&svq);
> >>>
> >>>    err_init_hdev_call:
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq
  2022-02-17 17:13         ` Eugenio Perez Martin
@ 2022-02-21  7:15             ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-21  7:15 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/18 1:13 AM, Eugenio Perez Martin wrote:
> On Tue, Feb 8, 2022 at 4:58 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/2/1 2:58 AM, Eugenio Perez Martin wrote:
>>> On Sun, Jan 30, 2022 at 5:03 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>>>> First half of the buffers forwarding part, preparing vhost-vdpa
>>>>> callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
>>>>> this is effectively dead code at the moment, but it helps to reduce
>>>>> patch size.
>>>>>
>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>> ---
>>>>>     hw/virtio/vhost-shadow-virtqueue.h |   2 +-
>>>>>     hw/virtio/vhost-shadow-virtqueue.c |  21 ++++-
>>>>>     hw/virtio/vhost-vdpa.c             | 133 ++++++++++++++++++++++++++---
>>>>>     3 files changed, 143 insertions(+), 13 deletions(-)
>>>>>
>>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
>>>>> index 035207a469..39aef5ffdf 100644
>>>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>>>>> @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
>>>>>
>>>>>     void vhost_svq_stop(VhostShadowVirtqueue *svq);
>>>>>
>>>>> -VhostShadowVirtqueue *vhost_svq_new(void);
>>>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
>>>>>
>>>>>     void vhost_svq_free(VhostShadowVirtqueue *vq);
>>>>>
>>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>>>> index f129ec8395..7c168075d7 100644
>>>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>>>> @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>>>>>     /**
>>>>>      * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
>>>>>      * methods and file descriptors.
>>>>> + *
>>>>> + * @qsize Shadow VirtQueue size
>>>>> + *
>>>>> + * Returns the new virtqueue or NULL.
>>>>> + *
>>>>> + * In case of error, reason is reported through error_report.
>>>>>      */
>>>>> -VhostShadowVirtqueue *vhost_svq_new(void)
>>>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
>>>>>     {
>>>>> +    size_t desc_size = sizeof(vring_desc_t) * qsize;
>>>>> +    size_t device_size, driver_size;
>>>>>         g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
>>>>>         int r;
>>>>>
>>>>> @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
>>>>>         /* Placeholder descriptor, it should be deleted at set_kick_fd */
>>>>>         event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
>>>>>
>>>>> +    svq->vring.num = qsize;
>>>> I wonder if this is the best. E.g some hardware can support up to 32K
>>>> queue size. So this will probably end up with:
>>>>
>>>> 1) SVQ use 32K queue size
>>>> 2) hardware queue uses 256
>>>>
>>> In that case SVQ vring queue size will be 32K and guest's vring can
>>> negotiate any number with SVQ equal to or less than 32K,
>>
>> Sorry for being unclear; what I meant is actually:
>>
>> 1) SVQ uses 32K queue size
>>
>> 2) guest vq uses 256
>>
>> This looks like a burden that needs extra logic and may hurt
>> performance.
>>
> Still not getting this point.
>
> An available guest buffer, although contiguous in GPA/GVA, can be split
> into multiple buffers if it's not contiguous in qemu's VA (by the while
> loop in virtqueue_map_desc [1]). In that scenario it is better to have
> "plenty" of SVQ buffers.


Yes, but this case should be rare. So in this case we should deal with
the overrun on SVQ, that is:

1) SVQ is full
2) guest VQ isn't

We need to:

1) check the available buffer slots
2) disable guest kick and wait for the used buffers

But it looks to me like the current code is not ready to deal with this
case?
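
A rough sketch of that flow, with assumed helper names
(vhost_svq_available_slots, vhost_svq_add, vhost_svq_kick) rather than
the series' actual code:

static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
{
    /* Consume guest avail entries only while SVQ has free slots */
    while (vhost_svq_available_slots(svq) > 0) {
        VirtQueueElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));

        if (!elem) {
            return; /* guest avail ring fully drained */
        }
        vhost_svq_add(svq, elem);
        vhost_svq_kick(svq);
    }
    /*
     * SVQ is full but the guest VQ is not drained: stop here. The
     * device's used-buffer handler frees slots and re-runs this
     * function to resume forwarding.
     */
}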


>
> I'm ok if we decide to put an upper limit though, or if we decide not
> to handle this situation. But we would leave out valid virtio drivers.
> Maybe set a fixed upper limit (1024?)? Or add another parameter
> (x-svq-size-n=N)?
>
> If you mean we lose performance because memory gets more sparse I
> think the only possibility is to limit that way.


If the guest is not using 32K, having 32K for the svq may give extra
stress on the cache since we will end up with a pretty large working set.
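
For scale, the split-ring footprint per virtqueue grows linearly with num
(16-byte descriptors, 2-byte avail entries, 8-byte used entries):

    descriptor table: 16 * num -> 4 KiB at num=256, 512 KiB at num=32768
    avail ring: 6 + 2 * num -> ~0.5 KiB at num=256, ~64 KiB at num=32768
    used ring: 6 + 8 * num -> ~2 KiB at num=256, ~256 KiB at num=32768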


>
>> And this can lead other interesting situation:
>>
>> 1) SVQ uses 256
>>
>> 2) guest vq uses 1024
>>
>> Where a lot of more SVQ logic is needed.
>>
> If we agree that a guest descriptor can expand into multiple SVQ
> descriptors, this should already be handled by the previous logic too.
>
> But this should only happen in case qemu is launched with a "bad"
> cmdline, shouldn't it?


This seems like it can happen when we use -device
virtio-net-pci,tx_queue_size=1024 with a 256-entry vp_vdpa device, at least?


>
> If I run that example with vp_vdpa, L0 qemu will happily accept 1024
> as a queue size [2]. But if the vdpa device maximum queue size is
> effectively 256, this will result in an error: we're not exposing it
> to the guest at any moment except through qemu's cmdline.
>
>>> including 256.
>>> Is that what you mean?
>>
>> I mean, it looks to me the logic will be much simpler if we just
>> allocate the shadow virtqueue with the size the guest can see (the
>> guest vring).
>>
>> Then we don't need to think about whether the difference in queue size
>> can have any side effects.
>>
> I think that we cannot avoid that extra logic unless we force GPA to
> be contiguous in IOVA. If we are sure the guest's buffers cannot span
> more than one descriptor in SVQ, then yes, we can simplify things. If
> not, I think we are forced to carry all of it.


Yes, I agree, the code should be robust to handle any case.

Thanks


>
> But if we prove it I'm not opposed to simplifying things and making
> head at SVQ == head at guest.
>
> Thanks!
>
> [1] https://gitlab.com/qemu-project/qemu/-/blob/17e31340/hw/virtio/virtio.c#L1297
> [2] But that's not the whole story: I've been running limited in tx
> descriptors because of virtio_net_max_tx_queue_size, which predates
> vdpa. I'll send a patch to also un-limit it.
>
>>> If with hardware queues you mean guest's vring, not sure why it is
>>> "probably 256". I'd say that in that case with the virtio-net kernel
>>> driver the ring size will be the same as the one the device exports,
>>> for example, isn't it?
>>>
>>> The implementation should support any combination of sizes, but the
>>> ring size exposed to the guest is never bigger than the hardware one.
>>>
>>>> ? Or SVQ can stick to 256, but will this cause trouble if we want
>>>> to add event index support?
>>>>
>>> I think we should not have any problem with event idx. If you mean
>>> that the guest could mark more buffers available than SVQ vring's
>>> size, that should not happen because there must be fewer entries in
>>> the guest than in SVQ.
>>>
>>> But if I understood you correctly, a similar situation could happen if
>>> a guest's contiguous buffer is scattered across many of qemu's VA chunks.
>>> Even if that would happen, the situation should be ok too: SVQ knows
>>> the guest's avail idx and, if SVQ is full, it will continue forwarding
>>> avail buffers when the device uses more buffers.
>>>
>>> Does that make sense to you?
>>
>> Yes.
>>
>> Thanks
>>


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 01/31] vdpa: Reorder virtio/vhost-vdpa.c functions
  2022-01-28  7:57     ` Eugenio Perez Martin
@ 2022-02-21  7:31         ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-21  7:31 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/1/28 3:57 PM, Eugenio Perez Martin wrote:
> On Fri, Jan 28, 2022 at 6:59 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> vhost_vdpa_set_features and vhost_vdpa_init need to use
>>> vhost_vdpa_get_features in svq mode.
>>>
>>> vhost_vdpa_dev_start needs to use almost all _set_ functions:
>>> vhost_vdpa_set_vring_dev_kick, vhost_vdpa_set_vring_dev_call,
>>> vhost_vdpa_set_dev_vring_base and vhost_vdpa_set_dev_vring_num.
>>>
>>> No functional change intended.
>>
>> Is it related (a must) to the SVQ code?
>>
> Yes, SVQ needs to access the device variants to configure it, while
> exposing the SVQ ones.
>
> For example, for set_features, SVQ needs to set the device features in
> the start code, but expose the SVQ ones to the guest.
>
> Another possibility is to forward-declare them but I feel it pollutes
> the code more, doesn't it? Is there any reason to avoid the reordering
> beyond reducing the number of changes/patches?


No, but for reviewers, it might be easier if you squash the reordering
logic into the patch that needs it.

Thanks
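
For reference, the forward-declaration alternative discussed above would
be a short block near the top of hw/virtio/vhost-vdpa.c, for example:

/* Forward declarations instead of moving function bodies; the helpers
 * keep their original location and vhost_vdpa_dev_start stays put. */
static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
                                         struct vhost_vring_file *file);
static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
                                         struct vhost_vring_file *file);
static int vhost_vdpa_get_features(struct vhost_dev *dev,
                                   uint64_t *features);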


>
> Thanks!
>
>
>> Thanks
>>
>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-vdpa.c | 164 ++++++++++++++++++++---------------------
>>>    1 file changed, 82 insertions(+), 82 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index 04ea43704f..6c10a7f05f 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -342,41 +342,6 @@ static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
>>>        return v->index != 0;
>>>    }
>>>
>>> -static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>> -{
>>> -    struct vhost_vdpa *v;
>>> -    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
>>> -    trace_vhost_vdpa_init(dev, opaque);
>>> -    int ret;
>>> -
>>> -    /*
>>> -     * Similar to VFIO, we end up pinning all guest memory and have to
>>> -     * disable discarding of RAM.
>>> -     */
>>> -    ret = ram_block_discard_disable(true);
>>> -    if (ret) {
>>> -        error_report("Cannot set discarding of RAM broken");
>>> -        return ret;
>>> -    }
>>> -
>>> -    v = opaque;
>>> -    v->dev = dev;
>>> -    dev->opaque =  opaque ;
>>> -    v->listener = vhost_vdpa_memory_listener;
>>> -    v->msg_type = VHOST_IOTLB_MSG_V2;
>>> -
>>> -    vhost_vdpa_get_iova_range(v);
>>> -
>>> -    if (vhost_vdpa_one_time_request(dev)) {
>>> -        return 0;
>>> -    }
>>> -
>>> -    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>> -                               VIRTIO_CONFIG_S_DRIVER);
>>> -
>>> -    return 0;
>>> -}
>>> -
>>>    static void vhost_vdpa_host_notifier_uninit(struct vhost_dev *dev,
>>>                                                int queue_index)
>>>    {
>>> @@ -506,24 +471,6 @@ static int vhost_vdpa_set_mem_table(struct vhost_dev *dev,
>>>        return 0;
>>>    }
>>>
>>> -static int vhost_vdpa_set_features(struct vhost_dev *dev,
>>> -                                   uint64_t features)
>>> -{
>>> -    int ret;
>>> -
>>> -    if (vhost_vdpa_one_time_request(dev)) {
>>> -        return 0;
>>> -    }
>>> -
>>> -    trace_vhost_vdpa_set_features(dev, features);
>>> -    ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
>>> -    if (ret) {
>>> -        return ret;
>>> -    }
>>> -
>>> -    return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
>>> -}
>>> -
>>>    static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
>>>    {
>>>        uint64_t features;
>>> @@ -646,35 +593,6 @@ static int vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
>>>        return ret;
>>>     }
>>>
>>> -static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>> -{
>>> -    struct vhost_vdpa *v = dev->opaque;
>>> -    trace_vhost_vdpa_dev_start(dev, started);
>>> -
>>> -    if (started) {
>>> -        vhost_vdpa_host_notifiers_init(dev);
>>> -        vhost_vdpa_set_vring_ready(dev);
>>> -    } else {
>>> -        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>> -    }
>>> -
>>> -    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
>>> -        return 0;
>>> -    }
>>> -
>>> -    if (started) {
>>> -        memory_listener_register(&v->listener, &address_space_memory);
>>> -        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>> -    } else {
>>> -        vhost_vdpa_reset_device(dev);
>>> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>> -                                   VIRTIO_CONFIG_S_DRIVER);
>>> -        memory_listener_unregister(&v->listener);
>>> -
>>> -        return 0;
>>> -    }
>>> -}
>>> -
>>>    static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
>>>                                         struct vhost_log *log)
>>>    {
>>> @@ -735,6 +653,35 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>        return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>>>    }
>>>
>>> +static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>> +{
>>> +    struct vhost_vdpa *v = dev->opaque;
>>> +    trace_vhost_vdpa_dev_start(dev, started);
>>> +
>>> +    if (started) {
>>> +        vhost_vdpa_host_notifiers_init(dev);
>>> +        vhost_vdpa_set_vring_ready(dev);
>>> +    } else {
>>> +        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>> +    }
>>> +
>>> +    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (started) {
>>> +        memory_listener_register(&v->listener, &address_space_memory);
>>> +        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>> +    } else {
>>> +        vhost_vdpa_reset_device(dev);
>>> +        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>> +                                   VIRTIO_CONFIG_S_DRIVER);
>>> +        memory_listener_unregister(&v->listener);
>>> +
>>> +        return 0;
>>> +    }
>>> +}
>>> +
>>>    static int vhost_vdpa_get_features(struct vhost_dev *dev,
>>>                                         uint64_t *features)
>>>    {
>>> @@ -745,6 +692,24 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev,
>>>        return ret;
>>>    }
>>>
>>> +static int vhost_vdpa_set_features(struct vhost_dev *dev,
>>> +                                   uint64_t features)
>>> +{
>>> +    int ret;
>>> +
>>> +    if (vhost_vdpa_one_time_request(dev)) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    trace_vhost_vdpa_set_features(dev, features);
>>> +    ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
>>> +    if (ret) {
>>> +        return ret;
>>> +    }
>>> +
>>> +    return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
>>> +}
>>> +
>>>    static int vhost_vdpa_set_owner(struct vhost_dev *dev)
>>>    {
>>>        if (vhost_vdpa_one_time_request(dev)) {
>>> @@ -772,6 +737,41 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
>>>        return true;
>>>    }
>>>
>>> +static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>> +{
>>> +    struct vhost_vdpa *v;
>>> +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
>>> +    trace_vhost_vdpa_init(dev, opaque);
>>> +    int ret;
>>> +
>>> +    /*
>>> +     * Similar to VFIO, we end up pinning all guest memory and have to
>>> +     * disable discarding of RAM.
>>> +     */
>>> +    ret = ram_block_discard_disable(true);
>>> +    if (ret) {
>>> +        error_report("Cannot set discarding of RAM broken");
>>> +        return ret;
>>> +    }
>>> +
>>> +    v = opaque;
>>> +    v->dev = dev;
>>> +    dev->opaque =  opaque ;
>>> +    v->listener = vhost_vdpa_memory_listener;
>>> +    v->msg_type = VHOST_IOTLB_MSG_V2;
>>> +
>>> +    vhost_vdpa_get_iova_range(v);
>>> +
>>> +    if (vhost_vdpa_one_time_request(dev)) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>> +                               VIRTIO_CONFIG_S_DRIVER);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>    const VhostOps vdpa_ops = {
>>>            .backend_type = VHOST_BACKEND_TYPE_VDPA,
>>>            .vhost_backend_init = vhost_vdpa_init,


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
  2022-02-18 12:35         ` Eugenio Perez Martin
@ 2022-02-21  7:39             ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-21  7:39 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/18 8:35 PM, Eugenio Perez Martin wrote:
> On Tue, Feb 8, 2022 at 4:23 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/31 11:34 PM, Eugenio Perez Martin wrote:
>>> On Sat, Jan 29, 2022 at 9:06 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>> ---
>>>>>     hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++--
>>>>>     1 file changed, 18 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>>> index 18de14f0fb..029f98feee 100644
>>>>> --- a/hw/virtio/vhost-vdpa.c
>>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>>> @@ -687,13 +687,29 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
>>>>>         }
>>>>>     }
>>>>>
>>>>> -static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>>> -                                       struct vhost_vring_file *file)
>>>>> +static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
>>>>> +                                         struct vhost_vring_file *file)
>>>>>     {
>>>>>         trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
>>>>>         return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>>>>>     }
>>>>>
>>>>> +static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>>> +                                     struct vhost_vring_file *file)
>>>>> +{
>>>>> +    struct vhost_vdpa *v = dev->opaque;
>>>>> +
>>>>> +    if (v->shadow_vqs_enabled) {
>>>>> +        int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
>>>>> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
>>>>> +
>>>>> +        vhost_svq_set_guest_call_notifier(svq, file->fd);
>>>> Two questions here (had similar questions for vring kick):
>>>>
>>>> 1) Any reason that we setup the eventfd for vhost-vdpa in
>>>> vhost_vdpa_svq_setup() not here?
>>>>
>>> I'm not sure what you mean.
>>>
>>> The guest->SVQ call and kick fds are set here and at
>>> vhost_vdpa_set_vring_kick. The event notifier handler of the guest ->
>>> SVQ kick_fd is set at vhost_vdpa_set_vring_kick /
>>> vhost_svq_set_svq_kick_fd. The guest -> SVQ call fd has no event
>>> notifier handler since we don't poll it.
>>>
>>> On the other hand, the connection SVQ <-> device uses the same fds
>>> from the beginning to the end, and they will not change with, for
>>> example, call fd masking. That's why it's setup from
>>> vhost_vdpa_svq_setup. Delaying to vhost_vdpa_set_vring_call would make
>>> us add way more logic there.
>>
>> More logic in the general shadow vq code but less code in the
>> vhost-vdpa-specific code, I think.
>>
>> E.g. we can move the kick set logic from vhost_vdpa_svq_set_fds() to
>> here.
>>
> But they are different fds. vhost_vdpa_svq_set_fds sets the
> SVQ<->device pair. This function sets the SVQ->guest call file descriptor.
>
> To move the logic of vhost_vdpa_svq_set_fds here would imply either:
> a) Logic to know if we are receiving the first call fd or not.


Any reason for this? I guess you meant multiqueue. If yes, it should not
make much difference since we have the idx as a parameter.


>   That
> code is not in the series at the moment, because setting it at
> vhost_vdpa_dev_start tells the difference for free. It is just adding
> code, not moving it.
> b) Logic to set *the same* file descriptor to the device again, with logic
> to tell whether we have missed calls. That logic is not implemented for
> the device->SVQ call file descriptor, because we are assuming it never
> changes after vhost_vdpa_svq_set_fds. So this is again adding code.
>
> At this moment, we have:
> vhost_vdpa_svq_set_fds:
>    set SVQ<->device fds
>
> vhost_vdpa_set_vring_call:
>    set guest<-SVQ call
>
> vhost_vdpa_set_vring_kick:
>    set guest->SVQ kick.
>
> If I understood correctly, the alternative would be something like:
> vhost_vdpa_set_vring_call:
>    set guest<-SVQ call
>    if(!vq->dev_call_set) {
>      - set SVQ<-device call.
>      - vq->dev_call_set = true
>    }
>
> vhost_vdpa_set_vring_kick:
>    set guest->SVQ kick
>    if(!vq->dev_kick_set) {
>      - set SVQ->device kick.
>      - vq->dev_kick_set = true
>    }
>
> dev_reset / dev_stop:
> for vq in vqs:
>    vq->dev_kick_set = vq->dev_call_set = false
> ...
>
> Or have I misunderstood something?


I wonder what happens if MSI-X is masked in the guest. If I understand
correctly, we don't disable the eventfd from the device? If yes, this
seems suboptimal.
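
For context, this is roughly how the generic masking path in
hw/virtio/vhost.c reaches this backend op (abridged from QEMU's
vhost_virtqueue_mask; the trailing comment is the SVQ-specific
observation, not existing code):

void vhost_virtqueue_mask(struct vhost_dev *hdev, VirtIODevice *vdev,
                          int n, bool mask)
{
    struct VirtQueue *vvq = virtio_get_queue(vdev, n);
    int index = n - hdev->vq_index;
    struct vhost_vring_file file;

    if (mask) {
        /* Hand the backend a dummy notifier while masked */
        file.fd = event_notifier_get_fd(&hdev->vqs[index].masked_notifier);
    } else {
        file.fd = event_notifier_get_fd(
                      virtio_queue_get_guest_notifier(vvq));
    }

    file.index = hdev->vhost_ops->vhost_get_vq_index(hdev, n);
    hdev->vhost_ops->vhost_set_vring_call(hdev, &file);
    /*
     * With SVQ, only the guest<-SVQ notifier changes here; the
     * SVQ<-device call fd set at vhost_vdpa_svq_set_fds stays as-is,
     * so the device keeps signalling SVQ while the guest is masked.
     */
}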

Thanks


>
> Thanks!
>
>> Thanks
>>
>>
>>>> 2) The call could be disabled by using -1 as the fd, I don't see any
>>>> code to deal with that.
>>>>
>>> Right, I didn't take that into account. vhost-kernel also takes -1 as
>>> the kick_fd to unbind, so SVQ can certainly be reworked to take that
>>> into account.
>>>
>>> Thanks!
>>>
>>>> Thanks
>>>>
>>>>
>>>>> +        return 0;
>>>>> +    } else {
>>>>> +        return vhost_vdpa_set_vring_dev_call(dev, file);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>>     /**
>>>>>      * Set shadow virtqueue descriptors to the device
>>>>>      *


^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 01/31] vdpa: Reorder virtio/vhost-vdpa.c functions
  2022-02-21  7:31         ` Jason Wang
  (?)
@ 2022-02-21  7:42         ` Eugenio Perez Martin
  -1 siblings, 0 replies; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-21  7:42 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Mon, Feb 21, 2022 at 8:31 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/1/28 3:57 PM, Eugenio Perez Martin wrote:
> > On Fri, Jan 28, 2022 at 6:59 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> >>> vhost_vdpa_set_features and vhost_vdpa_init need to use
> >>> vhost_vdpa_get_features in svq mode.
> >>>
> >>> vhost_vdpa_dev_start needs to use almost all _set_ functions:
> >>> vhost_vdpa_set_vring_dev_kick, vhost_vdpa_set_vring_dev_call,
> >>> vhost_vdpa_set_dev_vring_base and vhost_vdpa_set_dev_vring_num.
> >>>
> >>> No functional change intended.
> >>
> >> Is it related (a must) to the SVQ code?
> >>
> > Yes, SVQ needs to access the device variants to configure it, while
> > exposing the SVQ ones.
> >
> > For example for set_features, SVQ needs to set device features in the
> > start code, but expose SVQ ones to the guest.
> >
> > Another possibility is to forward-declare them but I feel it pollutes
> > the code more, doesn't it? Is there any reason to avoid the reordering
> > beyond reducing the number of changes/patches?
>
>
> No, but for reviewer, it might be easier if you squash the reordering
> logic into the patch which needs that.
>

Sure, I can do it that way. I thought the opposite, but I can merge the
reordering into the different patches for the next version for sure.

Thanks!

> Thanks
>
>
> >
> > Thanks!
> >
> >
> >> Thanks
> >>
> >>
> >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>> ---
> >>>    hw/virtio/vhost-vdpa.c | 164 ++++++++++++++++++++---------------------
> >>>    1 file changed, 82 insertions(+), 82 deletions(-)
> >>>
> >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >>> index 04ea43704f..6c10a7f05f 100644
> >>> --- a/hw/virtio/vhost-vdpa.c
> >>> +++ b/hw/virtio/vhost-vdpa.c
> >>> @@ -342,41 +342,6 @@ static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
> >>>        return v->index != 0;
> >>>    }
> >>>
> >>> -static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> >>> -{
> >>> -    struct vhost_vdpa *v;
> >>> -    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> >>> -    trace_vhost_vdpa_init(dev, opaque);
> >>> -    int ret;
> >>> -
> >>> -    /*
> >>> -     * Similar to VFIO, we end up pinning all guest memory and have to
> >>> -     * disable discarding of RAM.
> >>> -     */
> >>> -    ret = ram_block_discard_disable(true);
> >>> -    if (ret) {
> >>> -        error_report("Cannot set discarding of RAM broken");
> >>> -        return ret;
> >>> -    }
> >>> -
> >>> -    v = opaque;
> >>> -    v->dev = dev;
> >>> -    dev->opaque =  opaque ;
> >>> -    v->listener = vhost_vdpa_memory_listener;
> >>> -    v->msg_type = VHOST_IOTLB_MSG_V2;
> >>> -
> >>> -    vhost_vdpa_get_iova_range(v);
> >>> -
> >>> -    if (vhost_vdpa_one_time_request(dev)) {
> >>> -        return 0;
> >>> -    }
> >>> -
> >>> -    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> >>> -                               VIRTIO_CONFIG_S_DRIVER);
> >>> -
> >>> -    return 0;
> >>> -}
> >>> -
> >>>    static void vhost_vdpa_host_notifier_uninit(struct vhost_dev *dev,
> >>>                                                int queue_index)
> >>>    {
> >>> @@ -506,24 +471,6 @@ static int vhost_vdpa_set_mem_table(struct vhost_dev *dev,
> >>>        return 0;
> >>>    }
> >>>
> >>> -static int vhost_vdpa_set_features(struct vhost_dev *dev,
> >>> -                                   uint64_t features)
> >>> -{
> >>> -    int ret;
> >>> -
> >>> -    if (vhost_vdpa_one_time_request(dev)) {
> >>> -        return 0;
> >>> -    }
> >>> -
> >>> -    trace_vhost_vdpa_set_features(dev, features);
> >>> -    ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
> >>> -    if (ret) {
> >>> -        return ret;
> >>> -    }
> >>> -
> >>> -    return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> >>> -}
> >>> -
> >>>    static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
> >>>    {
> >>>        uint64_t features;
> >>> @@ -646,35 +593,6 @@ static int vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
> >>>        return ret;
> >>>     }
> >>>
> >>> -static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >>> -{
> >>> -    struct vhost_vdpa *v = dev->opaque;
> >>> -    trace_vhost_vdpa_dev_start(dev, started);
> >>> -
> >>> -    if (started) {
> >>> -        vhost_vdpa_host_notifiers_init(dev);
> >>> -        vhost_vdpa_set_vring_ready(dev);
> >>> -    } else {
> >>> -        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> >>> -    }
> >>> -
> >>> -    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> >>> -        return 0;
> >>> -    }
> >>> -
> >>> -    if (started) {
> >>> -        memory_listener_register(&v->listener, &address_space_memory);
> >>> -        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> >>> -    } else {
> >>> -        vhost_vdpa_reset_device(dev);
> >>> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> >>> -                                   VIRTIO_CONFIG_S_DRIVER);
> >>> -        memory_listener_unregister(&v->listener);
> >>> -
> >>> -        return 0;
> >>> -    }
> >>> -}
> >>> -
> >>>    static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> >>>                                         struct vhost_log *log)
> >>>    {
> >>> @@ -735,6 +653,35 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> >>>        return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
> >>>    }
> >>>
> >>> +static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >>> +{
> >>> +    struct vhost_vdpa *v = dev->opaque;
> >>> +    trace_vhost_vdpa_dev_start(dev, started);
> >>> +
> >>> +    if (started) {
> >>> +        vhost_vdpa_host_notifiers_init(dev);
> >>> +        vhost_vdpa_set_vring_ready(dev);
> >>> +    } else {
> >>> +        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> >>> +    }
> >>> +
> >>> +    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    if (started) {
> >>> +        memory_listener_register(&v->listener, &address_space_memory);
> >>> +        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> >>> +    } else {
> >>> +        vhost_vdpa_reset_device(dev);
> >>> +        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> >>> +                                   VIRTIO_CONFIG_S_DRIVER);
> >>> +        memory_listener_unregister(&v->listener);
> >>> +
> >>> +        return 0;
> >>> +    }
> >>> +}
> >>> +
> >>>    static int vhost_vdpa_get_features(struct vhost_dev *dev,
> >>>                                         uint64_t *features)
> >>>    {
> >>> @@ -745,6 +692,24 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev,
> >>>        return ret;
> >>>    }
> >>>
> >>> +static int vhost_vdpa_set_features(struct vhost_dev *dev,
> >>> +                                   uint64_t features)
> >>> +{
> >>> +    int ret;
> >>> +
> >>> +    if (vhost_vdpa_one_time_request(dev)) {
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    trace_vhost_vdpa_set_features(dev, features);
> >>> +    ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
> >>> +    if (ret) {
> >>> +        return ret;
> >>> +    }
> >>> +
> >>> +    return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> >>> +}
> >>> +
> >>>    static int vhost_vdpa_set_owner(struct vhost_dev *dev)
> >>>    {
> >>>        if (vhost_vdpa_one_time_request(dev)) {
> >>> @@ -772,6 +737,41 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
> >>>        return true;
> >>>    }
> >>>
> >>> +static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> >>> +{
> >>> +    struct vhost_vdpa *v;
> >>> +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> >>> +    trace_vhost_vdpa_init(dev, opaque);
> >>> +    int ret;
> >>> +
> >>> +    /*
> >>> +     * Similar to VFIO, we end up pinning all guest memory and have to
> >>> +     * disable discarding of RAM.
> >>> +     */
> >>> +    ret = ram_block_discard_disable(true);
> >>> +    if (ret) {
> >>> +        error_report("Cannot set discarding of RAM broken");
> >>> +        return ret;
> >>> +    }
> >>> +
> >>> +    v = opaque;
> >>> +    v->dev = dev;
> >>> +    dev->opaque =  opaque ;
> >>> +    v->listener = vhost_vdpa_memory_listener;
> >>> +    v->msg_type = VHOST_IOTLB_MSG_V2;
> >>> +
> >>> +    vhost_vdpa_get_iova_range(v);
> >>> +
> >>> +    if (vhost_vdpa_one_time_request(dev)) {
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> >>> +                               VIRTIO_CONFIG_S_DRIVER);
> >>> +
> >>> +    return 0;
> >>> +}
> >>> +
> >>>    const VhostOps vdpa_ops = {
> >>>            .backend_type = VHOST_BACKEND_TYPE_VDPA,
> >>>            .vhost_backend_init = vhost_vdpa_init,
>




* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
  2022-02-17 12:48         ` Eugenio Perez Martin
@ 2022-02-21  7:43             ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-21  7:43 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/17 8:48 PM, Eugenio Perez Martin wrote:
> On Tue, Feb 8, 2022 at 9:16 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/2/1 7:25 PM, Eugenio Perez Martin wrote:
>>> On Sun, Jan 30, 2022 at 7:47 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>>>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>>>>>     void vhost_svq_stop(VhostShadowVirtqueue *svq)
>>>>>     {
>>>>>         event_notifier_set_handler(&svq->svq_kick, NULL);
>>>>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
>>>>> +
>>>>> +    if (!svq->vq) {
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    /* Send all pending used descriptors to guest */
>>>>> +    vhost_svq_flush(svq, false);
>>>> Do we need to wait for all the pending descriptors to be completed here?
>>>>
>>> No, this function does not wait, it only completes the forwarding of
>>> the *used* descriptors.
>>>
>>> The best example is the net rx queue in my opinion. This call will
>>> check SVQ's vring used_idx and will forward the last used descriptors
>>> if any, but all available descriptors will remain as available for
>>> qemu's VQ code.
>>>
>>> To skip it would miss those last rx descriptors in migration.
>>>
>>> Thanks!
>>
>> So this is probably not the best place to ask. It's more about the
>> inflight descriptors, so it should be TX instead of RX.
>>
>> I can imagine that in the last phase of migration we should stop the
>> vhost-vDPA device before calling vhost_svq_stop(). Then we should be
>> fine regardless of inflight descriptors.
>>
> I think I'm still missing something here.
>
> To be on the same page: regarding tx, this could cause repeated tx
> frames (one at the source and another at the destination), but never a
> buffer that is missed and not transmitted. The "stop before" could be
> interpreted as "SVQ is not forwarding available buffers anymore". Would
> that work?


Right, but this only works if we either:

1) flush to make sure the TX DMA for all inflight descriptors has completed, or

2) simply mark all inflight descriptors as used.

Otherwise there could be buffers that stay inflight forever.
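
Something like the following, as a rough and untested sketch on top of
this patch (vhost_svq_mark_inflight_used() is a hypothetical helper):

    /* Hypothetical: complete every descriptor the device still owns
     * with len = 0 before stopping, so no buffer stays inflight
     * forever; the guest driver will retransmit those TX buffers. A
     * guest notification should follow this. */
    static void vhost_svq_mark_inflight_used(VhostShadowVirtqueue *svq)
    {
        for (unsigned i = 0; i < svq->vring.num; ++i) {
            g_autofree VirtQueueElement *elem =
                g_steal_pointer(&svq->ring_id_maps[i]);
            if (elem) {
                virtqueue_push(svq->vq, elem, 0);
            }
        }
    }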

Thanks


>
> Thanks!
>
>> Thanks
>>
>>
>>>> Thanks
>>>>
>>>>
>>>>> +
>>>>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
>>>>> +        g_autofree VirtQueueElement *elem = NULL;
>>>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
>>>>> +        if (elem) {
>>>>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
>>>>> +    if (next_avail_elem) {
>>>>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
>>>>> +                                 next_avail_elem->len);
>>>>> +    }
>>>>>     }

* Re: [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
  2022-02-21  7:39             ` Jason Wang
  (?)
@ 2022-02-21  8:01             ` Eugenio Perez Martin
  2022-02-22  7:18                 ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-21  8:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Mon, Feb 21, 2022 at 8:39 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/2/18 8:35 PM, Eugenio Perez Martin wrote:
> > On Tue, Feb 8, 2022 at 4:23 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2022/1/31 11:34 PM, Eugenio Perez Martin wrote:
> >>> On Sat, Jan 29, 2022 at 9:06 AM Jason Wang <jasowang@redhat.com> wrote:
> >>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> >>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>>>> ---
> >>>>>     hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++--
> >>>>>     1 file changed, 18 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >>>>> index 18de14f0fb..029f98feee 100644
> >>>>> --- a/hw/virtio/vhost-vdpa.c
> >>>>> +++ b/hw/virtio/vhost-vdpa.c
> >>>>> @@ -687,13 +687,29 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
> >>>>>         }
> >>>>>     }
> >>>>>
> >>>>> -static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> >>>>> -                                       struct vhost_vring_file *file)
> >>>>> +static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
> >>>>> +                                         struct vhost_vring_file *file)
> >>>>>     {
> >>>>>         trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
> >>>>>         return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
> >>>>>     }
> >>>>>
> >>>>> +static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> >>>>> +                                     struct vhost_vring_file *file)
> >>>>> +{
> >>>>> +    struct vhost_vdpa *v = dev->opaque;
> >>>>> +
> >>>>> +    if (v->shadow_vqs_enabled) {
> >>>>> +        int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
> >>>>> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
> >>>>> +
> >>>>> +        vhost_svq_set_guest_call_notifier(svq, file->fd);
> >>>> Two questions here (had similar questions for vring kick):
> >>>>
> >>>> 1) Any reason that we setup the eventfd for vhost-vdpa in
> >>>> vhost_vdpa_svq_setup() not here?
> >>>>
> >>> I'm not sure what you mean.
> >>>
> >>> The guest->SVQ call and kick fds are set here and at
> >>> vhost_vdpa_set_vring_kick. The event notifier handler of the guest ->
> >>> SVQ kick_fd is set at vhost_vdpa_set_vring_kick /
> >>> vhost_svq_set_svq_kick_fd. The guest -> SVQ call fd has no event
> >>> notifier handler since we don't poll it.
> >>>
> >>> On the other hand, the connection SVQ <-> device uses the same fds
> >>> from the beginning to the end, and they will not change with, for
> >>> example, call fd masking. That's why it's setup from
> >>> vhost_vdpa_svq_setup. Delaying to vhost_vdpa_set_vring_call would make
> >>> us add way more logic there.
> >>
> >> More logic in the general shadow vq code but less code in the
> >> vhost-vdpa specific code, I think.
> >>
> >> E.g for we can move the kick set logic from vhost_vdpa_svq_set_fds() to
> >> here.
> >>
> > But they are different fds. vhost_vdpa_svq_set_fds sets the
> > SVQ<->device fds. This function sets the SVQ->guest call file descriptor.
> >
> > To move the logic of vhost_vdpa_svq_set_fds here would imply either:
> > a) Logic to know if we are receiving the first call fd or not.
>
>
> Any reason for this? I guess you meant multiqueue. If yes, it should not
> make much difference since we have idx as the parameter.
>

With "first call fd" I meant "first time we receive the call fd", so
we only set them once.

I think this is going to be easier if I prepare a patch doing your way
and we comment on it.

>
> >   That
> > code is not in the series at the moment, because setting at
> > vhost_vdpa_dev_start tells the difference for free. It's just adding
> > code, not moving it.
> > b) Logic to set again *the same* file descriptor to device, with logic
> > to tell if we have missed calls. That logic is not implemented for
> > device->SVQ call file descriptor, because we are assuming it never
> > changes from vhost_vdpa_svq_set_fds. So this is again adding code.
> >
> > At this moment, we have:
> > vhost_vdpa_svq_set_fds:
> >    set SVQ<->device fds
> >
> > vhost_vdpa_set_vring_call:
> >    set guest<-SVQ call
> >
> > vhost_vdpa_set_vring_kick:
> >    set guest->SVQ kick.
> >
> > If I understood correctly, the alternative would be something like:
> > vhost_vdpa_set_vring_call:
> >    set guest<-SVQ call
> >    if(!vq->call_set) {
> >      - set SVQ<-device call.
> >      - vq->call_set = true
> >    }
> >
> > vhost_vdpa_set_vring_kick:
> >    set guest->SVQ kick
> >    if(!vq->dev_kick_set) {
> >      - set guest->device kick.
> >      - vq->dev_kick_set = true
> >    }
> >
> > dev_reset / dev_stop:
> > for vq in vqs:
> >    vq->dev_kick_set = vq->dev_call_set = false
> > ...
> >
> > Or have I misunderstood something?
>
>
> I wonder what happens if MSI-X is masked in the guest. So if I
> understand correctly, we don't disable the eventfd from the device? If
> yes, this seems suboptimal.
>

We cannot disable the device's call fd unless SVQ actively polls it.
As I see it, if the guest masks the call fd, it could be because:
a) it doesn't want to receive more calls because it is processing buffers;
b) it is going to burn a CPU polling the ring instead.

The masking only affects the SVQ->guest call. If we also mask
device->SVQ, we're adding latency in case a), and we're effectively
disabling the forwarding in case b).

Masking device->SVQ only works if the guest is effectively not
interested in calls because it is not going to retire used buffers, but
in that case it doesn't hurt to simply keep the device->SVQ call fd:
the eventfds are going to be silent anyway.
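
To make the split concrete, the guest-side setter is roughly this (a
simplified sketch of this series' intent; the svq_call member name is
an assumption):

    /* Only the SVQ->guest notifier changes on (un)masking; the
     * device->SVQ call fd programmed at vhost_vdpa_svq_set_fds() is
     * left untouched, so the device can keep signalling used
     * buffers. */
    void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq,
                                           int call_fd)
    {
        event_notifier_init_fd(&svq->svq_call, call_fd);
    }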

Thanks!

> Thanks
>
>
> >
> > Thanks!
> >
> >> Thanks
> >>
> >>
> >>>> 2) The call could be disabled by using -1 as the fd, I don't see any
> >>>> code to deal with that.
> >>>>
> >>> Right, I didn't take that into account. vhost-kernel takes also -1 as
> >>> kick_fd to unbind, so SVQ can be reworked to take that into account
> >>> for sure.
> >>>
> >>> Thanks!
> >>>
> >>>> Thanks
> >>>>
> >>>>
> >>>>> +        return 0;
> >>>>> +    } else {
> >>>>> +        return vhost_vdpa_set_vring_dev_call(dev, file);
> >>>>> +    }
> >>>>> +}
> >>>>> +
> >>>>>     /**
> >>>>>      * Set shadow virtqueue descriptors to the device
> >>>>>      *
>




* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
  2022-02-21  7:43             ` Jason Wang
  (?)
@ 2022-02-21  8:15             ` Eugenio Perez Martin
  2022-02-22  7:26                 ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-21  8:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Mon, Feb 21, 2022 at 8:44 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/2/17 8:48 PM, Eugenio Perez Martin wrote:
> > On Tue, Feb 8, 2022 at 9:16 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2022/2/1 7:25 PM, Eugenio Perez Martin wrote:
> >>> On Sun, Jan 30, 2022 at 7:47 AM Jason Wang <jasowang@redhat.com> wrote:
> >>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> >>>>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> >>>>>     void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >>>>>     {
> >>>>>         event_notifier_set_handler(&svq->svq_kick, NULL);
> >>>>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> >>>>> +
> >>>>> +    if (!svq->vq) {
> >>>>> +        return;
> >>>>> +    }
> >>>>> +
> >>>>> +    /* Send all pending used descriptors to guest */
> >>>>> +    vhost_svq_flush(svq, false);
> >>>> Do we need to wait for all the pending descriptors to be completed here?
> >>>>
> >>> No, this function does not wait, it only completes the forwarding of
> >>> the *used* descriptors.
> >>>
> >>> The best example is the net rx queue in my opinion. This call will
> >>> check SVQ's vring used_idx and will forward the last used descriptors
> >>> if any, but all available descriptors will remain as available for
> >>> qemu's VQ code.
> >>>
> >>> To skip it would miss those last rx descriptors in migration.
> >>>
> >>> Thanks!
> >>
> >> So it's probably to not the best place to ask. It's more about the
> >> inflight descriptors so it should be TX instead of RX.
> >>
> >> I can imagine that in the last phase of migration we should stop the
> >> vhost-vDPA device before calling vhost_svq_stop(). Then we should be
> >> fine regardless of inflight descriptors.
> >>
> > I think I'm still missing something here.
> >
> > To be on the same page. Regarding tx this could cause repeated tx
> > frames (one at source and other at destination), but never a missed
> > buffer not transmitted. The "stop before" could be interpreted as "SVQ
> > is not forwarding available buffers anymore". Would that work?
>
>
> Right, but this only work if
>
> 1) a flush to make sure TX DMA for inflight descriptors are all completed
>
> 2) just mark all inflight descriptor used
>

It currently relies on the reverse: buffers not marked as used (by the
device) will be made available again at the destination, so
retransmissions are expected.

Thanks!

> Otherwise there could be buffers that stay inflight forever.
>
> Thanks
>
>
> >
> > Thanks!
> >
> >> Thanks
> >>
> >>
> >>>> Thanks
> >>>>
> >>>>
> >>>>> +
> >>>>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> >>>>> +        g_autofree VirtQueueElement *elem = NULL;
> >>>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> >>>>> +        if (elem) {
> >>>>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> >>>>> +        }
> >>>>> +    }
> >>>>> +
> >>>>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> >>>>> +    if (next_avail_elem) {
> >>>>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> >>>>> +                                 next_avail_elem->len);
> >>>>> +    }
> >>>>>     }
>




* Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq
  2022-02-21  7:15             ` Jason Wang
  (?)
@ 2022-02-21 17:22             ` Eugenio Perez Martin
  2022-02-22  3:16                 ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-21 17:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Mon, Feb 21, 2022 at 8:15 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/2/18 1:13 AM, Eugenio Perez Martin wrote:
> > On Tue, Feb 8, 2022 at 4:58 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
>> On 2022/2/1 2:58 AM, Eugenio Perez Martin wrote:
> >>> On Sun, Jan 30, 2022 at 5:03 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> >>>>> First half of the buffers forwarding part, preparing vhost-vdpa
> >>>>> callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
> >>>>> this is effectively dead code at the moment, but it helps to reduce
> >>>>> patch size.
> >>>>>
> >>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>>>> ---
> >>>>>     hw/virtio/vhost-shadow-virtqueue.h |   2 +-
> >>>>>     hw/virtio/vhost-shadow-virtqueue.c |  21 ++++-
> >>>>>     hw/virtio/vhost-vdpa.c             | 133 ++++++++++++++++++++++++++---
> >>>>>     3 files changed, 143 insertions(+), 13 deletions(-)
> >>>>>
> >>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> >>>>> index 035207a469..39aef5ffdf 100644
> >>>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
> >>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> >>>>> @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> >>>>>
> >>>>>     void vhost_svq_stop(VhostShadowVirtqueue *svq);
> >>>>>
> >>>>> -VhostShadowVirtqueue *vhost_svq_new(void);
> >>>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> >>>>>
> >>>>>     void vhost_svq_free(VhostShadowVirtqueue *vq);
> >>>>>
> >>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> >>>>> index f129ec8395..7c168075d7 100644
> >>>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
> >>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> >>>>> @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >>>>>     /**
> >>>>>      * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> >>>>>      * methods and file descriptors.
> >>>>> + *
> >>>>> + * @qsize Shadow VirtQueue size
> >>>>> + *
> >>>>> + * Returns the new virtqueue or NULL.
> >>>>> + *
> >>>>> + * In case of error, reason is reported through error_report.
> >>>>>      */
> >>>>> -VhostShadowVirtqueue *vhost_svq_new(void)
> >>>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> >>>>>     {
> >>>>> +    size_t desc_size = sizeof(vring_desc_t) * qsize;
> >>>>> +    size_t device_size, driver_size;
> >>>>>         g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> >>>>>         int r;
> >>>>>
> >>>>> @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
> >>>>>         /* Placeholder descriptor, it should be deleted at set_kick_fd */
> >>>>>         event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
> >>>>>
> >>>>> +    svq->vring.num = qsize;
> >>>> I wonder if this is the best. E.g some hardware can support up to 32K
> >>>> queue size. So this will probably end up with:
> >>>>
> >>>> 1) SVQ use 32K queue size
> >>>> 2) hardware queue uses 256
> >>>>
> >>> In that case SVQ vring queue size will be 32K and guest's vring can
> >>> negotiate any number with SVQ equal or less than 32K,
> >>
> >> Sorry for being unclear; what I meant is actually:
> >>
> >> 1) SVQ uses 32K queue size
> >>
> >> 2) guest vq uses 256
> >>
> >> This looks like a burden that needs extra logic and may damage the
> >> performance.
> >>
> > Still not getting this point.
> >
> > An available guest buffer, although contiguous in GPA/GVA, can expand
> > in multiple buffers if it's not contiguous in qemu's VA (by the while
> > loop in virtqueue_map_desc [1]). In that scenario it is better to have
> > "plenty" of SVQ buffers.
>
>
> Yes, but this case should be rare. So in this case we should deal with
> overrun on SVQ, that is
>
> 1) SVQ is full
> 2) guest VQ isn't
>
> We need to
>
> 1) check the available buffer slots
> 2) disable guest kick and wait for the used buffers
>
> But it looks to me the current code is not ready for dealing with this case?
>

Yes, it handles that case: that is the purpose of
svq->next_guest_avail_elem.
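
For completeness, the forwarding loop conceptually looks like this (a
simplified sketch; vhost_svq_available_slots() and vhost_svq_add() are
illustrative helper names, and the kick to the device is omitted):

    VirtQueueElement *elem;

    while ((elem = virtqueue_pop(svq->vq, sizeof(VirtQueueElement)))) {
        if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
            /* SVQ ring is full: stash the element; vhost_svq_flush()
             * resumes forwarding once the device returns used
             * buffers. */
            svq->next_guest_avail_elem = g_steal_pointer(&elem);
            return;
        }
        vhost_svq_add(svq, elem);
    }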

>
> >
> > I'm ok if we decide to put an upper limit though, or if we decide not
> > to handle this situation. But we would leave out valid virtio drivers.
> > Maybe to set a fixed upper limit (1024?)? To add another parameter
> > (x-svq-size-n=N)?
> >
> > If you mean we lose performance because memory gets more sparse I
> > think the only possibility is to limit that way.
>
>
> If the guest is not using 32K, having 32K for the svq may give extra
> stress on the cache since we will end up with a pretty large working set.
>

That might be true. My guess is that it should not matter, since SVQ
and the guest's vring will have the same number of scattered buffers
and the avail / used / packed ring will be consumed more or less
sequentially. But I haven't tested it.

I think it's better to add an upper limit (either fixed or in the
qemu backend's cmdline) later if we see that this is a problem.
Another solution for now would be to get the number from the frontend
device cmdline instead of from the vdpa device. I'm ok with that, but
it doesn't remove the svq->next_guest_avail_elem processing, and it
comes with disadvantages in my opinion. More below.

>
> >
> >> And this can lead other interesting situation:
> >>
> >> 1) SVQ uses 256
> >>
> >> 2) guest vq uses 1024
> >>
> >> Where a lot of more SVQ logic is needed.
> >>
> > If we agree that a guest descriptor can expand in multiple SVQ
> > descriptors, this should be already handled by the previous logic too.
> >
> > But this should only happen in case that qemu is launched with a "bad"
> > cmdline, isn't it?
>
>
> This seems like it can happen when we use -device
> virtio-net-pci,tx_queue_size=1024 with a 256-size vp_vdpa device, at least?
>

I'm going to use the rx queue here since it's more accurate; tx has
its own limit, handled separately.

If we use rx_queue_size=256 in L0 and rx_queue_size=1024 in L1 with no
SVQ, L0 qemu will happily accept 1024 as the size when L1 qemu writes
that value at vhost_virtqueue_start. I'm not sure what would happen
with a real device; my guess is that the device will fail somehow.
That's what I meant by a "bad cmdline"; I should have been more
specific.

If we add SVQ to the mix, the guest first negotiates the 1024 with the
qemu device model. After that, vhost.c will try to write 1024 too, but
this is totally ignored by this patch's changes at
vhost_vdpa_set_vring_num. Finally, SVQ will set 256 as the ring size on
the device, since that is the value read from the device, leading to
your scenario. So SVQ effectively isolates both sides and makes the
communication possible, even with a device that does not support that
many descriptors.

But SVQ already handles this case: it's the same as if the buffers are
fragmented in HVA and the queue sizes are equal on both sides. That's
why I think the SVQ size should depend on the backend device's size,
not on the frontend cmdline.
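
In code terms, the isolation boils down to something like this (a
sketch of what this patch does, slightly simplified):

    /* With SVQ enabled, the ring size vhost.c writes (the guest's
     * 1024) is not forwarded; SVQ programs the device with the size it
     * read back from the device itself (256 in this example). */
    static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
                                        struct vhost_vring_state *ring)
    {
        struct vhost_vdpa *v = dev->opaque;

        if (v->shadow_vqs_enabled) {
            return 0;
        }
        return vhost_vdpa_call(dev, VHOST_SET_VRING_NUM, ring);
    }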

Thanks!

>
> >
> > If I run that example with vp_vdpa, L0 qemu will happily accept 1024
> > as a queue size [2]. But if the vdpa device maximum queue size is
> > effectively 256, this will result in an error: We're not exposing it
> > to the guest at any moment but with qemu's cmdline.
> >
> >>> including 256.
> >>> Is that what you mean?
> >>
> >> I mean, it looks to me the logic will be much more simplified if we just
> >> allocate the shadow virtqueue with the size the guest can see (the
> >> guest vring).
> >>
> >> Then we don't need to think if the difference of the queue size can have
> >> any side effects.
> >>
> > I think that we cannot avoid that extra logic unless we force GPA to
> > be contiguous in IOVA. If we are sure the guest's buffers cannot be at
> > more than one descriptor in SVQ, then yes, we can simplify things. If
> > not, I think we are forced to carry all of it.
>
>
> Yes, I agree, the code should be robust to handle any case.
>
> Thanks
>
>
> >
> > But if we prove it I'm not opposed to simplifying things and making
> > head at SVQ == head at guest.
> >
> > Thanks!
> >
> > [1] https://gitlab.com/qemu-project/qemu/-/blob/17e31340/hw/virtio/virtio.c#L1297
> > [2] But that's not the whole story: I've been running limited in tx
> > descriptors because of virtio_net_max_tx_queue_size, which predates
> > vdpa. I'll send a patch to also un-limit it.
> >
> >>> If with hardware queues you mean guest's vring, not sure why it is
> >>> "probably 256". I'd say that in that case with the virtio-net kernel
> >>> driver the ring size will be the same as the device export, for
> >>> example, isn't it?
> >>>
> >>> The implementation should support any combination of sizes, but the
> >>> ring size exposed to the guest is never bigger than hardware one.
> >>>
> >>>> ? Or SVQ can stick to 256, but will this cause trouble if we want
> >>>> to add event index support?
> >>>>
> >>> I think we should not have any problem with event idx. If you mean
> >>> that the guest could mark more buffers available than SVQ vring's
> >>> size, that should not happen because there must be less entries in the
> >>> guest than SVQ.
> >>>
> >>> But if I understood you correctly, a similar situation could happen if
> >>> a guest's contiguous buffer is scattered across many qemu's VA chunks.
> >>> Even if that would happen, the situation should be ok too: SVQ knows
> >>> the guest's avail idx and, if SVQ is full, it will continue forwarding
> >>> avail buffers when the device uses more buffers.
> >>>
> >>> Does that make sense to you?
> >>
> >> Yes.
> >>
> >> Thanks
> >>
>




* Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq
  2022-02-21 17:22             ` Eugenio Perez Martin
@ 2022-02-22  3:16                 ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-22  3:16 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Tue, Feb 22, 2022 at 1:23 AM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Mon, Feb 21, 2022 at 8:15 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2022/2/18 1:13 AM, Eugenio Perez Martin wrote:
> > > On Tue, Feb 8, 2022 at 4:58 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> On 2022/2/1 2:58 AM, Eugenio Perez Martin wrote:
> > >>> On Sun, Jan 30, 2022 at 5:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > >>>>> First half of the buffers forwarding part, preparing vhost-vdpa
> > >>>>> callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
> > >>>>> this is effectively dead code at the moment, but it helps to reduce
> > >>>>> patch size.
> > >>>>>
> > >>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>>>> ---
> > >>>>>     hw/virtio/vhost-shadow-virtqueue.h |   2 +-
> > >>>>>     hw/virtio/vhost-shadow-virtqueue.c |  21 ++++-
> > >>>>>     hw/virtio/vhost-vdpa.c             | 133 ++++++++++++++++++++++++++---
> > >>>>>     3 files changed, 143 insertions(+), 13 deletions(-)
> > >>>>>
> > >>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > >>>>> index 035207a469..39aef5ffdf 100644
> > >>>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
> > >>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > >>>>> @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> > >>>>>
> > >>>>>     void vhost_svq_stop(VhostShadowVirtqueue *svq);
> > >>>>>
> > >>>>> -VhostShadowVirtqueue *vhost_svq_new(void);
> > >>>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> > >>>>>
> > >>>>>     void vhost_svq_free(VhostShadowVirtqueue *vq);
> > >>>>>
> > >>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > >>>>> index f129ec8395..7c168075d7 100644
> > >>>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
> > >>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > >>>>> @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > >>>>>     /**
> > >>>>>      * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> > >>>>>      * methods and file descriptors.
> > >>>>> + *
> > >>>>> + * @qsize Shadow VirtQueue size
> > >>>>> + *
> > >>>>> + * Returns the new virtqueue or NULL.
> > >>>>> + *
> > >>>>> + * In case of error, reason is reported through error_report.
> > >>>>>      */
> > >>>>> -VhostShadowVirtqueue *vhost_svq_new(void)
> > >>>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> > >>>>>     {
> > >>>>> +    size_t desc_size = sizeof(vring_desc_t) * qsize;
> > >>>>> +    size_t device_size, driver_size;
> > >>>>>         g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> > >>>>>         int r;
> > >>>>>
> > >>>>> @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
> > >>>>>         /* Placeholder descriptor, it should be deleted at set_kick_fd */
> > >>>>>         event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
> > >>>>>
> > >>>>> +    svq->vring.num = qsize;
> > >>>> I wonder if this is the best. E.g some hardware can support up to 32K
> > >>>> queue size. So this will probably end up with:
> > >>>>
> > >>>> 1) SVQ use 32K queue size
> > >>>> 2) hardware queue uses 256
> > >>>>
> > >>> In that case SVQ vring queue size will be 32K and guest's vring can
> > >>> negotiate any number with SVQ equal or less than 32K,
> > >>
> > >> Sorry for being unclear; what I meant is actually:
> > >>
> > >> 1) SVQ uses 32K queue size
> > >>
> > >> 2) guest vq uses 256
> > >>
> > >> This looks like a burden that needs extra logic and may damage the
> > >> performance.
> > >>
> > > Still not getting this point.
> > >
> > > An available guest buffer, although contiguous in GPA/GVA, can expand
> > > in multiple buffers if it's not contiguous in qemu's VA (by the while
> > > loop in virtqueue_map_desc [1]). In that scenario it is better to have
> > > "plenty" of SVQ buffers.
> >
> >
> > Yes, but this case should be rare. So in this case we should deal with
> > overrun on SVQ, that is
> >
> > 1) SVQ is full
> > 2) guest VQ isn't
> >
> > We need to
> >
> > 1) check the available buffer slots
> > 2) disable guest kick and wait for the used buffers
> >
> > But it looks to me the current code is not ready for dealing with this case?
> >
>
> Yes, it handles that case: that is the purpose of
> svq->next_guest_avail_elem.

Oh right, I missed that.

>
> >
> > >
> > > I'm ok if we decide to put an upper limit though, or if we decide not
> > > to handle this situation. But we would leave out valid virtio drivers.
> > > Maybe to set a fixed upper limit (1024?)? To add another parameter
> > > (x-svq-size-n=N)?
> > >
> > > If you mean we lose performance because memory gets more sparse I
> > > think the only possibility is to limit that way.
> >
> >
> > If the guest is not using 32K, having 32K for the svq may give extra
> > stress on the cache since we will end up with a pretty large working set.
> >
>
> That might be true. My guess is that it should not matter, since SVQ
> and the guest's vring will have the same number of scattered buffers
> and the avail / used / packed ring will be consumed more or less
> sequentially. But I haven't tested it.
>
> I think it's better to add an upper limit (either fixed or in the
> qemu backend's cmdline) later if we see that this is a problem.

I'd suggest using the same size as what the guest saw.

> Another solution for now would be to get the number from the frontend
> device cmdline instead of from the vdpa device. I'm ok with that, but
> it doesn't remove the svq->next_guest_avail_elem processing, and it
> comes with disadvantages in my opinion. More below.

Right, we should keep next_guest_avail_elem. Using the same queue size
is a balance between:

1) only rarely needing next_guest_avail_elem
2) not putting too much stress on the cache
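
Concretely, that would mean sizing the shadow ring from the guest's
vring at setup time, e.g. (a hypothetical call site; i is the queue
index within the vhost device):

    /* Size SVQ with the ring size the guest negotiated rather than the
     * device maximum: */
    svq = vhost_svq_new(virtio_queue_get_num(dev->vdev, dev->vq_index + i));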

>
> >
> > >
> > >> And this can lead other interesting situation:
> > >>
> > >> 1) SVQ uses 256
> > >>
> > >> 2) guest vq uses 1024
> > >>
> > >> Where a lot of more SVQ logic is needed.
> > >>
> > > If we agree that a guest descriptor can expand in multiple SVQ
> > > descriptors, this should be already handled by the previous logic too.
> > >
> > > But this should only happen in case that qemu is launched with a "bad"
> > > cmdline, isn't it?
> >
> >
> > This seems like it can happen when we use -device
> > virtio-net-pci,tx_queue_size=1024 with a 256-size vp_vdpa device, at least?
> >
>
> I'm going to use the rx queue here since it's more accurate; tx has
> its own limit, handled separately.
>
> If we use rx_queue_size=256 in L0 and rx_queue_size=1024 in L1 with no
> SVQ, L0 qemu will happily accept 1024 as the size

Interesting, this looks like a bug (I guess it works because vhost is enabled?):

Per virtio-spec:

"""
Queue Size. On reset, specifies the maximum queue size supported by
the device. This can be modified by the driver to reduce memory
requirements. A 0 means the queue is unavailable.
"""

We actually can't increase the queue_size from 256 to 1024; only a
decrease is allowed.
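
A hypothetical guard to illustrate the point (not existing QEMU code;
dev_max_queue_size stands for the maximum the device reported on
reset):

    /* The spec only lets the driver shrink the ring, so refuse to
     * program a size above the device's advertised maximum: */
    if (ring->num > dev_max_queue_size) {
        error_report("queue size %u exceeds device maximum %u",
                     ring->num, dev_max_queue_size);
        return -EINVAL;
    }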

> when L1 qemu writes that value at vhost_virtqueue_start. I'm not sure
> what would happen with a real device; my guess is that the device will
> fail somehow. That's what I meant by a "bad cmdline"; I should have
> been more specific.

I should say that it's something that is probably unrelated to this
series but needs to be addressed.

>
> If we add SVQ to the mix, the guest first negotiates the 1024 with the
> qemu device model. After that, vhost.c will try to write 1024 too, but
> this is totally ignored by this patch's changes at
> vhost_vdpa_set_vring_num. Finally, SVQ will set 256 as the ring size on
> the device, since that is the value read from the device, leading to
> your scenario. So SVQ effectively isolates both sides and makes the
> communication possible, even with a device that does not support that
> many descriptors.
>
> But SVQ already handles this case: it's the same as if the buffers are
> fragmented in HVA and the queue sizes are equal on both sides. That's
> why I think the SVQ size should depend on the backend device's size,
> not on the frontend cmdline.

Right.

Thanks

>
> Thanks!
>
> >
> > >
> > > If I run that example with vp_vdpa, L0 qemu will happily accept 1024
> > > as a queue size [2]. But if the vdpa device maximum queue size is
> > > effectively 256, this will result in an error: We're not exposing it
> > > to the guest at any moment but with qemu's cmdline.
> > >
> > >>> including 256.
> > >>> Is that what you mean?
> > >>
> > >> I mean, it looks to me the logic will be much more simplified if we just
> > >> allocate the shadow virtqueue with the size the guest can see (the
> > >> guest vring).
> > >>
> > >> Then we don't need to think if the difference of the queue size can have
> > >> any side effects.
> > >>
> > > I think that we cannot avoid that extra logic unless we force GPA to
> > > be contiguous in IOVA. If we are sure the guest's buffers cannot be at
> > > more than one descriptor in SVQ, then yes, we can simplify things. If
> > > not, I think we are forced to carry all of it.
> >
> >
> > Yes, I agree, the code should be robust to handle any case.
> >
> > Thanks
> >
> >
> > >
> > > But if we prove it I'm not opposed to simplifying things and making
> > > head at SVQ == head at guest.
> > >
> > > Thanks!
> > >
> > > [1] https://gitlab.com/qemu-project/qemu/-/blob/17e31340/hw/virtio/virtio.c#L1297
> > > [2] But that's not the whole story: I've been running limited in tx
> > > descriptors because of virtio_net_max_tx_queue_size, which predates
> > > vdpa. I'll send a patch to also un-limit it.
> > >
> > >>> If with hardware queues you mean guest's vring, not sure why it is
> > >>> "probably 256". I'd say that in that case with the virtio-net kernel
> > >>> driver the ring size will be the same as the device export, for
> > >>> example, isn't it?
> > >>>
> > >>> The implementation should support any combination of sizes, but the
> > >>> ring size exposed to the guest is never bigger than hardware one.
> > >>>
> > >>>> ? Or SVQ can stick to 256, but will this cause trouble if we want
> > >>>> to add event index support?
> > >>>>
> > >>> I think we should not have any problem with event idx. If you mean
> > >>> that the guest could mark more buffers available than SVQ vring's
> > >>> size, that should not happen because there must be less entries in the
> > >>> guest than SVQ.
> > >>>
> > >>> But if I understood you correctly, a similar situation could happen if
> > >>> a guest's contiguous buffer is scattered across many qemu's VA chunks.
> > >>> Even if that would happen, the situation should be ok too: SVQ knows
> > >>> the guest's avail idx and, if SVQ is full, it will continue forwarding
> > >>> avail buffers when the device uses more buffers.
> > >>>
> > >>> Does that make sense to you?
> > >>
> > >> Yes.
> > >>
> > >> Thanks
> > >>
> >
>


* Re: [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
  2022-02-21  8:01             ` Eugenio Perez Martin
@ 2022-02-22  7:18                 ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-22  7:18 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


在 2022/2/21 下午4:01, Eugenio Perez Martin 写道:
> On Mon, Feb 21, 2022 at 8:39 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2022/2/18 下午8:35, Eugenio Perez Martin 写道:
>>> On Tue, Feb 8, 2022 at 4:23 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> 在 2022/1/31 下午11:34, Eugenio Perez Martin 写道:
>>>>> On Sat, Jan 29, 2022 at 9:06 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>>> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
>>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>>>> ---
>>>>>>>      hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++--
>>>>>>>      1 file changed, 18 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>>>>> index 18de14f0fb..029f98feee 100644
>>>>>>> --- a/hw/virtio/vhost-vdpa.c
>>>>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>>>>> @@ -687,13 +687,29 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
>>>>>>>          }
>>>>>>>      }
>>>>>>>
>>>>>>> -static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>>>>> -                                       struct vhost_vring_file *file)
>>>>>>> +static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
>>>>>>> +                                         struct vhost_vring_file *file)
>>>>>>>      {
>>>>>>>          trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
>>>>>>>          return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>>>>>>>      }
>>>>>>>
>>>>>>> +static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>>>>> +                                     struct vhost_vring_file *file)
>>>>>>> +{
>>>>>>> +    struct vhost_vdpa *v = dev->opaque;
>>>>>>> +
>>>>>>> +    if (v->shadow_vqs_enabled) {
>>>>>>> +        int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
>>>>>>> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
>>>>>>> +
>>>>>>> +        vhost_svq_set_guest_call_notifier(svq, file->fd);
>>>>>> Two questions here (had similar questions for vring kick):
>>>>>>
>>>>>> 1) Any reason that we setup the eventfd for vhost-vdpa in
>>>>>> vhost_vdpa_svq_setup() not here?
>>>>>>
>>>>> I'm not sure what you mean.
>>>>>
>>>>> The guest->SVQ call and kick fds are set here and at
>>>>> vhost_vdpa_set_vring_kick. The event notifier handler of the guest ->
>>>>> SVQ kick_fd is set at vhost_vdpa_set_vring_kick /
>>>>> vhost_svq_set_svq_kick_fd. The guest -> SVQ call fd has no event
>>>>> notifier handler since we don't poll it.
>>>>>
>>>>> On the other hand, the connection SVQ <-> device uses the same fds
>>>>> from the beginning to the end, and they will not change with, for
>>>>> example, call fd masking. That's why it's setup from
>>>>> vhost_vdpa_svq_setup. Delaying to vhost_vdpa_set_vring_call would make
>>>>> us add way more logic there.
>>>> More logic in the general shadow vq code but less code in the
>>>> vhost-vdpa-specific code, I think.
>>>>
>>>> E.g. we can move the kick set logic from vhost_vdpa_svq_set_fds() to
>>>> here.
>>>>
>>> But they are different fds. vhost_vdpa_svq_set_fds sets the
>>> SVQ<->device ones. This function sets the SVQ->guest call file descriptor.
>>>
>>> To move the logic of vhost_vdpa_svq_set_fds here would imply either:
>>> a) Logic to know if we are receiving the first call fd or not.
>>
>> Any reason for this? I guess you meant multiqueue. If yes, it should not
>> make much difference since we have idx as the parameter.
>>
> With "first call fd" I meant "first time we receive the call fd", so
> we only set them once.
>
> I think this is going to be easier if I prepare a patch doing your way
> and we comment on it.


That would be helpful, but if there's no issue with the current code (see 
below), we can leave it as is and do the optimization on top.


>
>>>    That
>>> code is not in the series at the moment, because setting at
>>> vhost_vdpa_dev_start tells the difference for free. It's just adding
>>> code, not moving it.
>>> b) Logic to set again *the same* file descriptor to device, with logic
>>> to tell if we have missed calls. That logic is not implemented for
>>> device->SVQ call file descriptor, because we are assuming it never
>>> changes from vhost_vdpa_svq_set_fds. So this is again adding code.
>>>
>>> At this moment, we have:
>>> vhost_vdpa_svq_set_fds:
>>>     set SVQ<->device fds
>>>
>>> vhost_vdpa_set_vring_call:
>>>     set guest<-SVQ call
>>>
>>> vhost_vdpa_set_vring_kick:
>>>     set guest->SVQ kick.
>>>
>>> If I understood correctly, the alternative would be something like:
>>> vhost_vdpa_set_vring_call:
>>>     set guest<-SVQ call
>>>     if(!vq->call_set) {
>>>       - set SVQ<-device call.
>>>       - vq->call_set = true
>>>     }
>>>
>>> vhost_vdpa_set_vring_kick:
>>>     set guest->SVQ kick
>>>     if(!vq->dev_kick_set) {
>>>       - set guest->device kick.
>>>       - vq->dev_kick_set = true
>>>     }
>>>
>>> dev_reset / dev_stop:
>>> for vq in vqs:
>>>     vq->dev_kick_set = vq->dev_call_set = false
>>> ...
>>>
>>> Or have I misunderstood something?
>>
>> I wonder what happens if MSI-X is masked in the guest. So if I understand
>> correctly, we don't disable the eventfd from the device? If yes, this seems
>> suboptimal.
>>
> We cannot disable the device's call fd unless SVQ actively polls it. As
> I see it, if the guest masks the call fd, it could be because:
> a) it doesn't want to receive more calls because it is processing buffers
> b) it is going to burn a CPU polling it.
>
> The masking only affects the SVQ->guest call. If we also mask device->SVQ,
> we're adding latency in case a), and we're effectively disabling
> forwarding in case b).


Right, so we need to leave a comment to explain this; then I'm totally fine 
with this approach.
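
For the record, a sketch of such a comment, assuming it would sit next to
where SVQ stores its notifiers (the wording is illustrative, not the
series' final text):

    /*
     * The device->SVQ call fd is kept set for the whole device lifetime;
     * guest masking only swaps the SVQ->guest notifier.  Masking
     * device->SVQ too would add latency if the guest is merely batching
     * used buffers (case a) and would stop forwarding entirely if the
     * guest polls (case b).  If the guest really stops consuming buffers,
     * the untouched eventfd simply stays silent.
     */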


>
> It only works if the guest is effectively not interested in calls because
> it is not going to retire used buffers, but in that case it doesn't hurt
> to simply keep the device's call fd; the eventfds are going to be
> silent anyway.
>
> Thanks!


Yes.

Thanks


>
>> Thanks
>>
>>
>>> Thanks!
>>>
>>>> Thanks
>>>>
>>>>
>>>>>> 2) The call could be disabled by using -1 as the fd, I don't see any
>>>>>> code to deal with that.
>>>>>>
>>>>> Right, I didn't take that into account. vhost-kernel also takes -1 as
>>>>> kick_fd to unbind, so SVQ can certainly be reworked to take that into
>>>>> account.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>> +        return 0;
>>>>>>> +    } else {
>>>>>>> +        return vhost_vdpa_set_vring_dev_call(dev, file);
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>>      /**
>>>>>>>       * Set shadow virtqueue descriptors to the device
>>>>>>>       *
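
Related to the -1 discussion above, a minimal sketch of how SVQ could honor
an unbind, assuming VHOST_FILE_UNBIND (-1) keeps its vhost-kernel meaning
and that SVQ stores the guest call fd in a field (the field name is
hypothetical):

    /* Sketch: fd == -1 (VHOST_FILE_UNBIND) means "no call fd bound". */
    static void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq,
                                                  int call_fd)
    {
        if (call_fd == VHOST_FILE_UNBIND) {
            /* The guest unbound the notifier: stop signalling used buffers. */
            svq->guest_call_fd = -1;     /* hypothetical field */
            return;
        }
        svq->guest_call_fd = call_fd;
    }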


* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
  2022-02-21  8:15             ` Eugenio Perez Martin
@ 2022-02-22  7:26                 ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-22  7:26 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


在 2022/2/21 下午4:15, Eugenio Perez Martin 写道:
> On Mon, Feb 21, 2022 at 8:44 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2022/2/17 下午8:48, Eugenio Perez Martin 写道:
>>> On Tue, Feb 8, 2022 at 9:16 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> 在 2022/2/1 下午7:25, Eugenio Perez Martin 写道:
>>>>> On Sun, Jan 30, 2022 at 7:47 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>>> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
>>>>>>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>>>>>>>      void vhost_svq_stop(VhostShadowVirtqueue *svq)
>>>>>>>      {
>>>>>>>          event_notifier_set_handler(&svq->svq_kick, NULL);
>>>>>>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
>>>>>>> +
>>>>>>> +    if (!svq->vq) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    /* Send all pending used descriptors to guest */
>>>>>>> +    vhost_svq_flush(svq, false);
>>>>>> Do we need to wait for all the pending descriptors to be completed here?
>>>>>>
>>>>> No, this function does not wait, it only completes the forwarding of
>>>>> the *used* descriptors.
>>>>>
>>>>> The best example is the net rx queue in my opinion. This call will
>>>>> check SVQ's vring used_idx and will forward the last used descriptors
>>>>> if any, but all available descriptors will remain as available for
>>>>> qemu's VQ code.
>>>>>
>>>>> To skip it would miss those last rx descriptors in migration.
>>>>>
>>>>> Thanks!
>>>> So it's probably not the best place to ask. It's more about the
>>>> inflight descriptors, so it should be TX instead of RX.
>>>>
>>>> I can imagine that in the last phase of migration we should stop the
>>>> vhost-vDPA before calling vhost_svq_stop(). Then we should be fine
>>>> regardless of inflight descriptors.
>>>>
>>> I think I'm still missing something here.
>>>
>>> To be on the same page: regarding tx, this could cause repeated tx
>>> frames (one at the source and another at the destination), but never a
>>> buffer that is missed and left untransmitted. The "stop before" could be
>>> interpreted as "SVQ is not forwarding available buffers anymore". Would
>>> that work?
>>
>> Right, but this only works if
>>
>> 1) a flush to make sure the TX DMA for inflight descriptors is all completed
>>
>> 2) we just mark all inflight descriptors used
>
> It currently relies on the reverse: buffers not marked as used (by the
> device) will be made available at the destination, so expect
> retransmissions.


I may be missing something, but I think we do migrate last_avail_idx. So there 
won't be a re-transmission, since we depend on qemu's virtqueue code to 
deal with the vring base?
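
For reference, the split-ring bookkeeping both replies are reasoning about,
as a toy summary (it does not take a side on the retransmission question):

    /* Toy summary of the indices involved (split ring):
     *   avail->idx     - buffers the guest has exposed to the device
     *   used->idx      - buffers the device has completed
     *   last_avail_idx - next buffer the device would fetch; this is the
     *                    "vring base" that vhost migrates.
     * The descriptors between used->idx and last_avail_idx are the
     * inflight ones this sub-thread is discussing.
     */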

Thanks


>
> Thanks!
>
>> Otherwise there could be buffers that are inflight forever.
>>
>> Thanks
>>
>>
>>> Thanks!
>>>
>>>> Thanks
>>>>
>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>> +
>>>>>>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
>>>>>>> +        g_autofree VirtQueueElement *elem = NULL;
>>>>>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
>>>>>>> +        if (elem) {
>>>>>>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
>>>>>>> +    if (next_avail_elem) {
>>>>>>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
>>>>>>> +                                 next_avail_elem->len);
>>>>>>> +    }
>>>>>>>      }
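
As a reader aid, a sketch of the "forward used, don't wait" behavior
described above, under split-ring assumptions (the type and helper names
are made up for illustration):

    /* Sketch: relay everything the device has already marked used, but do
     * not wait for descriptors the device still owns. */
    static void example_svq_flush(ExampleSvq *svq)
    {
        uint16_t device_used = svq->vring.used->idx;  /* device write side */

        while (svq->last_used_idx != device_used) {
            uint16_t slot = svq->last_used_idx % svq->vring.num;
            /* Return one used element to the guest's vring. */
            example_return_to_guest(svq, svq->vring.used->ring[slot].id);
            svq->last_used_idx++;
        }
        /* Inflight descriptors stay untouched; vhost_svq_stop() detaches
         * them so the guest sees them as available again. */
    }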


* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
  2022-02-17  8:22             ` Eugenio Perez Martin
@ 2022-02-22  7:41                 ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-22  7:41 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


在 2022/2/17 下午4:22, Eugenio Perez Martin 写道:
> On Thu, Feb 17, 2022 at 7:02 AM Jason Wang <jasowang@redhat.com> wrote:
>> On Wed, Feb 16, 2022 at 11:54 PM Eugenio Perez Martin
>> <eperezma@redhat.com> wrote:
>>> On Tue, Feb 8, 2022 at 9:25 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>
>>>> 在 2022/2/1 下午7:45, Eugenio Perez Martin 写道:
>>>>> On Sun, Jan 30, 2022 at 7:50 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>>> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
>>>>>>> SVQ is able to log the dirty bits by itself, so let's use it to not
>>>>>>> block migration.
>>>>>>>
>>>>>>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
>>>>>>> enabled. Even if the device supports it, the reports would be nonsense
>>>>>>> because SVQ memory is in the qemu region.
>>>>>>>
>>>>>>> The log region is still allocated. Future changes might skip that, but
>>>>>>> this series is already long enough.
>>>>>>>
>>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>>>> ---
>>>>>>>     hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
>>>>>>>     1 file changed, 20 insertions(+)
>>>>>>>
>>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>>>>> index fb0a338baa..75090d65e8 100644
>>>>>>> --- a/hw/virtio/vhost-vdpa.c
>>>>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>>>>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
>>>>>>>         if (ret == 0 && v->shadow_vqs_enabled) {
>>>>>>>             /* Filter only features that SVQ can offer to guest */
>>>>>>>             vhost_svq_valid_guest_features(features);
>>>>>>> +
>>>>>>> +        /* Add SVQ logging capabilities */
>>>>>>> +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
>>>>>>>         }
>>>>>>>
>>>>>>>         return ret;
>>>>>>> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
>>>>>>>
>>>>>>>         if (v->shadow_vqs_enabled) {
>>>>>>>             uint64_t dev_features, svq_features, acked_features;
>>>>>>> +        uint8_t status = 0;
>>>>>>>             bool ok;
>>>>>>>
>>>>>>> +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
>>>>>>> +        if (unlikely(ret)) {
>>>>>>> +            return ret;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
>>>>>>> +            /*
>>>>>>> +             * vhost is trying to enable or disable _F_LOG, and the device
>>>>>>> +             * would report wrong dirty pages. SVQ handles it.
>>>>>>> +             */
>>>>>> I fail to understand this comment, I'd think there's no way to disable
>>>>>> dirty page tracking for SVQ.
>>>>>>
>>>>> vhost_log_global_{start,stop} are called at the beginning and end of
>>>>> migration. To inform the device that it should start logging, they set
>>>>> or clear VHOST_F_LOG_ALL at vhost_dev_set_log.
>>>>
>>>> Yes, but for SVQ, we can't disable dirty page tracking, can we? The
>>>> only thing to do is ignore or filter out F_LOG_ALL and pretend it has
>>>> been enabled or disabled.
>>>>
>>> Yes, that's what this patch does.
>>>
>>>>> While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
>>>>> vhost does not block migration. Maybe we need to look for another way
>>>>> to do this?
>>>>
>>>> I'm fine with filtering since it's much simpler, but I fail to
>>>> understand why we need to check DRIVER_OK.
>>>>
>>> Ok maybe I can make that part more clear,
>>>
>>> Since both operations use vhost_vdpa_set_features we must just filter
>>> the one that actually sets or removes VHOST_F_LOG_ALL, without
>>> affecting other features.
>>>
>>> In practice, that means to not forward the set features after
>>> DRIVER_OK. The device is not expecting them anymore.
>> I wonder what happens if we don't do this.
>>
> If we simply delete the check vhost_dev_set_features will return an
> error, failing the start of the migration. More on this below.


Ok.


>
>> So kernel had this check:
>>
>>          /*
>>           * It's not allowed to change the features after they have
>>           * been negotiated.
>>           */
>> if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_FEATURES_OK)
>>          return -EBUSY;
>>
>> So is it FEATURES_OK actually?
>>
> Yes, FEATURES_OK seems more appropriate actually so I will switch to
> it for the next version.
>
> But it should be functionally equivalent, since
> vhost.c:vhost_dev_start sets both and the setting of _F_LOG_ALL cannot
> be concurrent with it.


Right.


>
>> For this patch, I wonder if the thing we need to do is to see whether
>> it is an enable/disable of F_LOG_ALL and simply return.
>>
> Yes, that's the intention of the patch.
>
> We have 4 cases here:
> a) We're being called from vhost_dev_start, with enable_log = false
> b) We're being called from vhost_dev_start, with enable_log = true


And this case means we can't simply return without calling vhost-vdpa.


> c) We're being called from vhost_dev_set_log, with enable_log = false
> d) We're being called from vhost_dev_set_log, with enable_log = true
>
> The way to tell the difference between a/b and c/d is to check if
> {FEATURES,DRIVER}_OK is set. And, as you point out in previous mails,
> F_LOG_ALL must be filtered unconditionally since SVQ tracks dirty
> memory through the memory unmapping, so we clear the bit
> unconditionally if we detect that VHOST_SET_FEATURES will be called
> (cases a and b).
>
> Another possibility is to track if features have been set with a bool
> in vhost_vdpa or something like that. But it seems cleaner to me to
> only store that in the actual device.


So I suggest making sure the code matches the comment:

         if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
             /*
              * vhost is trying to enable or disable _F_LOG, and the device
              * would report wrong dirty pages. SVQ handles it.
              */
             return 0;
         }

It would be better to check whether the caller is toggling _F_LOG_ALL in 
this case.
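
A sketch of that check, assuming vhost_vdpa grows a field that remembers
the previously acked features (v->acked_features is hypothetical here):

    if (status & VIRTIO_CONFIG_S_FEATURES_OK) {
        uint64_t changed = features ^ v->acked_features;

        if (changed & ~BIT_ULL(VHOST_F_LOG_ALL)) {
            /* Features must not change after FEATURES_OK. */
            return -EBUSY;
        }
        /*
         * vhost is only toggling _F_LOG_ALL: SVQ tracks dirty memory by
         * itself, so there is nothing to forward to the device.
         */
        return 0;
    }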

Thanks


>
>> Thanks
>>
>>> Does that make more sense?
>>>
>>> Thanks!
>>>
>>>> Thanks
>>>>
>>>>
>>>>> Thanks!
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>> +            return 0;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        /* We must not ack _F_LOG if SVQ is enabled */
>>>>>>> +        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
>>>>>>> +
>>>>>>>             ret = vhost_vdpa_get_dev_features(dev, &dev_features);
>>>>>>>             if (ret != 0) {
>>>>>>>                 error_report("Can't get vdpa device features, got (%d)", ret);


* Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq
  2022-02-22  3:16                 ` Jason Wang
  (?)
@ 2022-02-22  7:42                 ` Eugenio Perez Martin
  2022-02-22  7:59                     ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-22  7:42 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Tue, Feb 22, 2022 at 4:16 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Feb 22, 2022 at 1:23 AM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Mon, Feb 21, 2022 at 8:15 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > 在 2022/2/18 上午1:13, Eugenio Perez Martin 写道:
> > > > On Tue, Feb 8, 2022 at 4:58 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >>
> > > >> 在 2022/2/1 上午2:58, Eugenio Perez Martin 写道:
> > > >>> On Sun, Jan 30, 2022 at 5:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >>>> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > > >>>>> First half of the buffers forwarding part, preparing vhost-vdpa
> > > >>>>> callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
> > > >>>>> this is effectively dead code at the moment, but it helps to reduce
> > > >>>>> patch size.
> > > >>>>>
> > > >>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > >>>>> ---
> > > >>>>>     hw/virtio/vhost-shadow-virtqueue.h |   2 +-
> > > >>>>>     hw/virtio/vhost-shadow-virtqueue.c |  21 ++++-
> > > >>>>>     hw/virtio/vhost-vdpa.c             | 133 ++++++++++++++++++++++++++---
> > > >>>>>     3 files changed, 143 insertions(+), 13 deletions(-)
> > > >>>>>
> > > >>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > >>>>> index 035207a469..39aef5ffdf 100644
> > > >>>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > >>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > >>>>> @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> > > >>>>>
> > > >>>>>     void vhost_svq_stop(VhostShadowVirtqueue *svq);
> > > >>>>>
> > > >>>>> -VhostShadowVirtqueue *vhost_svq_new(void);
> > > >>>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> > > >>>>>
> > > >>>>>     void vhost_svq_free(VhostShadowVirtqueue *vq);
> > > >>>>>
> > > >>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > >>>>> index f129ec8395..7c168075d7 100644
> > > >>>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > >>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > >>>>> @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > > >>>>>     /**
> > > >>>>>      * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> > > >>>>>      * methods and file descriptors.
> > > >>>>> + *
> > > >>>>> + * @qsize Shadow VirtQueue size
> > > >>>>> + *
> > > >>>>> + * Returns the new virtqueue or NULL.
> > > >>>>> + *
> > > >>>>> + * In case of error, reason is reported through error_report.
> > > >>>>>      */
> > > >>>>> -VhostShadowVirtqueue *vhost_svq_new(void)
> > > >>>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> > > >>>>>     {
> > > >>>>> +    size_t desc_size = sizeof(vring_desc_t) * qsize;
> > > >>>>> +    size_t device_size, driver_size;
> > > >>>>>         g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> > > >>>>>         int r;
> > > >>>>>
> > > >>>>> @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
> > > >>>>>         /* Placeholder descriptor, it should be deleted at set_kick_fd */
> > > >>>>>         event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
> > > >>>>>
> > > >>>>> +    svq->vring.num = qsize;
> > > >>>> I wonder if this is the best. E.g some hardware can support up to 32K
> > > >>>> queue size. So this will probably end up with:
> > > >>>>
> > > >>>> 1) SVQ use 32K queue size
> > > >>>> 2) hardware queue uses 256
> > > >>>>
> > > >>> In that case SVQ vring queue size will be 32K and guest's vring can
> > > >>> negotiate any number with SVQ equal or less than 32K,
> > > >>
> > > >> Sorry for being unclear what I meant is actually
> > > >>
> > > >> 1) SVQ uses 32K queue size
> > > >>
> > > >> 2) guest vq uses 256
> > > >>
> > > >> This looks like a burden that needs extra logic and may damage the
> > > >> performance.
> > > >>
> > > > Still not getting this point.
> > > >
> > > > An available guest buffer, although contiguous in GPA/GVA, can expand
> > > > into multiple buffers if it's not contiguous in qemu's VA (by the while
> > > > loop in virtqueue_map_desc [1]). In that scenario it is better to have
> > > > "plenty" of SVQ buffers.
> > >
> > >
> > > Yes, but this case should be rare. So in this case we should deal with
> > > overrun on SVQ, that is
> > >
> > > 1) SVQ is full
> > > 2) guest VQ isn't
> > >
> > > We need to
> > >
> > > 1) check the available buffer slots
> > > 2) disable guest kick and wait for the used buffers
> > >
> > > But it looks to me the current code is not ready for dealing with this case?
> > >
> >
> > Yes, it deals with that; that's the meaning of svq->next_guest_avail_elem.
>
> Oh right, I missed that.
>
> >
> > >
> > > >
> > > > I'm ok if we decide to put an upper limit though, or if we decide not
> > > > to handle this situation. But we would leave out valid virtio drivers.
> > > > Maybe to set a fixed upper limit (1024?)? To add another parameter
> > > > (x-svq-size-n=N)?
> > > >
> > > > If you mean we lose performance because memory gets more sparse I
> > > > think the only possibility is to limit that way.
> > >
> > >
> > > > If the guest is not using 32K, having 32K for the svq may give extra stress
> > > on the cache since we will end up with a pretty large working set.
> > >
> >
> > That might be true. My guess is that it should not matter, since SVQ
> > and the guest's vring will have the same numbers of scattered buffers
> > and the avail / used / packed ring will be consumed more or less
> > sequentially. But I haven't tested.
> >
> > I think it's better to add an upper limit (either fixed or in the
> > qemu's backend's cmdline) later if we see that this is a problem.
>
> I'd suggest using the same size as what the guest saw.
>
> > Another solution now would be to get the number from the frontend
> > device cmdline instead of from the vdpa device. I'm ok with that, but
> > it doesn't delete the svq->next_guest_avail_elem processing, and it
> > comes with disadvantages in my opinion. More below.
>
> Right, we should keep next_guest_avail_elem. Using the same queue size
> is a balance between:
>
> 1) using next_guest_avail_elem (rare)
> > 2) not giving too much stress on the cache
>

Ok, I'll change the SVQ size to the frontend size then.
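
For readers: even with matching sizes the overrun path stays, since a guest
buffer fragmented in qemu's VA can need several SVQ descriptors. A sketch
around the real next_guest_avail_elem, with hypothetical helper names:

    VirtQueueElement *elem;

    while ((elem = virtqueue_pop(svq->vq, sizeof(*elem)))) {
        if (!example_svq_has_room(svq, elem)) {      /* hypothetical */
            /* Park the element; retry once the device returns used
             * buffers and frees descriptors. */
            svq->next_guest_avail_elem = elem;
            break;
        }
        example_svq_add_and_kick(svq, elem);         /* hypothetical */
    }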

> >
> > >
> > > >
> > > >> And this can lead other interesting situation:
> > > >>
> > > >> 1) SVQ uses 256
> > > >>
> > > >> 2) guest vq uses 1024
> > > >>
> > > >> Where a lot of more SVQ logic is needed.
> > > >>
> > > > If we agree that a guest descriptor can expand in multiple SVQ
> > > > descriptors, this should be already handled by the previous logic too.
> > > >
> > > > But this should only happen in case that qemu is launched with a "bad"
> > > > cmdline, isn't it?
> > >
> > >
> > > This seems like it can happen when we use -device
> > > virtio-net-pci,tx_queue_size=1024 with a 256-size vp_vdpa device, at least?
> > >
> >
> > I'm going to use the rx queue here since it's more accurate, tx has
> > its own limit separately.
> >
> > If we use rx_queue_size=256 in L0 and rx_queue_size=1024 in L1 with no
> > SVQ, L0 qemu will happily accept 1024 as size
>
> Interesting, looks like a bug (I guess it works since you enable vhost?):
>

No, emulated interfaces. More below.

> Per virtio-spec:
>
> """
> Queue Size. On reset, specifies the maximum queue size supported by
> the device. This can be modified by the driver to reduce memory
> requirements. A 0 means the queue is unavailable.
> """
>

Yes, but how should it fail? Drivers do not know how to check if the
value was invalid. DEVICE_NEEDS_RESET?

The L0 emulated device simply receives the PCI write and calls
virtio_queue_set_num. I can try to add the check "num <
vdev->vq[n].vring.num_default", but there is no way to notify the
guest that setting the value failed.
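
A sketch of that guard inside virtio_queue_set_num (the exact condition,
and how it interacts with the existing checks, is an assumption):

    /* Sketch: reject attempts to grow the ring past its reset-time
     * maximum, per the spec's "the driver can only reduce" rule.  A driver
     * can detect the rejection by reading the size back. */
    if (num > vdev->vq[n].vring.num_default) {
        return;     /* keep the previous, valid value */
    }
    vdev->vq[n].vring.num = num;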

> We can't increase the queue_size from 256 to 1024 actually. (Only
> decrease is allowed).
>
> > when L1 qemu writes that
> > value at vhost_virtqueue_start. I'm not sure what would happen with a
> > real device, my guess is that the device will fail somehow. That's
> > what I meant with a "bad cmdline", I should have been more specific.
>
> I should say that it's something that is probably unrelated to this
> series but needs to be addressed.
>

I agree, I can start developing the patches for sure.

> >
> > If we add SVQ to the mix, the guest first negotiates the 1024 with the
> > qemu device model. After that, vhost.c will try to write 1024 too but
> > this is totally ignored by this patch's changes at
> > vhost_vdpa_set_vring_num. Finally, SVQ will set 256 as a ring size to
> > the device, since it's the read value from the device, leading to your
> > scenario. So SVQ effectively isolates both sides and makes possible
> > the communication, even with a device that does not support so many
> > descriptors.
> >
> > But SVQ already handles this case: It's the same as if the buffers are
> > fragmented in HVA and queue size is equal at both sides. That's why I
> > think SVQ size should depend on the backend device's size, not
> > frontend cmdline.
>
> Right.
>
> Thanks
>
> >
> > Thanks!
> >
> > >
> > > >
> > > > If I run that example with vp_vdpa, L0 qemu will happily accept 1024
> > > > as a queue size [2]. But if the vdpa device maximum queue size is
> > > > effectively 256, this will result in an error: We're not exposing it
> > > > to the guest at any moment but with qemu's cmdline.
> > > >
> > > >>> including 256.
> > > >>> Is that what you mean?
> > > >>
> > > >> I mean, it looks to me the logic will be much more simplified if we just
> > > >> allocate the shadow virtqueue with the size what guest can see (guest
> > > >> vring).
> > > >>
> > > >> Then we don't need to think if the difference of the queue size can have
> > > >> any side effects.
> > > >>
> > > > I think that we cannot avoid that extra logic unless we force GPA to
> > > > be contiguous in IOVA. If we are sure the guest's buffers cannot be at
> > > > more than one descriptor in SVQ, then yes, we can simplify things. If
> > > > not, I think we are forced to carry all of it.
> > >
> > >
> > > Yes, I agree, the code should be robust to handle any case.
> > >
> > > Thanks
> > >
> > >
> > > >
> > > > But if we prove it I'm not opposed to simplifying things and making
> > > > head at SVQ == head at guest.
> > > >
> > > > Thanks!
> > > >
> > > > [1] https://gitlab.com/qemu-project/qemu/-/blob/17e31340/hw/virtio/virtio.c#L1297
> > > > [2] But that's not the whole story: I've been running limited in tx
> > > > descriptors because of virtio_net_max_tx_queue_size, which predates
> > > > vdpa. I'll send a patch to also un-limit it.
> > > >
> > > >>> If with hardware queues you mean guest's vring, not sure why it is
> > > >>> "probably 256". I'd say that in that case with the virtio-net kernel
> > > >>> driver the ring size will be the same as the device export, for
> > > >>> example, isn't it?
> > > >>>
> > > >>> The implementation should support any combination of sizes, but the
> > > >>> ring size exposed to the guest is never bigger than hardware one.
> > > >>>
> > > >>>> ? Or we SVQ can stick to 256 but this will this cause trouble if we want
> > > >>>> to add event index support?
> > > >>>>
> > > >>> I think we should not have any problem with event idx. If you mean
> > > >>> that the guest could mark more buffers available than SVQ vring's
> > > >>> size, that should not happen because there must be less entries in the
> > > >>> guest than SVQ.
> > > >>>
> > > >>> But if I understood you correctly, a similar situation could happen if
> > > >>> a guest's contiguous buffer is scattered across many qemu's VA chunks.
> > > >>> Even if that would happen, the situation should be ok too: SVQ knows
> > > >>> the guest's avail idx and, if SVQ is full, it will continue forwarding
> > > >>> avail buffers when the device uses more buffers.
> > > >>>
> > > >>> Does that make sense to you?
> > > >>
> > > >> Yes.
> > > >>
> > > >> Thanks
> > > >>
> > >
> >
>




* Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq
  2022-02-22  7:42                 ` Eugenio Perez Martin
@ 2022-02-22  7:59                     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-22  7:59 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Tue, Feb 22, 2022 at 3:43 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Feb 22, 2022 at 4:16 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Feb 22, 2022 at 1:23 AM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Mon, Feb 21, 2022 at 8:15 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > >
> > > > 在 2022/2/18 上午1:13, Eugenio Perez Martin 写道:
> > > > > On Tue, Feb 8, 2022 at 4:58 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >>
> > > > >> 在 2022/2/1 上午2:58, Eugenio Perez Martin 写道:
> > > > >>> On Sun, Jan 30, 2022 at 5:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >>>> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > > > >>>>> First half of the buffers forwarding part, preparing vhost-vdpa
> > > > >>>>> callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
> > > > >>>>> this is effectively dead code at the moment, but it helps to reduce
> > > > >>>>> patch size.
> > > > >>>>>
> > > > >>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > >>>>> ---
> > > > >>>>>     hw/virtio/vhost-shadow-virtqueue.h |   2 +-
> > > > >>>>>     hw/virtio/vhost-shadow-virtqueue.c |  21 ++++-
> > > > >>>>>     hw/virtio/vhost-vdpa.c             | 133 ++++++++++++++++++++++++++---
> > > > >>>>>     3 files changed, 143 insertions(+), 13 deletions(-)
> > > > >>>>>
> > > > >>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > > >>>>> index 035207a469..39aef5ffdf 100644
> > > > >>>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > >>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > >>>>> @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> > > > >>>>>
> > > > >>>>>     void vhost_svq_stop(VhostShadowVirtqueue *svq);
> > > > >>>>>
> > > > >>>>> -VhostShadowVirtqueue *vhost_svq_new(void);
> > > > >>>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> > > > >>>>>
> > > > >>>>>     void vhost_svq_free(VhostShadowVirtqueue *vq);
> > > > >>>>>
> > > > >>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > > >>>>> index f129ec8395..7c168075d7 100644
> > > > >>>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > >>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > >>>>> @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > > > >>>>>     /**
> > > > >>>>>      * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> > > > >>>>>      * methods and file descriptors.
> > > > >>>>> + *
> > > > >>>>> + * @qsize Shadow VirtQueue size
> > > > >>>>> + *
> > > > >>>>> + * Returns the new virtqueue or NULL.
> > > > >>>>> + *
> > > > >>>>> + * In case of error, reason is reported through error_report.
> > > > >>>>>      */
> > > > >>>>> -VhostShadowVirtqueue *vhost_svq_new(void)
> > > > >>>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> > > > >>>>>     {
> > > > >>>>> +    size_t desc_size = sizeof(vring_desc_t) * qsize;
> > > > >>>>> +    size_t device_size, driver_size;
> > > > >>>>>         g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> > > > >>>>>         int r;
> > > > >>>>>
> > > > >>>>> @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
> > > > >>>>>         /* Placeholder descriptor, it should be deleted at set_kick_fd */
> > > > >>>>>         event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
> > > > >>>>>
> > > > >>>>> +    svq->vring.num = qsize;
> > > > >>>> I wonder if this is the best. E.g some hardware can support up to 32K
> > > > >>>> queue size. So this will probably end up with:
> > > > >>>>
> > > > >>>> 1) SVQ use 32K queue size
> > > > >>>> 2) hardware queue uses 256
> > > > >>>>
> > > > >>> In that case SVQ vring queue size will be 32K and guest's vring can
> > > > >>> negotiate any number with SVQ equal or less than 32K,
> > > > >>
> > > > >> Sorry for being unclear what I meant is actually
> > > > >>
> > > > >> 1) SVQ uses 32K queue size
> > > > >>
> > > > >> 2) guest vq uses 256
> > > > >>
> > > > >> This looks like a burden that needs extra logic and may damage the
> > > > >> performance.
> > > > >>
> > > > > Still not getting this point.
> > > > >
> > > > > An available guest buffer, although contiguous in GPA/GVA, can expand
> > > > > into multiple buffers if it's not contiguous in qemu's VA (by the while
> > > > > loop in virtqueue_map_desc [1]). In that scenario it is better to have
> > > > > "plenty" of SVQ buffers.
> > > >
> > > >
> > > > Yes, but this case should be rare. So in this case we should deal with
> > > > overrun on SVQ, that is
> > > >
> > > > 1) SVQ is full
> > > > 2) guest VQ isn't
> > > >
> > > > We need to
> > > >
> > > > 1) check the available buffer slots
> > > > 2) disable guest kick and wait for the used buffers
> > > >
> > > > But it looks to me the current code is not ready for dealing with this case?
> > > >
> > >
> > > Yes, it deals with that; that's the meaning of svq->next_guest_avail_elem.
> >
> > Oh right, I missed that.
> >
> > >
> > > >
> > > > >
> > > > > I'm ok if we decide to put an upper limit though, or if we decide not
> > > > > to handle this situation. But we would leave out valid virtio drivers.
> > > > > Maybe to set a fixed upper limit (1024?)? To add another parameter
> > > > > (x-svq-size-n=N)?
> > > > >
> > > > > If you mean we lose performance because memory gets more sparse I
> > > > > think the only possibility is to limit that way.
> > > >
> > > >
> > > > If the guest is not using 32K, having 32K for the svq may give extra stress
> > > > on the cache since we will end up with a pretty large working set.
> > > >
> > >
> > > That might be true. My guess is that it should not matter, since SVQ
> > > and the guest's vring will have the same numbers of scattered buffers
> > > and the avail / used / packed ring will be consumed more or less
> > > sequentially. But I haven't tested.
> > >
> > > I think it's better to add an upper limit (either fixed or in the
> > > qemu's backend's cmdline) later if we see that this is a problem.
> >
> > I'd suggest using the same size as what the guest saw.
> >
> > > Another solution now would be to get the number from the frontend
> > > device cmdline instead of from the vdpa device. I'm ok with that, but
> > > it doesn't delete the svq->next_guest_avail_elem processing, and it
> > > comes with disadvantages in my opinion. More below.
> >
> > Right, we should keep next_guest_avail_elem. Using the same queue size
> > is a balance between:
> >
> > 1) using next_guest_avail_elem (rare)
> > 2) not giving too much stress on the cache
> >
>
> Ok, I'll change the SVQ size to the frontend size then.
>
> > >
> > > >
> > > > >
> > > > >> And this can lead to another interesting situation:
> > > > >>
> > > > >> 1) SVQ uses 256
> > > > >>
> > > > >> 2) guest vq uses 1024
> > > > >>
> > > > >> Where a lot more SVQ logic is needed.
> > > > >>
> > > > > If we agree that a guest descriptor can expand into multiple SVQ
> > > > > descriptors, this should already be handled by the previous logic too.
> > > > >
> > > > > But this should only happen if qemu is launched with a "bad"
> > > > > cmdline, shouldn't it?
> > > >
> > > >
> > > > This seems like it can happen when we use -device
> > > > virtio-net-pci,tx_queue_size=1024 with a 256-size vp_vdpa device, at least?
> > > >
> > >
> > > I'm going to use the rx queue here since it's more accurate; tx has
> > > its own separate limit.
> > >
> > > If we use rx_queue_size=256 in L0 and rx_queue_size=1024 in L1 with no
> > > SVQ, L0 qemu will happily accept 1024 as size
> >
> > Interesting, looks like a bug (I guess it works since you enable vhost?):
> >
>
> No, emulated interfaces. More below.
>
> > Per virtio-spec:
> >
> > """
> > Queue Size. On reset, specifies the maximum queue size supported by
> > the device. This can be modified by the driver to reduce memory
> > requirements. A 0 means the queue is unavailable.
> > """
> >
>
> Yes, but how should it fail? Drivers do not know how to check whether
> the value was invalid. DEVICE_NEEDS_RESET?

I think it can be detected by reading the value back to see if it matches.
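
For instance, a minimal sketch of that read-back check on the driver
side (accessor names follow Linux's virtio_pci_modern style; the helper
itself is hypothetical, not something from this series):

    /* Returns 0 if the device accepted the requested queue size */
    static int vp_try_set_queue_size(struct virtio_pci_common_cfg __iomem *cfg,
                                     u16 index, u16 num)
    {
        vp_iowrite16(index, &cfg->queue_select);
        vp_iowrite16(num, &cfg->queue_size);

        /* A device that rejects the size keeps (or clamps) the old value */
        return vp_ioread16(&cfg->queue_size) == num ? 0 : -EINVAL;
    }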

Thanks

>
> The L0 emulated device simply receives the PCI write and calls
> virtio_queue_set_num. I can try to add the check "num <
> vdev->vq[n].vring.num_default", but there is no way to notify the
> guest that setting the value failed.
>
> > We can't increase the queue_size from 256 to 1024 actually. (Only
> > decrease is allowed).
> >
> > > when L1 qemu writes that
> > > value at vhost_virtqueue_start. I'm not sure what would happen with a
> > > real device; my guess is that the device will fail somehow. That's
> > > what I meant by a "bad cmdline"; I should have been more specific.
> >
> > I should say that it's something that is probably unrelated to this
> > series but needs to be addressed.
> >
>
> I agree, I can start developing the patches for sure.
>
> > >
> > > If we add SVQ to the mix, the guest first negotiates the 1024 with the
> > > qemu device model. After that, vhost.c will try to write 1024 too, but
> > > this is totally ignored by this patch's changes at
> > > vhost_vdpa_set_vring_num. Finally, SVQ will set 256 as the ring size on
> > > the device, since that's the value read from the device, leading to your
> > > scenario. So SVQ effectively isolates both sides and makes the
> > > communication possible, even with a device that does not support so many
> > > descriptors.
> > >
> > > But SVQ already handles this case: it's the same as if the buffers are
> > > fragmented in HVA and the queue size is equal on both sides. That's why
> > > I think the SVQ size should depend on the backend device's size, not the
> > > frontend cmdline.
> >
> > Right.
> >
> > Thanks
> >
> > >
> > > Thanks!
> > >
> > > >
> > > > >
> > > > > If I run that example with vp_vdpa, L0 qemu will happily accept 1024
> > > > > as a queue size [2]. But if the vdpa device's maximum queue size is
> > > > > effectively 256, this will result in an error: we're not exposing that
> > > > > limit to the guest at any moment except through qemu's cmdline.
> > > > >
> > > > >>> including 256.
> > > > >>> Is that what you mean?
> > > > >>
> > > > >> I mean, it looks to me the logic will be much simpler if we just
> > > > >> allocate the shadow virtqueue with the size the guest can see (guest
> > > > >> vring).
> > > > >>
> > > > >> Then we don't need to think about whether the difference in queue size
> > > > >> can have any side effects.
> > > > >>
> > > > > I think that we cannot avoid that extra logic unless we force GPA to
> > > > > be contiguous in IOVA. If we are sure the guest's buffers cannot span
> > > > > more than one descriptor in SVQ, then yes, we can simplify things. If
> > > > > not, I think we are forced to carry all of it.
> > > >
> > > >
> > > > Yes, I agree, the code should be robust to handle any case.
> > > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > > But if we prove it I'm not opposed to simplifying things and making
> > > > > head at SVQ == head at guest.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > [1] https://gitlab.com/qemu-project/qemu/-/blob/17e31340/hw/virtio/virtio.c#L1297
> > > > > [2] But that's not the whole story: I've been limited in the number of
> > > > > tx descriptors because of virtio_net_max_tx_queue_size, which predates
> > > > > vdpa. I'll send a patch to also un-limit it.
> > > > >
> > > > >>> If by hardware queues you mean the guest's vring, I'm not sure why
> > > > >>> it is "probably 256". I'd say that in that case with the virtio-net
> > > > >>> kernel driver the ring size will be the same as the one the device
> > > > >>> exports, for example, isn't it?
> > > > >>>
> > > > >>> The implementation should support any combination of sizes, but the
> > > > >>> ring size exposed to the guest is never bigger than the hardware one.
> > > > >>>
> > > > >>>> ? Or SVQ can stick to 256, but will this cause trouble if we want
> > > > >>>> to add event index support?
> > > > >>>>
> > > > >>> I think we should not have any problem with event idx. If you mean
> > > > >>> that the guest could mark more buffers available than SVQ vring's
> > > > >>> size, that should not happen because there must be fewer entries in
> > > > >>> the guest than in SVQ.
> > > > >>>
> > > > >>> But if I understood you correctly, a similar situation could happen
> > > > >>> if a guest's contiguous buffer is scattered across many of qemu's VA
> > > > >>> chunks. Even if that happened, the situation should be ok too: SVQ
> > > > >>> knows the guest's avail idx and, if SVQ is full, it will continue
> > > > >>> forwarding avail buffers when the device uses more buffers.
> > > > >>>
> > > > >>> Does that make sense to you?
> > > > >>
> > > > >> Yes.
> > > > >>
> > > > >> Thanks
> > > > >>
> > > >
> > >
> >
>
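
As a side note, a minimal sketch of the virtio_queue_set_num check
mentioned above (the exact shape in hw/virtio/virtio.c is assumed here,
and it is not part of this series):

    void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
    {
        /*
         * Existing sanity checks elided. Per the virtio spec, the driver
         * may only reduce the queue size, so reject attempts to grow it
         * past the size the device offered on reset.
         */
        if (num <= 0 || num > VIRTQUEUE_MAX_SIZE ||
            num > vdev->vq[n].vring.num_default) {
            return;
        }
        vdev->vq[n].vring.num = num;
    }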


^ permalink raw reply	[flat|nested] 182+ messages in thread


* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
  2022-02-22  7:41                 ` Jason Wang
  (?)
@ 2022-02-22  8:05                 ` Eugenio Perez Martin
  2022-02-23  3:46                     ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-22  8:05 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Tue, Feb 22, 2022 at 8:41 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> > On 2022/2/17 4:22 PM, Eugenio Perez Martin wrote:
> > On Thu, Feb 17, 2022 at 7:02 AM Jason Wang <jasowang@redhat.com> wrote:
> >> On Wed, Feb 16, 2022 at 11:54 PM Eugenio Perez Martin
> >> <eperezma@redhat.com> wrote:
> >>> On Tue, Feb 8, 2022 at 9:25 AM Jason Wang <jasowang@redhat.com> wrote:
> >>>>
> >>>> On 2022/2/1 7:45 PM, Eugenio Perez Martin wrote:
> >>>>> On Sun, Jan 30, 2022 at 7:50 AM Jason Wang <jasowang@redhat.com> wrote:
> >>>>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> >>>>>>> SVQ is able to log the dirty bits by itself, so let's use it to not
> >>>>>>> block migration.
> >>>>>>>
> >>>>>>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
> >>>>>>> enabled. Even if the device supports it, the reports would be nonsense
> >>>>>>> because SVQ memory is in the qemu region.
> >>>>>>>
> >>>>>>> The log region is still allocated. Future changes might skip that, but
> >>>>>>> this series is already long enough.
> >>>>>>>
> >>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>>>>>> ---
> >>>>>>>     hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
> >>>>>>>     1 file changed, 20 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >>>>>>> index fb0a338baa..75090d65e8 100644
> >>>>>>> --- a/hw/virtio/vhost-vdpa.c
> >>>>>>> +++ b/hw/virtio/vhost-vdpa.c
> >>>>>>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
> >>>>>>>         if (ret == 0 && v->shadow_vqs_enabled) {
> >>>>>>>             /* Filter only features that SVQ can offer to guest */
> >>>>>>>             vhost_svq_valid_guest_features(features);
> >>>>>>> +
> >>>>>>> +        /* Add SVQ logging capabilities */
> >>>>>>> +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
> >>>>>>>         }
> >>>>>>>
> >>>>>>>         return ret;
> >>>>>>> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
> >>>>>>>
> >>>>>>>         if (v->shadow_vqs_enabled) {
> >>>>>>>             uint64_t dev_features, svq_features, acked_features;
> >>>>>>> +        uint8_t status = 0;
> >>>>>>>             bool ok;
> >>>>>>>
> >>>>>>> +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> >>>>>>> +        if (unlikely(ret)) {
> >>>>>>> +            return ret;
> >>>>>>> +        }
> >>>>>>> +
> >>>>>>> +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> >>>>>>> +            /*
> >>>>>>> +             * vhost is trying to enable or disable _F_LOG, and the device
> >>>>>>> +             * would report wrong dirty pages. SVQ handles it.
> >>>>>>> +             */
> >>>>>> I fail to understand this comment, I'd think there's no way to disable
> >>>>>> dirty page tracking for SVQ.
> >>>>>>
> >>>>> vhost_log_global_{start,stop} are called at the beginning and end of
> >>>>> migration. To inform the device that it should start logging, they set
> >>>>> or clean VHOST_F_LOG_ALL at vhost_dev_set_log.
> >>>>
> >>>> Yes, but for SVQ, we can't disable dirty page tracking, can we? The
> >>>> only thing we can do is ignore or filter out F_LOG_ALL and pretend it
> >>>> was enabled or disabled.
> >>>>
> >>> Yes, that's what this patch does.
> >>>
> >>>>> While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
> >>>>> vhost does not block migration. Maybe we need to look for another way
> >>>>> to do this?
> >>>>
> >>>> I'm fine with filtering since it's much simpler, but I fail to
> >>>> understand why we need to check DRIVER_OK.
> >>>>
> >>> Ok maybe I can make that part more clear,
> >>>
> >>> Since both operations use vhost_vdpa_set_features we must just filter
> >>> the one that actually sets or removes VHOST_F_LOG_ALL, without
> >>> affecting other features.
> >>>
> >>> In practice, that means not forwarding the set_features call after
> >>> DRIVER_OK. The device is not expecting it anymore.
> >> I wonder what happens if we don't do this.
> >>
> > If we simply delete the check, vhost_dev_set_features will return an
> > error, failing the start of the migration. More on this below.
>
>
> Ok.
>
>
> >
> >> So kernel had this check:
> >>
> >>          /*
> >>           * It's not allowed to change the features after they have
> >>           * been negotiated.
> >>           */
> >> if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_FEATURES_OK)
> >>          return -EBUSY;
> >>
> >> So is it FEATURES_OK actually?
> >>
> > Yes, FEATURES_OK seems more appropriate actually so I will switch to
> > it for the next version.
> >
> > But it should be functionally equivalent, since
> > vhost.c:vhost_dev_start sets both and the setting of _F_LOG_ALL cannot
> > be concurrent with it.
>
>
> Right.
>
>
> >
> >> For this patch, I wonder if the thing we need to do is to see whether
> >> it is a enable/disable F_LOG_ALL and simply return.
> >>
> > Yes, that's the intention of the patch.
> >
> > We have 4 cases here:
> > a) We're being called from vhost_dev_start, with enable_log = false
> > b) We're being called from vhost_dev_start, with enable_log = true
>
>
> And this case means we can't simply return without calling vhost-vdpa.
>

It still calls vhost-vdpa because {FEATURES,DRIVER}_OK is not set at that point.

>
> > c) We're being called from vhost_dev_set_log, with enable_log = false
> > d) We're being called from vhost_dev_set_log, with enable_log = true
> >
> > The way to tell the difference between a/b and c/d is to check if
> > {FEATURES,DRIVER}_OK is set. And, as you point out in previous mails,
> > F_LOG_ALL must be filtered unconditionally since SVQ tracks dirty
> > memory through the memory unmapping, so we clear the bit
> > unconditionally if we detect that VHOST_SET_FEATURES will be called
> > (cases a and b).
> >
> > Another possibility is to track if features have been set with a bool
> > in vhost_vdpa or something like that. But it seems cleaner to me to
> > only store that in the actual device.
>
>
> So I suggest making sure the code matches the comment:
>
>          if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
>              /*
>               * vhost is trying to enable or disable _F_LOG, and the device
>               * would report wrong dirty pages. SVQ handles it.
>               */
>              return 0;
>          }
>
> It would be better to check whether the caller is toggling _F_LOG_ALL in
> this case.
>

How would we detect that? We can save the feature flags and compare, but
ignoring all set_features calls after FEATURES_OK seems simpler to me.

Would changing the comment work? Something like "set_features after
_S_FEATURES_OK means vhost is trying to enable or disable _F_LOG, and
the device would report wrong dirty pages. SVQ handles it."
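
Concretely, a sketch of how the hunk could look in the next version,
combining the FEATURES_OK check with that comment (the helpers are the
ones this patch already uses):

    uint8_t status = 0;

    ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
    if (unlikely(ret)) {
        return ret;
    }

    if (status & VIRTIO_CONFIG_S_FEATURES_OK) {
        /*
         * set_features after _S_FEATURES_OK means vhost is trying to
         * enable or disable _F_LOG, and the device would report wrong
         * dirty pages. SVQ handles it.
         */
        return 0;
    }

    /* We must not ack _F_LOG if SVQ is enabled */
    features &= ~BIT_ULL(VHOST_F_LOG_ALL);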

Thanks!

> Thanks
>
>
> >
> >> Thanks
> >>
> >>> Does that make more sense?
> >>>
> >>> Thanks!
> >>>
> >>>> Thanks
> >>>>
> >>>>
> >>>>> Thanks!
> >>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>>
> >>>>>>> +            return 0;
> >>>>>>> +        }
> >>>>>>> +
> >>>>>>> +        /* We must not ack _F_LOG if SVQ is enabled */
> >>>>>>> +        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
> >>>>>>> +
> >>>>>>>             ret = vhost_vdpa_get_dev_features(dev, &dev_features);
> >>>>>>>             if (ret != 0) {
> >>>>>>>                 error_report("Can't get vdpa device features, got (%d)", ret);
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
  2022-02-22  7:26                 ` Jason Wang
  (?)
@ 2022-02-22  8:55                 ` Eugenio Perez Martin
  2022-02-23  2:26                     ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-22  8:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Tue, Feb 22, 2022 at 8:26 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/2/21 4:15 PM, Eugenio Perez Martin wrote:
> > On Mon, Feb 21, 2022 at 8:44 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2022/2/17 8:48 PM, Eugenio Perez Martin wrote:
> >>> On Tue, Feb 8, 2022 at 9:16 AM Jason Wang <jasowang@redhat.com> wrote:
> >>>> On 2022/2/1 7:25 PM, Eugenio Perez Martin wrote:
> >>>>> On Sun, Jan 30, 2022 at 7:47 AM Jason Wang <jasowang@redhat.com> wrote:
> >>>>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> >>>>>>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> >>>>>>>      void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >>>>>>>      {
> >>>>>>>          event_notifier_set_handler(&svq->svq_kick, NULL);
> >>>>>>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> >>>>>>> +
> >>>>>>> +    if (!svq->vq) {
> >>>>>>> +        return;
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    /* Send all pending used descriptors to guest */
> >>>>>>> +    vhost_svq_flush(svq, false);
> >>>>>> Do we need to wait for all the pending descriptors to be completed here?
> >>>>>>
> >>>>> No, this function does not wait, it only completes the forwarding of
> >>>>> the *used* descriptors.
> >>>>>
> >>>>> The best example is the net rx queue in my opinion. This call will
> >>>>> check SVQ's vring used_idx and will forward the last used descriptors
> >>>>> if any, but all available descriptors will remain as available for
> >>>>> qemu's VQ code.
> >>>>>
> >>>>> To skip it would miss those last rx descriptors in migration.
> >>>>>
> >>>>> Thanks!
> >>>> So it's probably not the best place to ask. It's more about the
> >>>> inflight descriptors, so it should be TX instead of RX.
> >>>>
> >>>> I can imagine that in the last phase of migration, we should stop the
> >>>> vhost-vDPA before calling vhost_svq_stop(). Then we should be fine
> >>>> regardless of inflight descriptors.
> >>>>
> >>> I think I'm still missing something here.
> >>>
> >>> To be on the same page: regarding tx, this could cause repeated tx
> >>> frames (one at the source and another at the destination), but never
> >>> a buffer that is missed and never transmitted. The "stop before" could
> >>> be interpreted as "SVQ is not forwarding available buffers anymore".
> >>> Would that work?
> >>
> >> Right, but this only works if
> >>
> >> 1) a flush is done to make sure TX DMA for inflight descriptors is all
> >> completed
> >>
> >> 2) we just mark all inflight descriptors used
> >>
> > It currently relies on the reverse: buffers not marked as used (by the
> > device) will be available at the destination, so expect
> > retransmissions.
>
>
> I may be missing something, but I think we do migrate last_avail_idx. So
> there won't be a re-transmission, since we depend on the qemu virtqueue
> code to deal with the vring base?
>

On stop, vhost_virtqueue_stop calls vhost_vdpa_get_vring_base. In SVQ
mode, it returns last_used_idx. After that, the vhost.c code sets
VirtQueue last_avail_idx == last_used_idx, and it's migrated after that
if I'm not wrong.
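
In code, that flow could look something like this (a sketch, not the
literal code of this series; vhost_svq_get_last_used_idx is a
hypothetical accessor for SVQ's last_used_idx):

    static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
                                         struct vhost_vring_state *ring)
    {
        struct vhost_vdpa *v = dev->opaque;

        if (v->shadow_vqs_enabled) {
            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
                                                          ring->index);

            /*
             * SVQ has already forwarded all used buffers to the guest,
             * so the guest-visible index is SVQ's last_used_idx, not
             * whatever base the vdpa device would report.
             */
            ring->num = vhost_svq_get_last_used_idx(svq);
            return 0;
        }

        return vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
    }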

vhost kernel migrates last_avail_idx, but it makes rx buffers
available on demand, unlike SVQ. So it does not need to unwind buffers
or anything like that. Because of how SVQ works with the rx queue,
this is not possible, since the destination will find no available
buffers for rx. And for tx you have already described the scenario.

In other words, we cannot see SVQ as a vhost device in that regard:
SVQ aims for a total drain (as in "make all the guest's buffers
available to the device ASAP"), vs the vhost device, which can live
with a lot of available ones and will use them on demand. Same problem
as masking. So the difference in behavior is justified in my opinion,
and it can be improved in the future with the vdpa in-flight
descriptors.

If we restore the state that way in a virtio-net device, it will see
the available ones as expected, not as in-flight.

Another possibility is to transform all of these into in-flight ones,
but I feel it would create problems. Can we migrate all rx queues as
in-flight, with 0 bytes written? Is it worth it? I didn't investigate
that path too much, but I think the virtio-net emulated device does
not support that at the moment. If I'm not wrong, we should copy
something like the body of virtio_blk_load_device if we want to go
that route.

The current approach might be too net-centric, so let me know if this
behavior is unexpected or we can do better otherwise.

Thanks!

> Thanks
>
>
> >
> > Thanks!
> >
> >> Otherwise there could be buffers that are inflight forever.
> >>
> >> Thanks
> >>
> >>
> >>> Thanks!
> >>>
> >>>> Thanks
> >>>>
> >>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>>
> >>>>>>> +
> >>>>>>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> >>>>>>> +        g_autofree VirtQueueElement *elem = NULL;
> >>>>>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> >>>>>>> +        if (elem) {
> >>>>>>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> >>>>>>> +        }
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> >>>>>>> +    if (next_avail_elem) {
> >>>>>>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> >>>>>>> +                                 next_avail_elem->len);
> >>>>>>> +    }
> >>>>>>>      }
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
  2022-02-08  8:11         ` Jason Wang
  (?)
@ 2022-02-22 19:01         ` Eugenio Perez Martin
  2022-02-23  2:03             ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-22 19:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Tue, Feb 8, 2022 at 9:11 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/2/2 1:08 AM, Eugenio Perez Martin wrote:
> > On Sun, Jan 30, 2022 at 5:43 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> >>> Initial version of shadow virtqueue that actually forward buffers. There
> >>> is no iommu support at the moment, and that will be addressed in future
> >>> patches of this series. Since all vhost-vdpa devices use forced IOMMU,
> >>> this means that SVQ is not usable at this point of the series on any
> >>> device.
> >>>
> >>> For simplicity it only supports modern devices, which expect the vring
> >>> in little endian, with a split ring and no event idx or indirect
> >>> descriptors. Support for them will not be added in this series.
> >>>
> >>> It reuses the VirtQueue code for the device part. The driver part is
> >>> based on Linux's virtio_ring driver, but with stripped functionality
> >>> and optimizations so it's easier to review.
> >>>
> >>> However, forwarding buffers has some particular pieces: one of the most
> >>> unexpected ones is that a guest's buffer can expand through more than
> >>> one descriptor in SVQ. While this is handled gracefully by qemu's
> >>> emulated virtio devices, it may cause unexpected SVQ queue full. This
> >>> patch also solves it by checking for this condition at both guest's
> >>> kicks and device's calls. The code may be more elegant in the future if
> >>> SVQ code runs in its own iocontext.
> >>>
> >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>> ---
> >>>    hw/virtio/vhost-shadow-virtqueue.h |   2 +
> >>>    hw/virtio/vhost-shadow-virtqueue.c | 365 ++++++++++++++++++++++++++++-
> >>>    hw/virtio/vhost-vdpa.c             | 111 ++++++++-
> >>>    3 files changed, 462 insertions(+), 16 deletions(-)
> >>>
> >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> >>> index 39aef5ffdf..19c934af49 100644
> >>> --- a/hw/virtio/vhost-shadow-virtqueue.h
> >>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> >>> @@ -33,6 +33,8 @@ uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq);
> >>>    size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
> >>>    size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> >>>
> >>> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> >>> +                     VirtQueue *vq);
> >>>    void vhost_svq_stop(VhostShadowVirtqueue *svq);
> >>>
> >>>    VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> >>> index 7c168075d7..a1a404f68f 100644
> >>> --- a/hw/virtio/vhost-shadow-virtqueue.c
> >>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> >>> @@ -9,6 +9,8 @@
> >>>
> >>>    #include "qemu/osdep.h"
> >>>    #include "hw/virtio/vhost-shadow-virtqueue.h"
> >>> +#include "hw/virtio/vhost.h"
> >>> +#include "hw/virtio/virtio-access.h"
> >>>    #include "standard-headers/linux/vhost_types.h"
> >>>
> >>>    #include "qemu/error-report.h"
> >>> @@ -36,6 +38,33 @@ typedef struct VhostShadowVirtqueue {
> >>>
> >>>        /* Guest's call notifier, where SVQ calls guest. */
> >>>        EventNotifier svq_call;
> >>> +
> >>> +    /* Virtio queue shadowing */
> >>> +    VirtQueue *vq;
> >>> +
> >>> +    /* Virtio device */
> >>> +    VirtIODevice *vdev;
> >>> +
> >>> +    /* Map for returning guest's descriptors */
> >>> +    VirtQueueElement **ring_id_maps;
> >>> +
> >>> +    /* Next VirtQueue element that guest made available */
> >>> +    VirtQueueElement *next_guest_avail_elem;
> >>> +
> >>> +    /* Next head to expose to device */
> >>> +    uint16_t avail_idx_shadow;
> >>> +
> >>> +    /* Next free descriptor */
> >>> +    uint16_t free_head;
> >>> +
> >>> +    /* Last seen used idx */
> >>> +    uint16_t shadow_used_idx;
> >>> +
> >>> +    /* Next head to consume from device */
> >>> +    uint16_t last_used_idx;
> >>> +
> >>> +    /* Cache for the exposed notification flag */
> >>> +    bool notification;
> >>>    } VhostShadowVirtqueue;
> >>>
> >>>    #define INVALID_SVQ_KICK_FD -1
> >>> @@ -148,30 +177,294 @@ bool vhost_svq_ack_guest_features(uint64_t dev_features,
> >>>        return true;
> >>>    }
> >>>
> >>> -/* Forward guest notifications */
> >>> -static void vhost_handle_guest_kick(EventNotifier *n)
> >>> +/**
> >>> + * Number of descriptors that SVQ can make available from the guest.
> >>> + *
> >>> + * @svq   The svq
> >>> + */
> >>> +static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
> >>>    {
> >>> -    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> >>> -                                             svq_kick);
> >>> +    return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
> >>> +}
> >>> +
> >>> +static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> >>> +{
> >>> +    uint16_t notification_flag;
> >>>
> >>> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> >>> +    if (svq->notification == enable) {
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
> >>> +
> >>> +    svq->notification = enable;
> >>> +    if (enable) {
> >>> +        svq->vring.avail->flags &= ~notification_flag;
> >>> +    } else {
> >>> +        svq->vring.avail->flags |= notification_flag;
> >>> +    }
> >>> +}
> >>> +
> >>> +static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> >>> +                                    const struct iovec *iovec,
> >>> +                                    size_t num, bool more_descs, bool write)
> >>> +{
> >>> +    uint16_t i = svq->free_head, last = svq->free_head;
> >>> +    unsigned n;
> >>> +    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
> >>> +    vring_desc_t *descs = svq->vring.desc;
> >>> +
> >>> +    if (num == 0) {
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    for (n = 0; n < num; n++) {
> >>> +        if (more_descs || (n + 1 < num)) {
> >>> +            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> >>> +        } else {
> >>> +            descs[i].flags = flags;
> >>> +        }
> >>> +        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
> >>> +        descs[i].len = cpu_to_le32(iovec[n].iov_len);
> >>> +
> >>> +        last = i;
> >>> +        i = cpu_to_le16(descs[i].next);
> >>> +    }
> >>> +
> >>> +    svq->free_head = le16_to_cpu(descs[last].next);
> >>> +}
> >>> +
> >>> +static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> >>> +                                    VirtQueueElement *elem)
> >>> +{
> >>> +    int head;
> >>> +    unsigned avail_idx;
> >>> +    vring_avail_t *avail = svq->vring.avail;
> >>> +
> >>> +    head = svq->free_head;
> >>> +
> >>> +    /* We need some descriptors here */
> >>> +    assert(elem->out_num || elem->in_num);
> >>
> >> Looks like this could be triggered by the guest; we need to fail instead
> >> of asserting here.
> >>
> > My understanding was that virtqueue_pop already sanitized that case,
> > but I'm not able to find where now. I will recheck and, if it does
> > not, I will change it to a failure.
> >
> >>> +
> >>> +    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> >>> +                            elem->in_num > 0, false);
> >>> +    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> >>> +
> >>> +    /*
> >>> +     * Put entry in available array (but don't update avail->idx until they
> >>> +     * do sync).
> >>> +     */
> >>> +    avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
> >>> +    avail->ring[avail_idx] = cpu_to_le16(head);
> >>> +    svq->avail_idx_shadow++;
> >>> +
> >>> +    /* Update avail index after the descriptor is wrote */
> >>> +    smp_wmb();
> >>> +    avail->idx = cpu_to_le16(svq->avail_idx_shadow);
> >>> +
> >>> +    return head;
> >>> +}
> >>> +
> >>> +static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> >>> +{
> >>> +    unsigned qemu_head = vhost_svq_add_split(svq, elem);
> >>> +
> >>> +    svq->ring_id_maps[qemu_head] = elem;
> >>> +}
> >>> +
> >>> +static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> >>> +{
> >>> +    /* We need to expose available array entries before checking used flags */
> >>> +    smp_mb();
> >>> +    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
> >>>            return;
> >>>        }
> >>>
> >>>        event_notifier_set(&svq->hdev_kick);
> >>>    }
> >>>
> >>> -/* Forward vhost notifications */
> >>> +/**
> >>> + * Forward available buffers.
> >>> + *
> >>> + * @svq Shadow VirtQueue
> >>> + *
> >>> + * Note that this function does not guarantee that all guest's available
> >>> + * buffers are available to the device in SVQ avail ring. The guest may have
> >>> + * exposed a GPA / GIOVA contiguous buffer, but it may not be contiguous in qemu
> >>> + * vaddr.
> >>> + *
> >>> + * If that happens, guest's kick notifications will be disabled until device
> >>> + * makes some buffers used.
> >>> + */
> >>> +static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
> >>> +{
> >>> +    /* Clear event notifier */
> >>> +    event_notifier_test_and_clear(&svq->svq_kick);
> >>> +
> >>> +    /* Make available as many buffers as possible */
> >>> +    do {
> >>> +        if (virtio_queue_get_notification(svq->vq)) {
> >>> +            virtio_queue_set_notification(svq->vq, false);
> >>
> >> This looks like an optimization that should belong to
> >> virtio_queue_set_notification() itself.
> >>
> > Sure we can move.
> >
> >>> +        }
> >>> +
> >>> +        while (true) {
> >>> +            VirtQueueElement *elem;
> >>> +
> >>> +            if (svq->next_guest_avail_elem) {
> >>> +                elem = g_steal_pointer(&svq->next_guest_avail_elem);
> >>> +            } else {
> >>> +                elem = virtqueue_pop(svq->vq, sizeof(*elem));
> >>> +            }
> >>> +
> >>> +            if (!elem) {
> >>> +                break;
> >>> +            }
> >>> +
> >>> +            if (elem->out_num + elem->in_num >
> >>> +                vhost_svq_available_slots(svq)) {
> >>> +                /*
> >>> +                 * This condition is possible since a contiguous buffer in GPA
> >>> +                 * does not imply a contiguous buffer in qemu's VA
> >>> +                 * scatter-gather segments. If that happen, the buffer exposed
> >>> +                 * to the device needs to be a chain of descriptors at this
> >>> +                 * moment.
> >>> +                 *
> >>> +                 * SVQ cannot hold more available buffers if we are here:
> >>> +                 * queue the current guest descriptor and ignore further kicks
> >>> +                 * until some elements are used.
> >>> +                 */
> >>> +                svq->next_guest_avail_elem = elem;
> >>> +                return;
> >>> +            }
> >>> +
> >>> +            vhost_svq_add(svq, elem);
> >>> +            vhost_svq_kick(svq);
> >>> +        }
> >>> +
> >>> +        virtio_queue_set_notification(svq->vq, true);
> >>> +    } while (!virtio_queue_empty(svq->vq));
> >>> +}
> >>> +
> >>> +/**
> >>> + * Handle guest's kick.
> >>> + *
> >>> + * @n guest kick event notifier, the one that guest set to notify svq.
> >>> + */
> >>> +static void vhost_handle_guest_kick_notifier(EventNotifier *n)
> >>> +{
> >>> +    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> >>> +                                             svq_kick);
> >>> +    vhost_handle_guest_kick(svq);
> >>> +}
> >>> +
> >>> +static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> >>> +{
> >>> +    if (svq->last_used_idx != svq->shadow_used_idx) {
> >>> +        return true;
> >>> +    }
> >>> +
> >>> +    svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
> >>> +
> >>> +    return svq->last_used_idx != svq->shadow_used_idx;
> >>> +}
> >>> +
> >>> +static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> >>> +{
> >>> +    vring_desc_t *descs = svq->vring.desc;
> >>> +    const vring_used_t *used = svq->vring.used;
> >>> +    vring_used_elem_t used_elem;
> >>> +    uint16_t last_used;
> >>> +
> >>> +    if (!vhost_svq_more_used(svq)) {
> >>> +        return NULL;
> >>> +    }
> >>> +
> >>> +    /* Only get used array entries after they have been exposed by dev */
> >>> +    smp_rmb();
> >>> +    last_used = svq->last_used_idx & (svq->vring.num - 1);
> >>> +    used_elem.id = le32_to_cpu(used->ring[last_used].id);
> >>> +    used_elem.len = le32_to_cpu(used->ring[last_used].len);
> >>> +
> >>> +    svq->last_used_idx++;
> >>> +    if (unlikely(used_elem.id >= svq->vring.num)) {
> >>> +        error_report("Device %s says index %u is used", svq->vdev->name,
> >>> +                     used_elem.id);
> >>> +        return NULL;
> >>> +    }
> >>> +
> >>> +    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
> >>> +        error_report(
> >>> +            "Device %s says index %u is used, but it was not available",
> >>> +            svq->vdev->name, used_elem.id);
> >>> +        return NULL;
> >>> +    }
> >>> +
> >>> +    descs[used_elem.id].next = svq->free_head;
> >>> +    svq->free_head = used_elem.id;
> >>> +
> >>> +    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> >>> +    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> >>> +}
> >>> +
> >>> +static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> >>> +                            bool check_for_avail_queue)
> >>> +{
> >>> +    VirtQueue *vq = svq->vq;
> >>> +
> >>> +    /* Make as many buffers as possible used. */
> >>> +    do {
> >>> +        unsigned i = 0;
> >>> +
> >>> +        vhost_svq_set_notification(svq, false);
> >>> +        while (true) {
> >>> +            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> >>> +            if (!elem) {
> >>> +                break;
> >>> +            }
> >>> +
> >>> +            if (unlikely(i >= svq->vring.num)) {
> >>> +                virtio_error(svq->vdev,
> >>> +                         "More than %u used buffers obtained in a %u size SVQ",
> >>> +                         i, svq->vring.num);
> >>> +                virtqueue_fill(vq, elem, elem->len, i);
> >>> +                virtqueue_flush(vq, i);
> >>
> >> Let's simply use virtqueue_push() here?
> >>
> > virtqueue_push supports filling and flushing only one element, instead
> > of a batch. I'm fine with either, but I think the fewer updates to the
> > used idx, the better.
>
>
> Fine.
>
>
> >
> >>> +                i = 0;
> >>
> >> Do we need to bail out here?
> >>
> > Yes I guess we can simply return.
> >
> >>> +            }
> >>> +            virtqueue_fill(vq, elem, elem->len, i++);
> >>> +        }
> >>> +
> >>> +        virtqueue_flush(vq, i);
> >>> +        event_notifier_set(&svq->svq_call);
> >>> +
> >>> +        if (check_for_avail_queue && svq->next_guest_avail_elem) {
> >>> +            /*
> >>> +             * Avail ring was full when vhost_svq_flush was called, so it's a
> >>> +             * good moment to make more descriptors available if possible
> >>> +             */
> >>> +            vhost_handle_guest_kick(svq);
> >>
> >> Is it better to have a check similar to what vhost_handle_guest_kick() did?
> >>
> >>               if (elem->out_num + elem->in_num >
> >>                   vhost_svq_available_slots(svq)) {
> >>
> > It will be duplicated when we call vhost_handle_guest_kick, won't it?
>
>
> Right, I mis-read the code.
>
>
> >
> >>> +        }
> >>> +
> >>> +        vhost_svq_set_notification(svq, true);
> >>
> >> Is a mb() needed here? Otherwise we may lose a call (where
> >> vhost_svq_more_used() is run before vhost_svq_set_notification()).
> >>
> > I'm confused here then; I thought you said this is just a hint, so
> > there was no need [1]? I think the memory barrier is needed too.
>
>
> Yes, it's a hint but:
>
> 1) When we disable the notification: since the notification disable is
> just a hint, the device can still raise an interrupt, so the ordering
> is meaningless and a memory barrier is not necessary (the
> vhost_svq_set_notification(svq, false) case)
>
> 2) When we enable the notification: though it's a hint, the device can
> choose to implement it by enabling the interrupt. In this case, the
> notification enable should be done before checking the used ring.
> Otherwise, the check for more used buffers might be done before
> enabling the notification:
>
> 1) driver checks for more used buffers
> 2) device adds a used buffer but sends no notification
> 3) driver enables the notification; we have lost a notification here
>

That was my understanding too. So the right way is to only add the
memory barrier in case 2), when setting the flag, right?
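
Something like this, on top of this patch's vhost_svq_set_notification
(a sketch; only the enable path gets the barrier):

    static void vhost_svq_set_notification(VhostShadowVirtqueue *svq,
                                           bool enable)
    {
        uint16_t notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);

        if (svq->notification == enable) {
            return;
        }

        svq->notification = enable;
        if (enable) {
            svq->vring.avail->flags &= ~notification_flag;
            /*
             * Case 2: make the flag write visible before the caller
             * re-checks the used ring, so a buffer used by the device in
             * between cannot be missed without an interrupt.
             */
            smp_mb();
        } else {
            /* Case 1: just a hint, no ordering needed */
            svq->vring.avail->flags |= notification_flag;
        }
    }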

>
> >>> +    } while (vhost_svq_more_used(svq));
> >>> +}
> >>> +
> >>> +/**
> >>> + * Forward used buffers.
> >>> + *
> >>> + * @n hdev call event notifier, the one that device set to notify svq.
> >>> + *
> >>> + * Note that we are not making any buffers available in the loop, there is no
> >>> + * way that it runs more than virtqueue size times.
> >>> + */
> >>>    static void vhost_svq_handle_call(EventNotifier *n)
> >>>    {
> >>>        VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> >>>                                                 hdev_call);
> >>>
> >>> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> >>> -        return;
> >>> -    }
> >>> +    /* Clear event notifier */
> >>> +    event_notifier_test_and_clear(n);
> >>
> >> Any reason that we remove the above check?
> >>
> > This comes from the previous versions, where this made sure we missed
> > no used buffers in the process of switching to SVQ mode.
>
>
> I'm not sure I get it. Even for the switching, it should be safer to
> handle the flush unconditionally?
>

Yes, I also think it's better to forward and kick/call unconditionally.

Thanks!
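
As a follow-up to the assert discussion earlier in this message, a
sketch of the graceful-failure version (the bool return and its
handling by the caller are assumed, not taken from this series):

    static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
    {
        unsigned qemu_head;

        if (unlikely(elem->out_num == 0 && elem->in_num == 0)) {
            /* A broken guest could expose an empty descriptor chain */
            virtio_error(svq->vdev, "Empty element made available by guest");
            return false;
        }

        qemu_head = vhost_svq_add_split(svq, elem);
        svq->ring_id_maps[qemu_head] = elem;
        return true;
    }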

> Thanks
>
>
> >
> > If we enable SVQ from the beginning I think we can rely on getting all
> > the device's used buffer notifications, so let me think a little bit
> > and I can move to check the eventfd.
> >
> >>> -    event_notifier_set(&svq->svq_call);
> >>> +    vhost_svq_flush(svq, true);
> >>>    }
> >>>
> >>>    /**
> >>> @@ -258,13 +551,38 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> >>>         * need to explicitly check for them.
> >>>         */
> >>>        event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> >>> -    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
> >>> +    event_notifier_set_handler(&svq->svq_kick,
> >>> +                               vhost_handle_guest_kick_notifier);
> >>>
> >>>        if (!check_old || event_notifier_test_and_clear(&tmp)) {
> >>>            event_notifier_set(&svq->hdev_kick);
> >>>        }
> >>>    }
> >>>
> >>> +/**
> >>> + * Start shadow virtqueue operation.
> >>> + *
> >>> + * @svq Shadow Virtqueue
> >>> + * @vdev        VirtIO device
> >>> + * @vq          Virtqueue to shadow
> >>> + */
> >>> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> >>> +                     VirtQueue *vq)
> >>> +{
> >>> +    svq->next_guest_avail_elem = NULL;
> >>> +    svq->avail_idx_shadow = 0;
> >>> +    svq->shadow_used_idx = 0;
> >>> +    svq->last_used_idx = 0;
> >>> +    svq->vdev = vdev;
> >>> +    svq->vq = vq;
> >>> +
> >>> +    memset(svq->vring.avail, 0, sizeof(*svq->vring.avail));
> >>> +    memset(svq->vring.used, 0, sizeof(*svq->vring.used));
> >>> +    for (unsigned i = 0; i < svq->vring.num - 1; i++) {
> >>> +        svq->vring.desc[i].next = cpu_to_le16(i + 1);
> >>> +    }
> >>> +}
> >>> +
> >>>    /**
> >>>     * Stop shadow virtqueue operation.
> >>>     * @svq Shadow Virtqueue
> >>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> >>>    void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >>>    {
> >>>        event_notifier_set_handler(&svq->svq_kick, NULL);
> >>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> >>> +
> >>> +    if (!svq->vq) {
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    /* Send all pending used descriptors to guest */
> >>> +    vhost_svq_flush(svq, false);
> >>> +
> >>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> >>> +        g_autofree VirtQueueElement *elem = NULL;
> >>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> >>> +        if (elem) {
> >>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> >>> +        }
> >>> +    }
> >>> +
> >>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> >>> +    if (next_avail_elem) {
> >>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> >>> +                                 next_avail_elem->len);
> >>> +    }
> >>>    }
> >>>
> >>>    /**
> >>> @@ -316,7 +656,7 @@ VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> >>>        memset(svq->vring.desc, 0, driver_size);
> >>>        svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> >>>        memset(svq->vring.used, 0, device_size);
> >>> -
> >>> +    svq->ring_id_maps = g_new0(VirtQueueElement *, qsize);
> >>>        event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
> >>>        return g_steal_pointer(&svq);
> >>>
> >>> @@ -335,6 +675,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
> >>>        event_notifier_cleanup(&vq->hdev_kick);
> >>>        event_notifier_set_handler(&vq->hdev_call, NULL);
> >>>        event_notifier_cleanup(&vq->hdev_call);
> >>> +    g_free(vq->ring_id_maps);
> >>>        qemu_vfree(vq->vring.desc);
> >>>        qemu_vfree(vq->vring.used);
> >>>        g_free(vq);
> >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >>> index 53e14bafa0..0e5c00ed7e 100644
> >>> --- a/hw/virtio/vhost-vdpa.c
> >>> +++ b/hw/virtio/vhost-vdpa.c
> >>> @@ -752,9 +752,9 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> >>>     * Note that this function does not rewind kick file descriptor if cannot set
> >>>     * call one.
> >>>     */
> >>> -static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> >>> -                                VhostShadowVirtqueue *svq,
> >>> -                                unsigned idx)
> >>> +static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
> >>> +                                  VhostShadowVirtqueue *svq,
> >>> +                                  unsigned idx)
> >>>    {
> >>>        struct vhost_vring_file file = {
> >>>            .index = dev->vq_index + idx,
> >>> @@ -767,7 +767,7 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> >>>        r = vhost_vdpa_set_vring_dev_kick(dev, &file);
> >>>        if (unlikely(r != 0)) {
> >>>            error_report("Can't set device kick fd (%d)", -r);
> >>> -        return false;
> >>> +        return r;
> >>>        }
> >>>
> >>>        event_notifier = vhost_svq_get_svq_call_notifier(svq);
> >>> @@ -777,6 +777,99 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> >>>            error_report("Can't set device call fd (%d)", -r);
> >>>        }
> >>>
> >>> +    return r;
> >>> +}
> >>> +
> >>> +/**
> >>> + * Unmap SVQ area in the device
> >>> + */
> >>> +static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
> >>> +                                      hwaddr size)
> >>> +{
> >>> +    int r;
> >>> +
> >>> +    size = ROUND_UP(size, qemu_real_host_page_size);
> >>> +    r = vhost_vdpa_dma_unmap(v, iova, size);
> >>> +    return r == 0;
> >>> +}
> >>> +
> >>> +static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
> >>> +                                       const VhostShadowVirtqueue *svq)
> >>> +{
> >>> +    struct vhost_vdpa *v = dev->opaque;
> >>> +    struct vhost_vring_addr svq_addr;
> >>> +    size_t device_size = vhost_svq_device_area_size(svq);
> >>> +    size_t driver_size = vhost_svq_driver_area_size(svq);
> >>> +    bool ok;
> >>> +
> >>> +    vhost_svq_get_vring_addr(svq, &svq_addr);
> >>> +
> >>> +    ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
> >>> +    if (unlikely(!ok)) {
> >>> +        return false;
> >>> +    }
> >>> +
> >>> +    return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
> >>> +}
> >>> +
> >>> +/**
> >>> + * Map shadow virtqueue rings in device
> >>> + *
> >>> + * @dev   The vhost device
> >>> + * @svq   The shadow virtqueue
> >>> + */
> >>> +static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
> >>> +                                     const VhostShadowVirtqueue *svq)
> >>> +{
> >>> +    struct vhost_vdpa *v = dev->opaque;
> >>> +    struct vhost_vring_addr svq_addr;
> >>> +    size_t device_size = vhost_svq_device_area_size(svq);
> >>> +    size_t driver_size = vhost_svq_driver_area_size(svq);
> >>> +    int r;
> >>> +
> >>> +    vhost_svq_get_vring_addr(svq, &svq_addr);
> >>> +
> >>> +    r = vhost_vdpa_dma_map(v, svq_addr.desc_user_addr, driver_size,
> >>> +                           (void *)svq_addr.desc_user_addr, true);
> >>> +    if (unlikely(r != 0)) {
> >>> +        return false;
> >>> +    }
> >>> +
> >>> +    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
> >>> +                           (void *)svq_addr.used_user_addr, false);
> >>
> >> Do we need to unmap the driver area if we fail here?
> >>
> > Yes, this used to rely on them being unmapped when SVQ is disabled. Now
> > I think we need to unmap here as you say.
> >
> > Thanks!
> >
> > [1] https://lists.linuxfoundation.org/pipermail/virtualization/2021-March/053322.html
> >
> >> Thanks
> >>
> >>
> >>> +    return r == 0;
> >>> +}
> >>> +
> >>> +static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> >>> +                                VhostShadowVirtqueue *svq,
> >>> +                                unsigned idx)
> >>> +{
> >>> +    uint16_t vq_index = dev->vq_index + idx;
> >>> +    struct vhost_vring_state s = {
> >>> +        .index = vq_index,
> >>> +    };
> >>> +    int r;
> >>> +    bool ok;
> >>> +
> >>> +    r = vhost_vdpa_set_dev_vring_base(dev, &s);
> >>> +    if (unlikely(r)) {
> >>> +        error_report("Can't set vring base (%d)", r);
> >>> +        return false;
> >>> +    }
> >>> +
> >>> +    s.num = vhost_svq_get_num(svq);
> >>> +    r = vhost_vdpa_set_dev_vring_num(dev, &s);
> >>> +    if (unlikely(r)) {
> >>> +        error_report("Can't set vring num (%d)", r);
> >>> +        return false;
> >>> +    }
> >>> +
> >>> +    ok = vhost_vdpa_svq_map_rings(dev, svq);
> >>> +    if (unlikely(!ok)) {
> >>> +        return false;
> >>> +    }
> >>> +
> >>> +    r = vhost_vdpa_svq_set_fds(dev, svq, idx);
> >>>        return r == 0;
> >>>    }
> >>>
> >>> @@ -788,14 +881,24 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >>>        if (started) {
> >>>            vhost_vdpa_host_notifiers_init(dev);
> >>>            for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> >>> +            VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
> >>>                VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
> >>>                bool ok = vhost_vdpa_svq_setup(dev, svq, i);
> >>>                if (unlikely(!ok)) {
> >>>                    return -1;
> >>>                }
> >>> +            vhost_svq_start(svq, dev->vdev, vq);
> >>>            }
> >>>            vhost_vdpa_set_vring_ready(dev);
> >>>        } else {
> >>> +        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> >>> +            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
> >>> +                                                          i);
> >>> +            bool ok = vhost_vdpa_svq_unmap_rings(dev, svq);
> >>> +            if (unlikely(!ok)) {
> >>> +                return -1;
> >>> +            }
> >>> +        }
> >>>            vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> >>>        }
> >>>
>



^ permalink raw reply	[flat|nested] 182+ messages in thread
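On the unmap-on-failure question settled above, a minimal sketch of
vhost_vdpa_svq_map_rings() with the rollback folded in, reusing the
vhost_vdpa_svq_unmap_ring() helper this patch already introduces (a sketch
under those assumptions, not the final code):

    static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
                                         const VhostShadowVirtqueue *svq)
    {
        struct vhost_vdpa *v = dev->opaque;
        struct vhost_vring_addr svq_addr;
        size_t device_size = vhost_svq_device_area_size(svq);
        size_t driver_size = vhost_svq_driver_area_size(svq);
        int r;

        vhost_svq_get_vring_addr(svq, &svq_addr);

        r = vhost_vdpa_dma_map(v, svq_addr.desc_user_addr, driver_size,
                               (void *)svq_addr.desc_user_addr, true);
        if (unlikely(r != 0)) {
            return false;
        }

        r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
                               (void *)svq_addr.used_user_addr, false);
        if (unlikely(r != 0)) {
            /* Roll back the driver area so no stale mapping is left behind */
            vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
            return false;
        }

        return true;
    }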

* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
  2022-02-22 19:01         ` Eugenio Perez Martin
@ 2022-02-23  2:03             ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-23  2:03 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Wed, Feb 23, 2022 at 3:01 AM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Feb 8, 2022 at 9:11 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2022/2/2 上午1:08, Eugenio Perez Martin 写道:
> > > On Sun, Jan 30, 2022 at 5:43 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > >>> Initial version of shadow virtqueue that actually forwards buffers. There
> > >>> is no iommu support at the moment, and that will be addressed in future
> > >>> patches of this series. Since all vhost-vdpa devices use forced IOMMU,
> > >>> this means that SVQ is not usable at this point of the series on any
> > >>> device.
> > >>>
> > >>> For simplicity it only supports modern devices, that is, ones that
> > >>> expect the vring in little endian, with a split ring and no event idx
> > >>> or indirect descriptors. Support for those will not be added in this
> > >>> series.
> > >>>
> > >>> It reuses the VirtQueue code for the device part. The driver part is
> > >>> based on Linux's virtio_ring driver, but with some functionality and
> > >>> optimizations stripped so it's easier to review.
> > >>>
> > >>> However, forwarding buffers has some particular quirks: one of the most
> > >>> unexpected ones is that a guest's buffer can span more than one
> > >>> descriptor in SVQ. While this is handled gracefully by qemu's emulated
> > >>> virtio devices, it may cause an unexpected SVQ queue-full condition.
> > >>> This patch also solves it by checking for that condition at both
> > >>> guest's kicks and device's calls. The code may be more elegant in the
> > >>> future if SVQ code runs in its own iocontext.
> > >>>
> > >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>> ---
> > >>>    hw/virtio/vhost-shadow-virtqueue.h |   2 +
> > >>>    hw/virtio/vhost-shadow-virtqueue.c | 365 ++++++++++++++++++++++++++++-
> > >>>    hw/virtio/vhost-vdpa.c             | 111 ++++++++-
> > >>>    3 files changed, 462 insertions(+), 16 deletions(-)
> > >>>
> > >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > >>> index 39aef5ffdf..19c934af49 100644
> > >>> --- a/hw/virtio/vhost-shadow-virtqueue.h
> > >>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > >>> @@ -33,6 +33,8 @@ uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq);
> > >>>    size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
> > >>>    size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> > >>>
> > >>> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> > >>> +                     VirtQueue *vq);
> > >>>    void vhost_svq_stop(VhostShadowVirtqueue *svq);
> > >>>
> > >>>    VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> > >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > >>> index 7c168075d7..a1a404f68f 100644
> > >>> --- a/hw/virtio/vhost-shadow-virtqueue.c
> > >>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > >>> @@ -9,6 +9,8 @@
> > >>>
> > >>>    #include "qemu/osdep.h"
> > >>>    #include "hw/virtio/vhost-shadow-virtqueue.h"
> > >>> +#include "hw/virtio/vhost.h"
> > >>> +#include "hw/virtio/virtio-access.h"
> > >>>    #include "standard-headers/linux/vhost_types.h"
> > >>>
> > >>>    #include "qemu/error-report.h"
> > >>> @@ -36,6 +38,33 @@ typedef struct VhostShadowVirtqueue {
> > >>>
> > >>>        /* Guest's call notifier, where SVQ calls guest. */
> > >>>        EventNotifier svq_call;
> > >>> +
> > >>> +    /* Virtio queue shadowing */
> > >>> +    VirtQueue *vq;
> > >>> +
> > >>> +    /* Virtio device */
> > >>> +    VirtIODevice *vdev;
> > >>> +
> > >>> +    /* Map for returning guest's descriptors */
> > >>> +    VirtQueueElement **ring_id_maps;
> > >>> +
> > >>> +    /* Next VirtQueue element that guest made available */
> > >>> +    VirtQueueElement *next_guest_avail_elem;
> > >>> +
> > >>> +    /* Next head to expose to device */
> > >>> +    uint16_t avail_idx_shadow;
> > >>> +
> > >>> +    /* Next free descriptor */
> > >>> +    uint16_t free_head;
> > >>> +
> > >>> +    /* Last seen used idx */
> > >>> +    uint16_t shadow_used_idx;
> > >>> +
> > >>> +    /* Next head to consume from device */
> > >>> +    uint16_t last_used_idx;
> > >>> +
> > >>> +    /* Cache for the exposed notification flag */
> > >>> +    bool notification;
> > >>>    } VhostShadowVirtqueue;
> > >>>
> > >>>    #define INVALID_SVQ_KICK_FD -1
> > >>> @@ -148,30 +177,294 @@ bool vhost_svq_ack_guest_features(uint64_t dev_features,
> > >>>        return true;
> > >>>    }
> > >>>
> > >>> -/* Forward guest notifications */
> > >>> -static void vhost_handle_guest_kick(EventNotifier *n)
> > >>> +/**
> > >>> + * Number of descriptors that SVQ can make available from the guest.
> > >>> + *
> > >>> + * @svq   The svq
> > >>> + */
> > >>> +static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
> > >>>    {
> > >>> -    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > >>> -                                             svq_kick);
> > >>> +    return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
> > >>> +}
> > >>> +
> > >>> +static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> > >>> +{
> > >>> +    uint16_t notification_flag;
> > >>>
> > >>> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> > >>> +    if (svq->notification == enable) {
> > >>> +        return;
> > >>> +    }
> > >>> +
> > >>> +    notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
> > >>> +
> > >>> +    svq->notification = enable;
> > >>> +    if (enable) {
> > >>> +        svq->vring.avail->flags &= ~notification_flag;
> > >>> +    } else {
> > >>> +        svq->vring.avail->flags |= notification_flag;
> > >>> +    }
> > >>> +}
> > >>> +
> > >>> +static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > >>> +                                    const struct iovec *iovec,
> > >>> +                                    size_t num, bool more_descs, bool write)
> > >>> +{
> > >>> +    uint16_t i = svq->free_head, last = svq->free_head;
> > >>> +    unsigned n;
> > >>> +    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
> > >>> +    vring_desc_t *descs = svq->vring.desc;
> > >>> +
> > >>> +    if (num == 0) {
> > >>> +        return;
> > >>> +    }
> > >>> +
> > >>> +    for (n = 0; n < num; n++) {
> > >>> +        if (more_descs || (n + 1 < num)) {
> > >>> +            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> > >>> +        } else {
> > >>> +            descs[i].flags = flags;
> > >>> +        }
> > >>> +        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
> > >>> +        descs[i].len = cpu_to_le32(iovec[n].iov_len);
> > >>> +
> > >>> +        last = i;
> > >>> +        i = le16_to_cpu(descs[i].next);
> > >>> +    }
> > >>> +
> > >>> +    svq->free_head = le16_to_cpu(descs[last].next);
> > >>> +}
> > >>> +
> > >>> +static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > >>> +                                    VirtQueueElement *elem)
> > >>> +{
> > >>> +    int head;
> > >>> +    unsigned avail_idx;
> > >>> +    vring_avail_t *avail = svq->vring.avail;
> > >>> +
> > >>> +    head = svq->free_head;
> > >>> +
> > >>> +    /* We need some descriptors here */
> > >>> +    assert(elem->out_num || elem->in_num);
> > >>
> > >> Looks like this could be triggered by the guest; we need to fail
> > >> instead of assert here.
> > >>
> > > My understanding was that virtqueue_pop already sanitized that case,
> > > but I'm not able to find where now. I will recheck and, in case it's
> > > not, I will move to a failure.
> > >
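As a minimal sketch of that failure path (assuming vhost_svq_add_split() is
changed to report failure to its callers, which is not shown here), the
assert could become a guard using the virtio_error() helper this patch
already uses elsewhere:

    /* We need some descriptors here */
    if (unlikely(!elem->out_num && !elem->in_num)) {
        /*
         * A buggy guest could hand us an element with no descriptors;
         * that must not abort qemu, so mark the device broken instead.
         */
        virtio_error(svq->vdev, "Guest element with no descriptors");
        return false;
    }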
> > >>> +
> > >>> +    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> > >>> +                            elem->in_num > 0, false);
> > >>> +    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> > >>> +
> > >>> +    /*
> > >>> +     * Put entry in available array (but don't update avail->idx until they
> > >>> +     * do sync).
> > >>> +     */
> > >>> +    avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
> > >>> +    avail->ring[avail_idx] = cpu_to_le16(head);
> > >>> +    svq->avail_idx_shadow++;
> > >>> +
> > >>> +    /* Update avail index after the descriptor is written */
> > >>> +    smp_wmb();
> > >>> +    avail->idx = cpu_to_le16(svq->avail_idx_shadow);
> > >>> +
> > >>> +    return head;
> > >>> +}
> > >>> +
> > >>> +static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> > >>> +{
> > >>> +    unsigned qemu_head = vhost_svq_add_split(svq, elem);
> > >>> +
> > >>> +    svq->ring_id_maps[qemu_head] = elem;
> > >>> +}
> > >>> +
> > >>> +static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    /* We need to expose available array entries before checking used flags */
> > >>> +    smp_mb();
> > >>> +    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
> > >>>            return;
> > >>>        }
> > >>>
> > >>>        event_notifier_set(&svq->hdev_kick);
> > >>>    }
> > >>>
> > >>> -/* Forward vhost notifications */
> > >>> +/**
> > >>> + * Forward available buffers.
> > >>> + *
> > >>> + * @svq Shadow VirtQueue
> > >>> + *
> > >>> + * Note that this function does not guarantee that all guest's available
> > >>> + * buffers are available to the device in SVQ avail ring. The guest may have
> > >>> + * exposed a GPA / GIOVA contiguous buffer, but it may not be contiguous in qemu
> > >>> + * vaddr.
> > >>> + *
> > >>> + * If that happens, guest's kick notifications will be disabled until device
> > >>> + * makes some buffers used.
> > >>> + */
> > >>> +static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    /* Clear event notifier */
> > >>> +    event_notifier_test_and_clear(&svq->svq_kick);
> > >>> +
> > >>> +    /* Make available as many buffers as possible */
> > >>> +    do {
> > >>> +        if (virtio_queue_get_notification(svq->vq)) {
> > >>> +            virtio_queue_set_notification(svq->vq, false);
> > >>
> > >> This looks like an optimization that should belong to
> > >> virtio_queue_set_notification() itself.
> > >>
> > > Sure we can move.
> > >
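Until such a guard lives in virtio_queue_set_notification() itself, the
get/set dance above could be wrapped in a small helper; this is only an
illustrative sketch and the helper name is made up:

    static void vhost_svq_set_vq_notification(VhostShadowVirtqueue *svq,
                                              bool enable)
    {
        /* Skip the vring update when the state already matches */
        if (virtio_queue_get_notification(svq->vq) == enable) {
            return;
        }
        virtio_queue_set_notification(svq->vq, enable);
    }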
> > >>> +        }
> > >>> +
> > >>> +        while (true) {
> > >>> +            VirtQueueElement *elem;
> > >>> +
> > >>> +            if (svq->next_guest_avail_elem) {
> > >>> +                elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > >>> +            } else {
> > >>> +                elem = virtqueue_pop(svq->vq, sizeof(*elem));
> > >>> +            }
> > >>> +
> > >>> +            if (!elem) {
> > >>> +                break;
> > >>> +            }
> > >>> +
> > >>> +            if (elem->out_num + elem->in_num >
> > >>> +                vhost_svq_available_slots(svq)) {
> > >>> +                /*
> > >>> +                 * This condition is possible since a contiguous buffer in GPA
> > >>> +                 * does not imply a contiguous buffer in qemu's VA
> > >>> +                 * scatter-gather segments. If that happens, the buffer exposed
> > >>> +                 * to the device needs to be a chain of descriptors at this
> > >>> +                 * moment.
> > >>> +                 *
> > >>> +                 * SVQ cannot hold more available buffers if we are here:
> > >>> +                 * queue the current guest descriptor and ignore further kicks
> > >>> +                 * until some elements are used.
> > >>> +                 */
> > >>> +                svq->next_guest_avail_elem = elem;
> > >>> +                return;
> > >>> +            }
> > >>> +
> > >>> +            vhost_svq_add(svq, elem);
> > >>> +            vhost_svq_kick(svq);
> > >>> +        }
> > >>> +
> > >>> +        virtio_queue_set_notification(svq->vq, true);
> > >>> +    } while (!virtio_queue_empty(svq->vq));
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Handle guest's kick.
> > >>> + *
> > >>> + * @n guest kick event notifier, the one that guest set to notify svq.
> > >>> + */
> > >>> +static void vhost_handle_guest_kick_notifier(EventNotifier *n)
> > >>> +{
> > >>> +    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > >>> +                                             svq_kick);
> > >>> +    vhost_handle_guest_kick(svq);
> > >>> +}
> > >>> +
> > >>> +static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    if (svq->last_used_idx != svq->shadow_used_idx) {
> > >>> +        return true;
> > >>> +    }
> > >>> +
> > >>> +    svq->shadow_used_idx = le16_to_cpu(svq->vring.used->idx);
> > >>> +
> > >>> +    return svq->last_used_idx != svq->shadow_used_idx;
> > >>> +}
> > >>> +
> > >>> +static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    vring_desc_t *descs = svq->vring.desc;
> > >>> +    const vring_used_t *used = svq->vring.used;
> > >>> +    vring_used_elem_t used_elem;
> > >>> +    uint16_t last_used;
> > >>> +
> > >>> +    if (!vhost_svq_more_used(svq)) {
> > >>> +        return NULL;
> > >>> +    }
> > >>> +
> > >>> +    /* Only get used array entries after they have been exposed by dev */
> > >>> +    smp_rmb();
> > >>> +    last_used = svq->last_used_idx & (svq->vring.num - 1);
> > >>> +    used_elem.id = le32_to_cpu(used->ring[last_used].id);
> > >>> +    used_elem.len = le32_to_cpu(used->ring[last_used].len);
> > >>> +
> > >>> +    svq->last_used_idx++;
> > >>> +    if (unlikely(used_elem.id >= svq->vring.num)) {
> > >>> +        error_report("Device %s says index %u is used", svq->vdev->name,
> > >>> +                     used_elem.id);
> > >>> +        return NULL;
> > >>> +    }
> > >>> +
> > >>> +    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
> > >>> +        error_report(
> > >>> +            "Device %s says index %u is used, but it was not available",
> > >>> +            svq->vdev->name, used_elem.id);
> > >>> +        return NULL;
> > >>> +    }
> > >>> +
> > >>> +    descs[used_elem.id].next = svq->free_head;
> > >>> +    svq->free_head = used_elem.id;
> > >>> +
> > >>> +    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> > >>> +    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> > >>> +}
> > >>> +
> > >>> +static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> > >>> +                            bool check_for_avail_queue)
> > >>> +{
> > >>> +    VirtQueue *vq = svq->vq;
> > >>> +
> > >>> +    /* Make as many buffers as possible used. */
> > >>> +    do {
> > >>> +        unsigned i = 0;
> > >>> +
> > >>> +        vhost_svq_set_notification(svq, false);
> > >>> +        while (true) {
> > >>> +            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> > >>> +            if (!elem) {
> > >>> +                break;
> > >>> +            }
> > >>> +
> > >>> +            if (unlikely(i >= svq->vring.num)) {
> > >>> +                virtio_error(svq->vdev,
> > >>> +                         "More than %u used buffers obtained in a %u size SVQ",
> > >>> +                         i, svq->vring.num);
> > >>> +                virtqueue_fill(vq, elem, elem->len, i);
> > >>> +                virtqueue_flush(vq, i);
> > >>
> > >> Let's simply use virtqueue_push() here?
> > >>
> > > virtqueue_push only supports filling and flushing one element at a
> > > time, instead of a batch. I'm fine with either, but I think the fewer
> > > updates to the used idx, the better.
> >
> >
> > Fine.
> >
> >
> > >
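To make that trade-off concrete, the two shapes would be (an illustrative
fragment; vq, elems and n stand for the flushed virtqueue, the used
elements and their count, all assumed to be in scope):

    /* virtqueue_push(): fill + flush per element, n used-idx updates */
    for (unsigned j = 0; j < n; j++) {
        virtqueue_push(vq, elems[j], elems[j]->len);
    }

    /* Batched, as this patch does: one used-idx update for n elements */
    for (unsigned j = 0; j < n; j++) {
        virtqueue_fill(vq, elems[j], elems[j]->len, j);
    }
    virtqueue_flush(vq, n);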
> > >>> +                i = 0;
> > >>
> > >> Do we need to bail out here?
> > >>
> > > Yes I guess we can simply return.
> > >
> > >>> +            }
> > >>> +            virtqueue_fill(vq, elem, elem->len, i++);
> > >>> +        }
> > >>> +
> > >>> +        virtqueue_flush(vq, i);
> > >>> +        event_notifier_set(&svq->svq_call);
> > >>> +
> > >>> +        if (check_for_avail_queue && svq->next_guest_avail_elem) {
> > >>> +            /*
> > >>> +             * Avail ring was full when vhost_svq_flush was called, so it's a
> > >>> +             * good moment to make more descriptors available if possible
> > >>> +             */
> > >>> +            vhost_handle_guest_kick(svq);
> > >>
> > >> Would it be better to have a similar check to the one vhost_handle_guest_kick() does?
> > >>
> > >>               if (elem->out_num + elem->in_num >
> > >>                   vhost_svq_available_slots(svq)) {
> > >>
> > > It will be duplicated when we call vhost_handle_guest_kick, won't it?
> >
> >
> > Right, I mis-read the code.
> >
> >
> > >
> > >>> +        }
> > >>> +
> > >>> +        vhost_svq_set_notification(svq, true);
> > >>
> > >> Is a mb() needed here? Otherwise we may lose a call (where
> > >> vhost_svq_more_used() is run before vhost_svq_set_notification()).
> > >>
> > > I'm confused here then, I thought you said this is just a hint so
> > > there was no need? [1]. I think the memory barrier is needed too.
> >
> >
> > Yes, it's a hint but:
> >
> > 1) When we disable the notification: since the notification disable is
> > just a hint, the device can still raise an interrupt, so the ordering is
> > meaningless and a memory barrier is not necessary (the
> > vhost_svq_set_notification(svq, false) case).
> >
> > 2) When we enable the notification: though it's a hint, the device can
> > choose to implement it by enabling the interrupt. In this case the
> > notification enable should be done before checking the used ring;
> > otherwise the check for more used entries might be done before enabling
> > the notification:
> >
> > 1) driver checks for more used entries
> > 2) device adds more used entries but sends no notification
> > 3) driver enables the notification, and the notification here is lost
> >
>
> That was my understanding too. So the right way is to only add the
> memory barrier in case 2), when setting the flag, right?

Yes.

>
> >
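Per that conclusion, a sketch of vhost_svq_set_notification() with the
barrier added only on the enable path (smp_mb() as already used elsewhere
in this file; a sketch of the agreed fix, not the final code):

    static void vhost_svq_set_notification(VhostShadowVirtqueue *svq,
                                           bool enable)
    {
        uint16_t notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);

        if (svq->notification == enable) {
            return;
        }

        svq->notification = enable;
        if (enable) {
            svq->vring.avail->flags &= ~notification_flag;
            /*
             * Make the re-enabled flag visible to the device before the
             * caller re-checks the used ring, so a buffer used in between
             * cannot be missed without a notification.
             */
            smp_mb();
        } else {
            /* Disabling is only a hint, so no ordering is required */
            svq->vring.avail->flags |= notification_flag;
        }
    }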
> > >>> +    } while (vhost_svq_more_used(svq));
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Forward used buffers.
> > >>> + *
> > >>> + * @n hdev call event notifier, the one that device set to notify svq.
> > >>> + *
> > >>> + * Note that we are not making any buffers available in the loop, so
> > >>> + * there is no way it runs more than virtqueue-size times.
> > >>> + */
> > >>>    static void vhost_svq_handle_call(EventNotifier *n)
> > >>>    {
> > >>>        VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > >>>                                                 hdev_call);
> > >>>
> > >>> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> > >>> -        return;
> > >>> -    }
> > >>> +    /* Clear event notifier */
> > >>> +    event_notifier_test_and_clear(n);
> > >>
> > >> Any reason that we remove the above check?
> > >>
> > > This comes from the previous versions, where this made sure we missed
> > > no used buffers in the process of switching to SVQ mode.
> >
> >
> > I'm not sure I get it here. Even for the switching, isn't it safer to
> > handle the flush unconditionally?
> >
>
> Yes, I also think it's better to forward and kick/call unconditionally.
>
> Thanks!

Ok.

Thanks

>
> > Thanks
> >
> >
> > >
> > > If we enable SVQ from the beginning I think we can rely on getting all
> > > the device's used buffer notifications, so let me think a little bit
> > > and I can move to check the eventfd.
> > >
> > >>> -    event_notifier_set(&svq->svq_call);
> > >>> +    vhost_svq_flush(svq, true);
> > >>>    }
> > >>>
> > >>>    /**
> > >>> @@ -258,13 +551,38 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> > >>>         * need to explicitly check for them.
> > >>>         */
> > >>>        event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> > >>> -    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
> > >>> +    event_notifier_set_handler(&svq->svq_kick,
> > >>> +                               vhost_handle_guest_kick_notifier);
> > >>>
> > >>>        if (!check_old || event_notifier_test_and_clear(&tmp)) {
> > >>>            event_notifier_set(&svq->hdev_kick);
> > >>>        }
> > >>>    }
> > >>>
> > >>> +/**
> > >>> + * Start shadow virtqueue operation.
> > >>> + *
> > >>> + * @svq Shadow Virtqueue
> > >>> + * @vdev        VirtIO device
> > >>> + * @vq          Virtqueue to shadow
> > >>> + */
> > >>> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> > >>> +                     VirtQueue *vq)
> > >>> +{
> > >>> +    svq->next_guest_avail_elem = NULL;
> > >>> +    svq->avail_idx_shadow = 0;
> > >>> +    svq->shadow_used_idx = 0;
> > >>> +    svq->last_used_idx = 0;
> > >>> +    svq->vdev = vdev;
> > >>> +    svq->vq = vq;
> > >>> +
> > >>> +    memset(svq->vring.avail, 0, sizeof(*svq->vring.avail));
> > >>> +    memset(svq->vring.used, 0, sizeof(*svq->vring.used));
> > >>> +    for (unsigned i = 0; i < svq->vring.num - 1; i++) {
> > >>> +        svq->vring.desc[i].next = cpu_to_le16(i + 1);
> > >>> +    }
> > >>> +}
> > >>> +
> > >>>    /**
> > >>>     * Stop shadow virtqueue operation.
> > >>>     * @svq Shadow Virtqueue
> > >>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> > >>>    void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > >>>    {
> > >>>        event_notifier_set_handler(&svq->svq_kick, NULL);
> > >>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> > >>> +
> > >>> +    if (!svq->vq) {
> > >>> +        return;
> > >>> +    }
> > >>> +
> > >>> +    /* Send all pending used descriptors to guest */
> > >>> +    vhost_svq_flush(svq, false);
> > >>> +
> > >>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> > >>> +        g_autofree VirtQueueElement *elem = NULL;
> > >>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> > >>> +        if (elem) {
> > >>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> > >>> +        }
> > >>> +    }
> > >>> +
> > >>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > >>> +    if (next_avail_elem) {
> > >>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> > >>> +                                 next_avail_elem->len);
> > >>> +    }
> > >>>    }
> > >>>
> > >>>    /**
> > >>> @@ -316,7 +656,7 @@ VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> > >>>        memset(svq->vring.desc, 0, driver_size);
> > >>>        svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> > >>>        memset(svq->vring.used, 0, device_size);
> > >>> -
> > >>> +    svq->ring_id_maps = g_new0(VirtQueueElement *, qsize);
> > >>>        event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
> > >>>        return g_steal_pointer(&svq);
> > >>>
> > >>> @@ -335,6 +675,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
> > >>>        event_notifier_cleanup(&vq->hdev_kick);
> > >>>        event_notifier_set_handler(&vq->hdev_call, NULL);
> > >>>        event_notifier_cleanup(&vq->hdev_call);
> > >>> +    g_free(vq->ring_id_maps);
> > >>>        qemu_vfree(vq->vring.desc);
> > >>>        qemu_vfree(vq->vring.used);
> > >>>        g_free(vq);
> > >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > >>> index 53e14bafa0..0e5c00ed7e 100644
> > >>> --- a/hw/virtio/vhost-vdpa.c
> > >>> +++ b/hw/virtio/vhost-vdpa.c
> > >>> @@ -752,9 +752,9 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> > >>>     * Note that this function does not rewind the kick file descriptor
> > >>>     * if it cannot set the call one.
> > >>>     */
> > >>> -static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> > >>> -                                VhostShadowVirtqueue *svq,
> > >>> -                                unsigned idx)
> > >>> +static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
> > >>> +                                  VhostShadowVirtqueue *svq,
> > >>> +                                  unsigned idx)
> > >>>    {
> > >>>        struct vhost_vring_file file = {
> > >>>            .index = dev->vq_index + idx,
> > >>> @@ -767,7 +767,7 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> > >>>        r = vhost_vdpa_set_vring_dev_kick(dev, &file);
> > >>>        if (unlikely(r != 0)) {
> > >>>            error_report("Can't set device kick fd (%d)", -r);
> > >>> -        return false;
> > >>> +        return r;
> > >>>        }
> > >>>
> > >>>        event_notifier = vhost_svq_get_svq_call_notifier(svq);
> > >>> @@ -777,6 +777,99 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> > >>>            error_report("Can't set device call fd (%d)", -r);
> > >>>        }
> > >>>
> > >>> +    return r;
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Unmap SVQ area in the device
> > >>> + */
> > >>> +static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
> > >>> +                                      hwaddr size)
> > >>> +{
> > >>> +    int r;
> > >>> +
> > >>> +    size = ROUND_UP(size, qemu_real_host_page_size);
> > >>> +    r = vhost_vdpa_dma_unmap(v, iova, size);
> > >>> +    return r == 0;
> > >>> +}
> > >>> +
> > >>> +static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
> > >>> +                                       const VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    struct vhost_vdpa *v = dev->opaque;
> > >>> +    struct vhost_vring_addr svq_addr;
> > >>> +    size_t device_size = vhost_svq_device_area_size(svq);
> > >>> +    size_t driver_size = vhost_svq_driver_area_size(svq);
> > >>> +    bool ok;
> > >>> +
> > >>> +    vhost_svq_get_vring_addr(svq, &svq_addr);
> > >>> +
> > >>> +    ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
> > >>> +    if (unlikely(!ok)) {
> > >>> +        return false;
> > >>> +    }
> > >>> +
> > >>> +    return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Map shadow virtqueue rings in device
> > >>> + *
> > >>> + * @dev   The vhost device
> > >>> + * @svq   The shadow virtqueue
> > >>> + */
> > >>> +static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
> > >>> +                                     const VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    struct vhost_vdpa *v = dev->opaque;
> > >>> +    struct vhost_vring_addr svq_addr;
> > >>> +    size_t device_size = vhost_svq_device_area_size(svq);
> > >>> +    size_t driver_size = vhost_svq_driver_area_size(svq);
> > >>> +    int r;
> > >>> +
> > >>> +    vhost_svq_get_vring_addr(svq, &svq_addr);
> > >>> +
> > >>> +    r = vhost_vdpa_dma_map(v, svq_addr.desc_user_addr, driver_size,
> > >>> +                           (void *)svq_addr.desc_user_addr, true);
> > >>> +    if (unlikely(r != 0)) {
> > >>> +        return false;
> > >>> +    }
> > >>> +
> > >>> +    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
> > >>> +                           (void *)svq_addr.used_user_addr, false);
> > >>
> > >> Do we need to unmap the driver area if we fail here?
> > >>
> > > Yes, this used to rely on them being unmapped when SVQ is disabled.
> > > Now I think we need to unmap here as you say.
> > >
> > > Thanks!
> > >
> > > [1] https://lists.linuxfoundation.org/pipermail/virtualization/2021-March/053322.html
> > >
> > >> Thanks
> > >>
> > >>
> > >>> +    return r == 0;
> > >>> +}
> > >>> +
> > >>> +static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> > >>> +                                VhostShadowVirtqueue *svq,
> > >>> +                                unsigned idx)
> > >>> +{
> > >>> +    uint16_t vq_index = dev->vq_index + idx;
> > >>> +    struct vhost_vring_state s = {
> > >>> +        .index = vq_index,
> > >>> +    };
> > >>> +    int r;
> > >>> +    bool ok;
> > >>> +
> > >>> +    r = vhost_vdpa_set_dev_vring_base(dev, &s);
> > >>> +    if (unlikely(r)) {
> > >>> +        error_report("Can't set vring base (%d)", r);
> > >>> +        return false;
> > >>> +    }
> > >>> +
> > >>> +    s.num = vhost_svq_get_num(svq);
> > >>> +    r = vhost_vdpa_set_dev_vring_num(dev, &s);
> > >>> +    if (unlikely(r)) {
> > >>> +        error_report("Can't set vring num (%d)", r);
> > >>> +        return false;
> > >>> +    }
> > >>> +
> > >>> +    ok = vhost_vdpa_svq_map_rings(dev, svq);
> > >>> +    if (unlikely(!ok)) {
> > >>> +        return false;
> > >>> +    }
> > >>> +
> > >>> +    r = vhost_vdpa_svq_set_fds(dev, svq, idx);
> > >>>        return r == 0;
> > >>>    }
> > >>>
> > >>> @@ -788,14 +881,24 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > >>>        if (started) {
> > >>>            vhost_vdpa_host_notifiers_init(dev);
> > >>>            for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> > >>> +            VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
> > >>>                VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
> > >>>                bool ok = vhost_vdpa_svq_setup(dev, svq, i);
> > >>>                if (unlikely(!ok)) {
> > >>>                    return -1;
> > >>>                }
> > >>> +            vhost_svq_start(svq, dev->vdev, vq);
> > >>>            }
> > >>>            vhost_vdpa_set_vring_ready(dev);
> > >>>        } else {
> > >>> +        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> > >>> +            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
> > >>> +                                                          i);
> > >>> +            bool ok = vhost_vdpa_svq_unmap_rings(dev, svq);
> > >>> +            if (unlikely(!ok)) {
> > >>> +                return -1;
> > >>> +            }
> > >>> +        }
> > >>>            vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> > >>>        }
> > >>>
> >
>

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
@ 2022-02-23  2:03             ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-23  2:03 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Wed, Feb 23, 2022 at 3:01 AM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Feb 8, 2022 at 9:11 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2022/2/2 上午1:08, Eugenio Perez Martin 写道:
> > > On Sun, Jan 30, 2022 at 5:43 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > >>> Initial version of shadow virtqueue that actually forward buffers. There
> > >>> is no iommu support at the moment, and that will be addressed in future
> > >>> patches of this series. Since all vhost-vdpa devices use forced IOMMU,
> > >>> this means that SVQ is not usable at this point of the series on any
> > >>> device.
> > >>>
> > >>> For simplicity it only supports modern devices, that expects vring
> > >>> in little endian, with split ring and no event idx or indirect
> > >>> descriptors. Support for them will not be added in this series.
> > >>>
> > >>> It reuses the VirtQueue code for the device part. The driver part is
> > >>> based on Linux's virtio_ring driver, but with stripped functionality
> > >>> and optimizations so it's easier to review.
> > >>>
> > >>> However, forwarding buffers have some particular pieces: One of the most
> > >>> unexpected ones is that a guest's buffer can expand through more than
> > >>> one descriptor in SVQ. While this is handled gracefully by qemu's
> > >>> emulated virtio devices, it may cause unexpected SVQ queue full. This
> > >>> patch also solves it by checking for this condition at both guest's
> > >>> kicks and device's calls. The code may be more elegant in the future if
> > >>> SVQ code runs in its own iocontext.
> > >>>
> > >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>> ---
> > >>>    hw/virtio/vhost-shadow-virtqueue.h |   2 +
> > >>>    hw/virtio/vhost-shadow-virtqueue.c | 365 ++++++++++++++++++++++++++++-
> > >>>    hw/virtio/vhost-vdpa.c             | 111 ++++++++-
> > >>>    3 files changed, 462 insertions(+), 16 deletions(-)
> > >>>
> > >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > >>> index 39aef5ffdf..19c934af49 100644
> > >>> --- a/hw/virtio/vhost-shadow-virtqueue.h
> > >>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > >>> @@ -33,6 +33,8 @@ uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq);
> > >>>    size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
> > >>>    size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> > >>>
> > >>> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> > >>> +                     VirtQueue *vq);
> > >>>    void vhost_svq_stop(VhostShadowVirtqueue *svq);
> > >>>
> > >>>    VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> > >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > >>> index 7c168075d7..a1a404f68f 100644
> > >>> --- a/hw/virtio/vhost-shadow-virtqueue.c
> > >>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > >>> @@ -9,6 +9,8 @@
> > >>>
> > >>>    #include "qemu/osdep.h"
> > >>>    #include "hw/virtio/vhost-shadow-virtqueue.h"
> > >>> +#include "hw/virtio/vhost.h"
> > >>> +#include "hw/virtio/virtio-access.h"
> > >>>    #include "standard-headers/linux/vhost_types.h"
> > >>>
> > >>>    #include "qemu/error-report.h"
> > >>> @@ -36,6 +38,33 @@ typedef struct VhostShadowVirtqueue {
> > >>>
> > >>>        /* Guest's call notifier, where SVQ calls guest. */
> > >>>        EventNotifier svq_call;
> > >>> +
> > >>> +    /* Virtio queue shadowing */
> > >>> +    VirtQueue *vq;
> > >>> +
> > >>> +    /* Virtio device */
> > >>> +    VirtIODevice *vdev;
> > >>> +
> > >>> +    /* Map for returning guest's descriptors */
> > >>> +    VirtQueueElement **ring_id_maps;
> > >>> +
> > >>> +    /* Next VirtQueue element that guest made available */
> > >>> +    VirtQueueElement *next_guest_avail_elem;
> > >>> +
> > >>> +    /* Next head to expose to device */
> > >>> +    uint16_t avail_idx_shadow;
> > >>> +
> > >>> +    /* Next free descriptor */
> > >>> +    uint16_t free_head;
> > >>> +
> > >>> +    /* Last seen used idx */
> > >>> +    uint16_t shadow_used_idx;
> > >>> +
> > >>> +    /* Next head to consume from device */
> > >>> +    uint16_t last_used_idx;
> > >>> +
> > >>> +    /* Cache for the exposed notification flag */
> > >>> +    bool notification;
> > >>>    } VhostShadowVirtqueue;
> > >>>
> > >>>    #define INVALID_SVQ_KICK_FD -1
> > >>> @@ -148,30 +177,294 @@ bool vhost_svq_ack_guest_features(uint64_t dev_features,
> > >>>        return true;
> > >>>    }
> > >>>
> > >>> -/* Forward guest notifications */
> > >>> -static void vhost_handle_guest_kick(EventNotifier *n)
> > >>> +/**
> > >>> + * Number of descriptors that SVQ can make available from the guest.
> > >>> + *
> > >>> + * @svq   The svq
> > >>> + */
> > >>> +static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
> > >>>    {
> > >>> -    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > >>> -                                             svq_kick);
> > >>> +    return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
> > >>> +}
> > >>> +
> > >>> +static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> > >>> +{
> > >>> +    uint16_t notification_flag;
> > >>>
> > >>> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> > >>> +    if (svq->notification == enable) {
> > >>> +        return;
> > >>> +    }
> > >>> +
> > >>> +    notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
> > >>> +
> > >>> +    svq->notification = enable;
> > >>> +    if (enable) {
> > >>> +        svq->vring.avail->flags &= ~notification_flag;
> > >>> +    } else {
> > >>> +        svq->vring.avail->flags |= notification_flag;
> > >>> +    }
> > >>> +}
> > >>> +
> > >>> +static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > >>> +                                    const struct iovec *iovec,
> > >>> +                                    size_t num, bool more_descs, bool write)
> > >>> +{
> > >>> +    uint16_t i = svq->free_head, last = svq->free_head;
> > >>> +    unsigned n;
> > >>> +    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
> > >>> +    vring_desc_t *descs = svq->vring.desc;
> > >>> +
> > >>> +    if (num == 0) {
> > >>> +        return;
> > >>> +    }
> > >>> +
> > >>> +    for (n = 0; n < num; n++) {
> > >>> +        if (more_descs || (n + 1 < num)) {
> > >>> +            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> > >>> +        } else {
> > >>> +            descs[i].flags = flags;
> > >>> +        }
> > >>> +        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
> > >>> +        descs[i].len = cpu_to_le32(iovec[n].iov_len);
> > >>> +
> > >>> +        last = i;
> > >>> +        i = cpu_to_le16(descs[i].next);
> > >>> +    }
> > >>> +
> > >>> +    svq->free_head = le16_to_cpu(descs[last].next);
> > >>> +}
> > >>> +
> > >>> +static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > >>> +                                    VirtQueueElement *elem)
> > >>> +{
> > >>> +    int head;
> > >>> +    unsigned avail_idx;
> > >>> +    vring_avail_t *avail = svq->vring.avail;
> > >>> +
> > >>> +    head = svq->free_head;
> > >>> +
> > >>> +    /* We need some descriptors here */
> > >>> +    assert(elem->out_num || elem->in_num);
> > >>
> > >> Looks like this could be triggered by guest, we need fail instead assert
> > >> here.
> > >>
> > > My understanding was that virtqueue_pop already sanitized that case,
> > > but I'm not able to find where now. I will recheck and, in case it's
> > > not, I will move to a failure.
> > >
> > >>> +
> > >>> +    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> > >>> +                            elem->in_num > 0, false);
> > >>> +    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> > >>> +
> > >>> +    /*
> > >>> +     * Put entry in available array (but don't update avail->idx until they
> > >>> +     * do sync).
> > >>> +     */
> > >>> +    avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
> > >>> +    avail->ring[avail_idx] = cpu_to_le16(head);
> > >>> +    svq->avail_idx_shadow++;
> > >>> +
> > >>> +    /* Update avail index after the descriptor is wrote */
> > >>> +    smp_wmb();
> > >>> +    avail->idx = cpu_to_le16(svq->avail_idx_shadow);
> > >>> +
> > >>> +    return head;
> > >>> +}
> > >>> +
> > >>> +static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> > >>> +{
> > >>> +    unsigned qemu_head = vhost_svq_add_split(svq, elem);
> > >>> +
> > >>> +    svq->ring_id_maps[qemu_head] = elem;
> > >>> +}
> > >>> +
> > >>> +static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    /* We need to expose available array entries before checking used flags */
> > >>> +    smp_mb();
> > >>> +    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
> > >>>            return;
> > >>>        }
> > >>>
> > >>>        event_notifier_set(&svq->hdev_kick);
> > >>>    }
> > >>>
> > >>> -/* Forward vhost notifications */
> > >>> +/**
> > >>> + * Forward available buffers.
> > >>> + *
> > >>> + * @svq Shadow VirtQueue
> > >>> + *
> > >>> + * Note that this function does not guarantee that all guest's available
> > >>> + * buffers are available to the device in SVQ avail ring. The guest may have
> > >>> + * exposed a GPA / GIOVA congiuous buffer, but it may not be contiguous in qemu
> > >>> + * vaddr.
> > >>> + *
> > >>> + * If that happens, guest's kick notifications will be disabled until device
> > >>> + * makes some buffers used.
> > >>> + */
> > >>> +static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    /* Clear event notifier */
> > >>> +    event_notifier_test_and_clear(&svq->svq_kick);
> > >>> +
> > >>> +    /* Make available as many buffers as possible */
> > >>> +    do {
> > >>> +        if (virtio_queue_get_notification(svq->vq)) {
> > >>> +            virtio_queue_set_notification(svq->vq, false);
> > >>
> > >> This looks like an optimization the should belong to
> > >> virtio_queue_set_notification() itself.
> > >>
> > > Sure we can move.
> > >
> > >>> +        }
> > >>> +
> > >>> +        while (true) {
> > >>> +            VirtQueueElement *elem;
> > >>> +
> > >>> +            if (svq->next_guest_avail_elem) {
> > >>> +                elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > >>> +            } else {
> > >>> +                elem = virtqueue_pop(svq->vq, sizeof(*elem));
> > >>> +            }
> > >>> +
> > >>> +            if (!elem) {
> > >>> +                break;
> > >>> +            }
> > >>> +
> > >>> +            if (elem->out_num + elem->in_num >
> > >>> +                vhost_svq_available_slots(svq)) {
> > >>> +                /*
> > >>> +                 * This condition is possible since a contiguous buffer in GPA
> > >>> +                 * does not imply a contiguous buffer in qemu's VA
> > >>> +                 * scatter-gather segments. If that happen, the buffer exposed
> > >>> +                 * to the device needs to be a chain of descriptors at this
> > >>> +                 * moment.
> > >>> +                 *
> > >>> +                 * SVQ cannot hold more available buffers if we are here:
> > >>> +                 * queue the current guest descriptor and ignore further kicks
> > >>> +                 * until some elements are used.
> > >>> +                 */
> > >>> +                svq->next_guest_avail_elem = elem;
> > >>> +                return;
> > >>> +            }
> > >>> +
> > >>> +            vhost_svq_add(svq, elem);
> > >>> +            vhost_svq_kick(svq);
> > >>> +        }
> > >>> +
> > >>> +        virtio_queue_set_notification(svq->vq, true);
> > >>> +    } while (!virtio_queue_empty(svq->vq));
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Handle guest's kick.
> > >>> + *
> > >>> + * @n guest kick event notifier, the one that guest set to notify svq.
> > >>> + */
> > >>> +static void vhost_handle_guest_kick_notifier(EventNotifier *n)
> > >>> +{
> > >>> +    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > >>> +                                             svq_kick);
> > >>> +    vhost_handle_guest_kick(svq);
> > >>> +}
> > >>> +
> > >>> +static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    if (svq->last_used_idx != svq->shadow_used_idx) {
> > >>> +        return true;
> > >>> +    }
> > >>> +
> > >>> +    svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
> > >>> +
> > >>> +    return svq->last_used_idx != svq->shadow_used_idx;
> > >>> +}
> > >>> +
> > >>> +static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    vring_desc_t *descs = svq->vring.desc;
> > >>> +    const vring_used_t *used = svq->vring.used;
> > >>> +    vring_used_elem_t used_elem;
> > >>> +    uint16_t last_used;
> > >>> +
> > >>> +    if (!vhost_svq_more_used(svq)) {
> > >>> +        return NULL;
> > >>> +    }
> > >>> +
> > >>> +    /* Only get used array entries after they have been exposed by dev */
> > >>> +    smp_rmb();
> > >>> +    last_used = svq->last_used_idx & (svq->vring.num - 1);
> > >>> +    used_elem.id = le32_to_cpu(used->ring[last_used].id);
> > >>> +    used_elem.len = le32_to_cpu(used->ring[last_used].len);
> > >>> +
> > >>> +    svq->last_used_idx++;
> > >>> +    if (unlikely(used_elem.id >= svq->vring.num)) {
> > >>> +        error_report("Device %s says index %u is used", svq->vdev->name,
> > >>> +                     used_elem.id);
> > >>> +        return NULL;
> > >>> +    }
> > >>> +
> > >>> +    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
> > >>> +        error_report(
> > >>> +            "Device %s says index %u is used, but it was not available",
> > >>> +            svq->vdev->name, used_elem.id);
> > >>> +        return NULL;
> > >>> +    }
> > >>> +
> > >>> +    descs[used_elem.id].next = svq->free_head;
> > >>> +    svq->free_head = used_elem.id;
> > >>> +
> > >>> +    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> > >>> +    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> > >>> +}
> > >>> +
> > >>> +static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> > >>> +                            bool check_for_avail_queue)
> > >>> +{
> > >>> +    VirtQueue *vq = svq->vq;
> > >>> +
> > >>> +    /* Make as many buffers as possible used. */
> > >>> +    do {
> > >>> +        unsigned i = 0;
> > >>> +
> > >>> +        vhost_svq_set_notification(svq, false);
> > >>> +        while (true) {
> > >>> +            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> > >>> +            if (!elem) {
> > >>> +                break;
> > >>> +            }
> > >>> +
> > >>> +            if (unlikely(i >= svq->vring.num)) {
> > >>> +                virtio_error(svq->vdev,
> > >>> +                         "More than %u used buffers obtained in a %u size SVQ",
> > >>> +                         i, svq->vring.num);
> > >>> +                virtqueue_fill(vq, elem, elem->len, i);
> > >>> +                virtqueue_flush(vq, i);
> > >>
> > >> Let's simply use virtqueue_push() here?
> > >>
> > > virtqueue_push support to fill and flush only one element, instead of
> > > batch. I'm fine with either but I think the less updates to the used
> > > idx, the better.
> >
> >
> > Fine.
> >
> >
> > >
> > >>> +                i = 0;
> > >>
> > >> Do we need to bail out here?
> > >>
> > > Yes I guess we can simply return.
> > >
> > >>> +            }
> > >>> +            virtqueue_fill(vq, elem, elem->len, i++);
> > >>> +        }
> > >>> +
> > >>> +        virtqueue_flush(vq, i);
> > >>> +        event_notifier_set(&svq->svq_call);
> > >>> +
> > >>> +        if (check_for_avail_queue && svq->next_guest_avail_elem) {
> > >>> +            /*
> > >>> +             * Avail ring was full when vhost_svq_flush was called, so it's a
> > >>> +             * good moment to make more descriptors available if possible
> > >>> +             */
> > >>> +            vhost_handle_guest_kick(svq);
> > >>
> > >> Is there better to have a similar check as vhost_handle_guest_kick() did?
> > >>
> > >>               if (elem->out_num + elem->in_num >
> > >>                   vhost_svq_available_slots(svq)) {
> > >>
> > > It will be duplicated when we call vhost_handle_guest_kick, won't it?
> >
> >
> > Right, I mis-read the code.
> >
> >
> > >
> > >>> +        }
> > >>> +
> > >>> +        vhost_svq_set_notification(svq, true);
> > >>
> > >> A mb() is needed here? Otherwise we may lost a call here (where
> > >> vhost_svq_more_used() is run before vhost_svq_set_notification()).
> > >>
> > > I'm confused here then, I thought you said this is just a hint so
> > > there was no need? [1]. I think the memory barrier is needed too.
> >
> >
> > Yes, it's a hint but:
> >
> > 1) When we disable the notification, consider the notification disable
> > is just a hint, device can still raise an interrupt, so the ordering is
> > meaningless and a memory barrier is not necessary (the
> > vhost_svq_set_notification(svq, false))
> >
> > 2) When we enable the notification, though it's a hint, the device can
> > choose to implement it by enabling the interrupt, in this case, the
> > notification enable should be done before checking the used. Otherwise,
> > the checking of more used might be done before enable the notification:
> >
> > 1) driver check more used
> > 2) device add more used but no notification
> > 3) driver enable the notification then we lost a notification here
> >
>
> That was my understanding too. So the right way is to only add the
> memory barrier in case 2), when setting the flag, right?

Yes.

>
> >
> > >>> +    } while (vhost_svq_more_used(svq));
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Forward used buffers.
> > >>> + *
> > >>> + * @n hdev call event notifier, the one that device set to notify svq.
> > >>> + *
> > >>> + * Note that we are not making any buffers available in the loop, there is no
> > >>> + * way that it runs more than virtqueue size times.
> > >>> + */
> > >>>    static void vhost_svq_handle_call(EventNotifier *n)
> > >>>    {
> > >>>        VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > >>>                                                 hdev_call);
> > >>>
> > >>> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> > >>> -        return;
> > >>> -    }
> > >>> +    /* Clear event notifier */
> > >>> +    event_notifier_test_and_clear(n);
> > >>
> > >> Any reason that we remove the above check?
> > >>
> > > This comes from the previous versions, where this made sure we missed
> > > no used buffers in the process of switching to SVQ mode.
> >
> >
> > I'm not sure I get here. Even if for the switching, it should be more
> > safe the handle the flush unconditionally?
> >
>
> Yes, I also think it's better to forward and kick/call unconditionally.
>
> Thanks!

Ok.

Thanks

>
> > Thanks
> >
> >
> > >
> > > If we enable SVQ from the beginning I think we can rely on getting all
> > > the device's used buffer notifications, so let me think a little bit
> > > and I can move to check the eventfd.
> > >
> > >>> -    event_notifier_set(&svq->svq_call);
> > >>> +    vhost_svq_flush(svq, true);
> > >>>    }
> > >>>
> > >>>    /**
> > >>> @@ -258,13 +551,38 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> > >>>         * need to explicitely check for them.
> > >>>         */
> > >>>        event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> > >>> -    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
> > >>> +    event_notifier_set_handler(&svq->svq_kick,
> > >>> +                               vhost_handle_guest_kick_notifier);
> > >>>
> > >>>        if (!check_old || event_notifier_test_and_clear(&tmp)) {
> > >>>            event_notifier_set(&svq->hdev_kick);
> > >>>        }
> > >>>    }
> > >>>
> > >>> +/**
> > >>> + * Start shadow virtqueue operation.
> > >>> + *
> > >>> + * @svq Shadow Virtqueue
> > >>> + * @vdev        VirtIO device
> > >>> + * @vq          Virtqueue to shadow
> > >>> + */
> > >>> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> > >>> +                     VirtQueue *vq)
> > >>> +{
> > >>> +    svq->next_guest_avail_elem = NULL;
> > >>> +    svq->avail_idx_shadow = 0;
> > >>> +    svq->shadow_used_idx = 0;
> > >>> +    svq->last_used_idx = 0;
> > >>> +    svq->vdev = vdev;
> > >>> +    svq->vq = vq;
> > >>> +
> > >>> +    memset(svq->vring.avail, 0, sizeof(*svq->vring.avail));
> > >>> +    memset(svq->vring.used, 0, sizeof(*svq->vring.used));
> > >>> +    for (unsigned i = 0; i < svq->vring.num - 1; i++) {
> > >>> +        svq->vring.desc[i].next = cpu_to_le16(i + 1);
> > >>> +    }
> > >>> +}
> > >>> +
> > >>>    /**
> > >>>     * Stop shadow virtqueue operation.
> > >>>     * @svq Shadow Virtqueue
> > >>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> > >>>    void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > >>>    {
> > >>>        event_notifier_set_handler(&svq->svq_kick, NULL);
> > >>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> > >>> +
> > >>> +    if (!svq->vq) {
> > >>> +        return;
> > >>> +    }
> > >>> +
> > >>> +    /* Send all pending used descriptors to guest */
> > >>> +    vhost_svq_flush(svq, false);
> > >>> +
> > >>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> > >>> +        g_autofree VirtQueueElement *elem = NULL;
> > >>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> > >>> +        if (elem) {
> > >>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> > >>> +        }
> > >>> +    }
> > >>> +
> > >>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > >>> +    if (next_avail_elem) {
> > >>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> > >>> +                                 next_avail_elem->len);
> > >>> +    }
> > >>>    }
> > >>>
> > >>>    /**
> > >>> @@ -316,7 +656,7 @@ VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> > >>>        memset(svq->vring.desc, 0, driver_size);
> > >>>        svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> > >>>        memset(svq->vring.used, 0, device_size);
> > >>> -
> > >>> +    svq->ring_id_maps = g_new0(VirtQueueElement *, qsize);
> > >>>        event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
> > >>>        return g_steal_pointer(&svq);
> > >>>
> > >>> @@ -335,6 +675,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
> > >>>        event_notifier_cleanup(&vq->hdev_kick);
> > >>>        event_notifier_set_handler(&vq->hdev_call, NULL);
> > >>>        event_notifier_cleanup(&vq->hdev_call);
> > >>> +    g_free(vq->ring_id_maps);
> > >>>        qemu_vfree(vq->vring.desc);
> > >>>        qemu_vfree(vq->vring.used);
> > >>>        g_free(vq);
> > >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > >>> index 53e14bafa0..0e5c00ed7e 100644
> > >>> --- a/hw/virtio/vhost-vdpa.c
> > >>> +++ b/hw/virtio/vhost-vdpa.c
> > >>> @@ -752,9 +752,9 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> > >>>     * Note that this function does not rewind kick file descriptor if cannot set
> > >>>     * call one.
> > >>>     */
> > >>> -static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> > >>> -                                VhostShadowVirtqueue *svq,
> > >>> -                                unsigned idx)
> > >>> +static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
> > >>> +                                  VhostShadowVirtqueue *svq,
> > >>> +                                  unsigned idx)
> > >>>    {
> > >>>        struct vhost_vring_file file = {
> > >>>            .index = dev->vq_index + idx,
> > >>> @@ -767,7 +767,7 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> > >>>        r = vhost_vdpa_set_vring_dev_kick(dev, &file);
> > >>>        if (unlikely(r != 0)) {
> > >>>            error_report("Can't set device kick fd (%d)", -r);
> > >>> -        return false;
> > >>> +        return r;
> > >>>        }
> > >>>
> > >>>        event_notifier = vhost_svq_get_svq_call_notifier(svq);
> > >>> @@ -777,6 +777,99 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> > >>>            error_report("Can't set device call fd (%d)", -r);
> > >>>        }
> > >>>
> > >>> +    return r;
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Unmap SVQ area in the device
> > >>> + */
> > >>> +static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
> > >>> +                                      hwaddr size)
> > >>> +{
> > >>> +    int r;
> > >>> +
> > >>> +    size = ROUND_UP(size, qemu_real_host_page_size);
> > >>> +    r = vhost_vdpa_dma_unmap(v, iova, size);
> > >>> +    return r == 0;
> > >>> +}
> > >>> +
> > >>> +static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
> > >>> +                                       const VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    struct vhost_vdpa *v = dev->opaque;
> > >>> +    struct vhost_vring_addr svq_addr;
> > >>> +    size_t device_size = vhost_svq_device_area_size(svq);
> > >>> +    size_t driver_size = vhost_svq_driver_area_size(svq);
> > >>> +    bool ok;
> > >>> +
> > >>> +    vhost_svq_get_vring_addr(svq, &svq_addr);
> > >>> +
> > >>> +    ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
> > >>> +    if (unlikely(!ok)) {
> > >>> +        return false;
> > >>> +    }
> > >>> +
> > >>> +    return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Map shadow virtqueue rings in device
> > >>> + *
> > >>> + * @dev   The vhost device
> > >>> + * @svq   The shadow virtqueue
> > >>> + */
> > >>> +static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
> > >>> +                                     const VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    struct vhost_vdpa *v = dev->opaque;
> > >>> +    struct vhost_vring_addr svq_addr;
> > >>> +    size_t device_size = vhost_svq_device_area_size(svq);
> > >>> +    size_t driver_size = vhost_svq_driver_area_size(svq);
> > >>> +    int r;
> > >>> +
> > >>> +    vhost_svq_get_vring_addr(svq, &svq_addr);
> > >>> +
> > >>> +    r = vhost_vdpa_dma_map(v, svq_addr.desc_user_addr, driver_size,
> > >>> +                           (void *)svq_addr.desc_user_addr, true);
> > >>> +    if (unlikely(r != 0)) {
> > >>> +        return false;
> > >>> +    }
> > >>> +
> > >>> +    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
> > >>> +                           (void *)svq_addr.used_user_addr, false);
> > >>
> > >> Do we need unmap the driver area if we fail here?
> > >>
> > > Yes, this used to rely on unmapping them when SVQ is disabled. Now I
> > > think we need to unmap here, as you say.
> > >
> > > Thanks!
> > >
> > > [1] https://lists.linuxfoundation.org/pipermail/virtualization/2021-March/053322.html
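
For reference, the agreed fix could look like the following sketch,
reusing only names from the hunk above (the rollback call on failure is
the new part):

    static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
                                         const VhostShadowVirtqueue *svq)
    {
        struct vhost_vdpa *v = dev->opaque;
        struct vhost_vring_addr svq_addr;
        size_t device_size = vhost_svq_device_area_size(svq);
        size_t driver_size = vhost_svq_driver_area_size(svq);
        int r;

        vhost_svq_get_vring_addr(svq, &svq_addr);

        r = vhost_vdpa_dma_map(v, svq_addr.desc_user_addr, driver_size,
                               (void *)svq_addr.desc_user_addr, true);
        if (unlikely(r != 0)) {
            return false;
        }

        r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
                               (void *)svq_addr.used_user_addr, false);
        if (unlikely(r != 0)) {
            /* Roll back the driver area instead of leaving a stale map */
            vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr,
                                      driver_size);
            return false;
        }

        return true;
    }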
> > >
> > >> Thanks
> > >>
> > >>
> > >>> +    return r == 0;
> > >>> +}
> > >>> +
> > >>> +static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> > >>> +                                VhostShadowVirtqueue *svq,
> > >>> +                                unsigned idx)
> > >>> +{
> > >>> +    uint16_t vq_index = dev->vq_index + idx;
> > >>> +    struct vhost_vring_state s = {
> > >>> +        .index = vq_index,
> > >>> +    };
> > >>> +    int r;
> > >>> +    bool ok;
> > >>> +
> > >>> +    r = vhost_vdpa_set_dev_vring_base(dev, &s);
> > >>> +    if (unlikely(r)) {
> > >>> +        error_report("Can't set vring base (%d)", r);
> > >>> +        return false;
> > >>> +    }
> > >>> +
> > >>> +    s.num = vhost_svq_get_num(svq);
> > >>> +    r = vhost_vdpa_set_dev_vring_num(dev, &s);
> > >>> +    if (unlikely(r)) {
> > >>> +        error_report("Can't set vring num (%d)", r);
> > >>> +        return false;
> > >>> +    }
> > >>> +
> > >>> +    ok = vhost_vdpa_svq_map_rings(dev, svq);
> > >>> +    if (unlikely(!ok)) {
> > >>> +        return false;
> > >>> +    }
> > >>> +
> > >>> +    r = vhost_vdpa_svq_set_fds(dev, svq, idx);
> > >>>        return r == 0;
> > >>>    }
> > >>>
> > >>> @@ -788,14 +881,24 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > >>>        if (started) {
> > >>>            vhost_vdpa_host_notifiers_init(dev);
> > >>>            for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> > >>> +            VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
> > >>>                VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
> > >>>                bool ok = vhost_vdpa_svq_setup(dev, svq, i);
> > >>>                if (unlikely(!ok)) {
> > >>>                    return -1;
> > >>>                }
> > >>> +            vhost_svq_start(svq, dev->vdev, vq);
> > >>>            }
> > >>>            vhost_vdpa_set_vring_ready(dev);
> > >>>        } else {
> > >>> +        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> > >>> +            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
> > >>> +                                                          i);
> > >>> +            bool ok = vhost_vdpa_svq_unmap_rings(dev, svq);
> > >>> +            if (unlikely(!ok)) {
> > >>> +                return -1;
> > >>> +            }
> > >>> +        }
> > >>>            vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> > >>>        }
> > >>>
> >
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
  2022-02-22  8:55                 ` Eugenio Perez Martin
@ 2022-02-23  2:26                     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-23  2:26 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Tue, Feb 22, 2022 at 4:56 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Feb 22, 2022 at 8:26 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2022/2/21 下午4:15, Eugenio Perez Martin 写道:
> > > On Mon, Feb 21, 2022 at 8:44 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> 在 2022/2/17 下午8:48, Eugenio Perez Martin 写道:
> > >>> On Tue, Feb 8, 2022 at 9:16 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>>> 在 2022/2/1 下午7:25, Eugenio Perez Martin 写道:
> > >>>>> On Sun, Jan 30, 2022 at 7:47 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>>>>> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > >>>>>>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> > >>>>>>>      void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > >>>>>>>      {
> > >>>>>>>          event_notifier_set_handler(&svq->svq_kick, NULL);
> > >>>>>>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> > >>>>>>> +
> > >>>>>>> +    if (!svq->vq) {
> > >>>>>>> +        return;
> > >>>>>>> +    }
> > >>>>>>> +
> > >>>>>>> +    /* Send all pending used descriptors to guest */
> > >>>>>>> +    vhost_svq_flush(svq, false);
> > >>>>>> Do we need to wait for all the pending descriptors to be completed here?
> > >>>>>>
> > >>>>> No, this function does not wait, it only completes the forwarding of
> > >>>>> the *used* descriptors.
> > >>>>>
> > >>>>> The best example is the net rx queue in my opinion. This call will
> > >>>>> check SVQ's vring used_idx and will forward the last used descriptors
> > >>>>> if any, but all available descriptors will remain as available for
> > >>>>> qemu's VQ code.
> > >>>>>
> > >>>>> To skip it would miss those last rx descriptors in migration.
> > >>>>>
> > >>>>> Thanks!
> > >>>> So it's probably not the best place to ask. It's more about the
> > >>>> inflight descriptors, so it should be TX instead of RX.
> > >>>>
> > >>>> I can imagine the migration last phase, we should stop the vhost-vDPA
> > >>>> before calling vhost_svq_stop(). Then we should be fine regardless of
> > >>>> inflight descriptors.
> > >>>>
> > >>> I think I'm still missing something here.
> > >>>
> > >>> To be on the same page. Regarding tx this could cause repeated tx
> > >>> frames (one at source and other at destination), but never a missed
> > >>> buffer not transmitted. The "stop before" could be interpreted as "SVQ
> > >>> is not forwarding available buffers anymore". Would that work?
> > >>
> > >> Right, but this only works if
> > >>
> > >> 1) a flush to make sure TX DMA for inflight descriptors are all completed
> > >>
> > >> 2) just mark all inflight descriptor used
> > >>
> > > It currently relies on the reverse: buffers not marked as used (by the
> > > device) will be available in the destination, so expect
> > > retransmissions.
> >
> >
> > I may be missing something, but I think we do migrate last_avail_idx.
> > So there won't be a re-transmission, since we depend on qemu virtqueue
> > code to deal with the vring base?
> >
>
> On stop, vhost_virtqueue_stop calls vhost_vdpa_get_vring_base. In SVQ
> mode, it returns last_used_idx. After that, vhost.c code set VirtQueue
> last_avail_idx == last_used_idx, and it's migrated after that if I'm
> not wrong.

Ok, I missed these details in the review. I suggest mentioning this in
the changelog and adding a comment in vhost_vdpa_get_vring_base().
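
A sketch of such a comment; the SVQ branch and the
vhost_svq_last_used_idx() helper are hypothetical here, while
v->shadow_vqs_enabled, vhost_vdpa_call() and the VHOST_GET_VRING_BASE
ioctl come from the series:

    static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
                                         struct vhost_vring_state *ring)
    {
        struct vhost_vdpa *v = dev->opaque;

        if (v->shadow_vqs_enabled) {
            /*
             * In SVQ mode the device's own ring state is meaningless to
             * the guest: report SVQ's last_used_idx instead.  vhost.c
             * stores it as the VirtQueue's last_avail_idx and migrates
             * it, so buffers not yet used are simply available again at
             * the destination.
             */
            ring->num = vhost_svq_last_used_idx(v, ring->index); /* hypothetical */
            return 0;
        }

        return vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
    }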

>
> vhost kernel migrates last_avail_idx, but it makes rx buffers
> available on-demand, unlike SVQ. So it does not need to unwind buffers
> or anything like that. Because of how SVQ works with the rx queue,
> this is not possible, since the destination will find no available
> buffers for rx. And for tx you already have described the scenario.
>
> In other words, we cannot see SVQ as a vhost device in that regard:
> SVQ looks for total drain (as "make all guest's buffers available for
> the device ASAP") vs the vhost device which can live with a lot of
> available ones and it will use them on demand. Same problem as
> masking. So the difference in behavior is justified in my opinion, and
> it can be improved in the future with the vdpa in-flight descriptors.
>
> If we restore the state that way in a virtio-net device, it will see
> the available ones as expected, not as in-flight.
>
> Another possibility is to transform all of these into in-flight ones,
> but I feel it would create problems. Can we migrate all rx queues as
> in-flight, with 0 bytes written? Is it worth it?

To clarify, for inflight I meant from the device point of view, that
is [last_used_idx, last_avail_idx).

So for RX and SVQ, it should be as simple as stopping the forwarding of
buffers, since last_used_idx should be the same as last_avail_idx in this
case. (Though technically the rx buffer might be modified by the NIC).
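
As a minimal illustration of that window with free-running 16-bit
indices (needs <stdint.h>):

    /* Device-side in-flight window: [last_used_idx, last_avail_idx) */
    static inline uint16_t svq_inflight(uint16_t last_used_idx,
                                        uint16_t last_avail_idx)
    {
        /* Unsigned wrap-around keeps this correct across overflow */
        return (uint16_t)(last_avail_idx - last_used_idx);
    }

For RX under SVQ at stop time both indices match, so svq_inflight()
returns 0 and forwarding can simply stop.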

> I didn't investigate
> that path too much, but I think the virtio-net emulated device does
> not support that at the moment. If I'm not wrong, we should copy
> something like the body of virtio_blk_load_device if we want to go
> that route.
>
> The current approach might be too net-centric, so let me know if this
> behavior is unexpected or we can do better otherwise.

It should be fine to start from a networking device. We can add more
in the future if it is needed.

Thanks

>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > > Thanks!
> > >
> > >> Otherwise there could be buffers that are inflight forever.
> > >>
> > >> Thanks
> > >>
> > >>
> > >>> Thanks!
> > >>>
> > >>>> Thanks
> > >>>>
> > >>>>
> > >>>>>> Thanks
> > >>>>>>
> > >>>>>>
> > >>>>>>> +
> > >>>>>>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> > >>>>>>> +        g_autofree VirtQueueElement *elem = NULL;
> > >>>>>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> > >>>>>>> +        if (elem) {
> > >>>>>>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> > >>>>>>> +        }
> > >>>>>>> +    }
> > >>>>>>> +
> > >>>>>>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > >>>>>>> +    if (next_avail_elem) {
> > >>>>>>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> > >>>>>>> +                                 next_avail_elem->len);
> > >>>>>>> +    }
> > >>>>>>>      }
> >
>


^ permalink raw reply	[flat|nested] 182+ messages in thread


* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
  2022-02-22  8:05                 ` Eugenio Perez Martin
@ 2022-02-23  3:46                     ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-23  3:46 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Tue, Feb 22, 2022 at 4:06 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Feb 22, 2022 at 8:41 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2022/2/17 下午4:22, Eugenio Perez Martin 写道:
> > > On Thu, Feb 17, 2022 at 7:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > >> On Wed, Feb 16, 2022 at 11:54 PM Eugenio Perez Martin
> > >> <eperezma@redhat.com> wrote:
> > >>> On Tue, Feb 8, 2022 at 9:25 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>>>
> > >>>> 在 2022/2/1 下午7:45, Eugenio Perez Martin 写道:
> > >>>>> On Sun, Jan 30, 2022 at 7:50 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>>>>> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > >>>>>>> SVQ is able to log the dirty bits by itself, so let's use it to not
> > >>>>>>> block migration.
> > >>>>>>>
> > >>>>>>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
> > >>>>>>> enabled. Even if the device supports it, the reports would be nonsense
> > >>>>>>> because SVQ memory is in the qemu region.
> > >>>>>>>
> > >>>>>>> The log region is still allocated. Future changes might skip that, but
> > >>>>>>> this series is already long enough.
> > >>>>>>>
> > >>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>>>>>> ---
> > >>>>>>>     hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
> > >>>>>>>     1 file changed, 20 insertions(+)
> > >>>>>>>
> > >>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > >>>>>>> index fb0a338baa..75090d65e8 100644
> > >>>>>>> --- a/hw/virtio/vhost-vdpa.c
> > >>>>>>> +++ b/hw/virtio/vhost-vdpa.c
> > >>>>>>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
> > >>>>>>>         if (ret == 0 && v->shadow_vqs_enabled) {
> > >>>>>>>             /* Filter only features that SVQ can offer to guest */
> > >>>>>>>             vhost_svq_valid_guest_features(features);
> > >>>>>>> +
> > >>>>>>> +        /* Add SVQ logging capabilities */
> > >>>>>>> +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
> > >>>>>>>         }
> > >>>>>>>
> > >>>>>>>         return ret;
> > >>>>>>> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
> > >>>>>>>
> > >>>>>>>         if (v->shadow_vqs_enabled) {
> > >>>>>>>             uint64_t dev_features, svq_features, acked_features;
> > >>>>>>> +        uint8_t status = 0;
> > >>>>>>>             bool ok;
> > >>>>>>>
> > >>>>>>> +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> > >>>>>>> +        if (unlikely(ret)) {
> > >>>>>>> +            return ret;
> > >>>>>>> +        }
> > >>>>>>> +
> > >>>>>>> +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > >>>>>>> +            /*
> > >>>>>>> +             * vhost is trying to enable or disable _F_LOG, and the device
> > >>>>>>> +             * would report wrong dirty pages. SVQ handles it.
> > >>>>>>> +             */
> > >>>>>> I fail to understand this comment, I'd think there's no way to disable
> > >>>>>> dirty page tracking for SVQ.
> > >>>>>>
> > >>>>> vhost_log_global_{start,stop} are called at the beginning and end of
> > >>>>> migration. To inform the device that it should start logging, they set
> > >>>>> or clear VHOST_F_LOG_ALL at vhost_dev_set_log.
> > >>>>
> > >>>> Yes, but for SVQ, we can't disable dirty page tracking, isn't it? The
> > >>>> only thing is to ignore or filter out the F_LOG_ALL and pretend to be
> > >>>> enabled and disabled.
> > >>>>
> > >>> Yes, that's what this patch does.
> > >>>
> > >>>>> While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
> > >>>>> vhost does not block migration. Maybe we need to look for another way
> > >>>>> to do this?
> > >>>>
> > >>>> I'm fine with filtering since it's much simpler, but I fail to
> > >>>> understand why we need to check DRIVER_OK.
> > >>>>
> > >>> Ok maybe I can make that part more clear,
> > >>>
> > >>> Since both operations use vhost_vdpa_set_features we must just filter
> > >>> the one that actually sets or removes VHOST_F_LOG_ALL, without
> > >>> affecting other features.
> > >>>
> > >>> In practice, that means to not forward the set features after
> > >>> DRIVER_OK. The device is not expecting them anymore.
> > >> I wonder what happens if we don't do this.
> > >>
> > > If we simply delete the check vhost_dev_set_features will return an
> > > error, failing the start of the migration. More on this below.
> >
> >
> > Ok.
> >
> >
> > >
> > >> So kernel had this check:
> > >>
> > >>          /*
> > >>           * It's not allowed to change the features after they have
> > >>           * been negotiated.
> > >>           */
> > >> if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_FEATURES_OK)
> > >>          return -EBUSY;
> > >>
> > >> So is it FEATURES_OK actually?
> > >>
> > > Yes, FEATURES_OK seems more appropriate actually so I will switch to
> > > it for the next version.
> > >
> > > But it should be functionally equivalent, since
> > > vhost.c:vhost_dev_start sets both and the setting of _F_LOG_ALL cannot
> > > be concurrent with it.
> >
> >
> > Right.
> >
> >
> > >
> > >> For this patch, I wonder if the thing we need to do is to see whether
> > >> it is an enable/disable of F_LOG_ALL and simply return.
> > >>
> > > Yes, that's the intention of the patch.
> > >
> > > We have 4 cases here:
> > > a) We're being called from vhost_dev_start, with enable_log = false
> > > b) We're being called from vhost_dev_start, with enable_log = true
> >
> >
> > And this case means we can't simply return without calling vhost-vdpa.
> >
>
> It calls because {FEATURES,DRIVER}_OK is still not set at that point.
>
> >
> > > c) We're being called from vhost_dev_set_log, with enable_log = false
> > > d) We're being called from vhost_dev_set_log, with enable_log = true
> > >
> > > The way to tell the difference between a/b and c/d is to check if
> > > {FEATURES,DRIVER}_OK is set. And, as you point out in previous mails,
> > > F_LOG_ALL must be filtered unconditionally since SVQ tracks dirty
> > > memory through the memory unmapping, so we clear the bit
> > > unconditionally if we detect that VHOST_SET_FEATURES will be called
> > > (cases a and b).
> > >
> > > Another possibility is to track if features have been set with a bool
> > > in vhost_vdpa or something like that. But it seems cleaner to me to
> > > only store that in the actual device.
> >
> >
> > So I suggest to make sure codes match the comment:
> >
> >          if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> >              /*
> >               * vhost is trying to enable or disable _F_LOG, and the device
> >               * would report wrong dirty pages. SVQ handles it.
> >               */
> >              return 0;
> >          }
> >
> > It would be better to check whether the caller is toggling _F_LOG_ALL in
> > this case.
> >
>
> How to detect? We can save feature flags and compare, but ignoring all
> set_features after FEATURES_OK seems simpler to me.

Something like:

(status ^ status_old == _F_LOG_ALL) ?

It helps us to return errors on wrong features set during DRIVER_OK.

Thanks
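
To summarize the four cases as a sketch, using only constructs from the
quoted hunks (whether the gate ends up being DRIVER_OK or FEATURES_OK is
the open question above):

    /* Inside vhost_vdpa_set_features(), with SVQ enabled: */
    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
        /*
         * Cases c/d, vhost_dev_set_log(): only _F_LOG is being toggled
         * and SVQ already tracks dirty pages, so swallow the call.
         */
        return 0;
    }
    /*
     * Cases a/b, vhost_dev_start(): the features must still reach the
     * device, but never _F_LOG, or it would log SVQ's vrings instead
     * of guest memory.
     */
    features &= ~BIT_ULL(VHOST_F_LOG_ALL);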

>
> Would changing the comment work? Something like "set_features after
> _S_FEATURES_OK means vhost is trying to enable or disable _F_LOG, and
> the device would report wrong dirty pages. SVQ handles it."
>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > >> Thanks
> > >>
> > >>> Does that make more sense?
> > >>>
> > >>> Thanks!
> > >>>
> > >>>> Thanks
> > >>>>
> > >>>>
> > >>>>> Thanks!
> > >>>>>
> > >>>>>> Thanks
> > >>>>>>
> > >>>>>>
> > >>>>>>> +            return 0;
> > >>>>>>> +        }
> > >>>>>>> +
> > >>>>>>> +        /* We must not ack _F_LOG if SVQ is enabled */
> > >>>>>>> +        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
> > >>>>>>> +
> > >>>>>>>             ret = vhost_vdpa_get_dev_features(dev, &dev_features);
> > >>>>>>>             if (ret != 0) {
> > >>>>>>>                 error_report("Can't get vdpa device features, got (%d)", ret);
> >
>


^ permalink raw reply	[flat|nested] 182+ messages in thread


* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
  2022-02-23  3:46                     ` Jason Wang
@ 2022-02-23  8:06                     ` Eugenio Perez Martin
  2022-02-24  3:45                         ` Jason Wang
  -1 siblings, 1 reply; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-23  8:06 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Juan Quintela, Richard Henderson, qemu-level, Gautam Dawar,
	Markus Armbruster, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Peter Xu, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, virtualization, Eric Blake, Stefano Garzarella

On Wed, Feb 23, 2022 at 4:47 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Feb 22, 2022 at 4:06 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Tue, Feb 22, 2022 at 8:41 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > 在 2022/2/17 下午4:22, Eugenio Perez Martin 写道:
> > > > On Thu, Feb 17, 2022 at 7:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >> On Wed, Feb 16, 2022 at 11:54 PM Eugenio Perez Martin
> > > >> <eperezma@redhat.com> wrote:
> > > >>> On Tue, Feb 8, 2022 at 9:25 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >>>>
> > > >>>> 在 2022/2/1 下午7:45, Eugenio Perez Martin 写道:
> > > >>>>> On Sun, Jan 30, 2022 at 7:50 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >>>>>> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > > >>>>>>> SVQ is able to log the dirty bits by itself, so let's use it to not
> > > >>>>>>> block migration.
> > > >>>>>>>
> > > >>>>>>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
> > > >>>>>>> enabled. Even if the device supports it, the reports would be nonsense
> > > >>>>>>> because SVQ memory is in the qemu region.
> > > >>>>>>>
> > > >>>>>>> The log region is still allocated. Future changes might skip that, but
> > > >>>>>>> this series is already long enough.
> > > >>>>>>>
> > > >>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > >>>>>>> ---
> > > >>>>>>>     hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
> > > >>>>>>>     1 file changed, 20 insertions(+)
> > > >>>>>>>
> > > >>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > >>>>>>> index fb0a338baa..75090d65e8 100644
> > > >>>>>>> --- a/hw/virtio/vhost-vdpa.c
> > > >>>>>>> +++ b/hw/virtio/vhost-vdpa.c
> > > >>>>>>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
> > > >>>>>>>         if (ret == 0 && v->shadow_vqs_enabled) {
> > > >>>>>>>             /* Filter only features that SVQ can offer to guest */
> > > >>>>>>>             vhost_svq_valid_guest_features(features);
> > > >>>>>>> +
> > > >>>>>>> +        /* Add SVQ logging capabilities */
> > > >>>>>>> +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
> > > >>>>>>>         }
> > > >>>>>>>
> > > >>>>>>>         return ret;
> > > >>>>>>> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
> > > >>>>>>>
> > > >>>>>>>         if (v->shadow_vqs_enabled) {
> > > >>>>>>>             uint64_t dev_features, svq_features, acked_features;
> > > >>>>>>> +        uint8_t status = 0;
> > > >>>>>>>             bool ok;
> > > >>>>>>>
> > > >>>>>>> +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> > > >>>>>>> +        if (unlikely(ret)) {
> > > >>>>>>> +            return ret;
> > > >>>>>>> +        }
> > > >>>>>>> +
> > > >>>>>>> +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > > >>>>>>> +            /*
> > > >>>>>>> +             * vhost is trying to enable or disable _F_LOG, and the device
> > > >>>>>>> +             * would report wrong dirty pages. SVQ handles it.
> > > >>>>>>> +             */
> > > >>>>>> I fail to understand this comment, I'd think there's no way to disable
> > > >>>>>> dirty page tracking for SVQ.
> > > >>>>>>
> > > >>>>> vhost_log_global_{start,stop} are called at the beginning and end of
> > > >>>>> migration. To inform the device that it should start logging, they set
> > > >>>>> or clear VHOST_F_LOG_ALL at vhost_dev_set_log.
> > > >>>>
> > > >>>> Yes, but for SVQ, we can't disable dirty page tracking, isn't it? The
> > > >>>> only thing is to ignore or filter out the F_LOG_ALL and pretend to be
> > > >>>> enabled and disabled.
> > > >>>>
> > > >>> Yes, that's what this patch does.
> > > >>>
> > > >>>>> While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
> > > >>>>> vhost does not block migration. Maybe we need to look for another way
> > > >>>>> to do this?
> > > >>>>
> > > >>>> I'm fine with filtering since it's much simpler, but I fail to
> > > >>>> understand why we need to check DRIVER_OK.
> > > >>>>
> > > >>> Ok maybe I can make that part more clear,
> > > >>>
> > > >>> Since both operations use vhost_vdpa_set_features we must just filter
> > > >>> the one that actually sets or removes VHOST_F_LOG_ALL, without
> > > >>> affecting other features.
> > > >>>
> > > >>> In practice, that means to not forward the set features after
> > > >>> DRIVER_OK. The device is not expecting them anymore.
> > > >> I wonder what happens if we don't do this.
> > > >>
> > > > If we simply delete the check vhost_dev_set_features will return an
> > > > error, failing the start of the migration. More on this below.
> > >
> > >
> > > Ok.
> > >
> > >
> > > >
> > > >> So kernel had this check:
> > > >>
> > > >>          /*
> > > >>           * It's not allowed to change the features after they have
> > > >>           * been negotiated.
> > > >>           */
> > > >> if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_FEATURES_OK)
> > > >>          return -EBUSY;
> > > >>
> > > >> So is it FEATURES_OK actually?
> > > >>
> > > > Yes, FEATURES_OK seems more appropriate actually so I will switch to
> > > > it for the next version.
> > > >
> > > > But it should be functionally equivalent, since
> > > > vhost.c:vhost_dev_start sets both and the setting of _F_LOG_ALL cannot
> > > > be concurrent with it.
> > >
> > >
> > > Right.
> > >
> > >
> > > >
> > > >> For this patch, I wonder if the thing we need to do is to see whether
> > > >> it is an enable/disable of F_LOG_ALL and simply return.
> > > >>
> > > > Yes, that's the intention of the patch.
> > > >
> > > > We have 4 cases here:
> > > > a) We're being called from vhost_dev_start, with enable_log = false
> > > > b) We're being called from vhost_dev_start, with enable_log = true
> > >
> > >
> > > > And this case means we can't simply return without calling vhost-vdpa.
> > >
> >
> > It calls because {FEATURES,DRIVER}_OK is still not set at that point.
> >
> > >
> > > > c) We're being called from vhost_dev_set_log, with enable_log = false
> > > > d) We're being called from vhost_dev_set_log, with enable_log = true
> > > >
> > > > The way to tell the difference between a/b and c/d is to check if
> > > > {FEATURES,DRIVER}_OK is set. And, as you point out in previous mails,
> > > > F_LOG_ALL must be filtered unconditionally since SVQ tracks dirty
> > > > memory through the memory unmapping, so we clear the bit
> > > > unconditionally if we detect that VHOST_SET_FEATURES will be called
> > > > (cases a and b).
> > > >
> > > > Another possibility is to track if features have been set with a bool
> > > > in vhost_vdpa or something like that. But it seems cleaner to me to
> > > > only store that in the actual device.
> > >
> > >
> > > So I suggest to make sure codes match the comment:
> > >
> > >          if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > >              /*
> > >               * vhost is trying to enable or disable _F_LOG, and the device
> > >               * would report wrong dirty pages. SVQ handles it.
> > >               */
> > >              return 0;
> > >          }
> > >
> > > It would be better to check whether the caller is toggling _F_LOG_ALL in
> > > this case.
> > >
> >
> > How to detect? We can save feature flags and compare, but ignoring all
> > set_features after FEATURES_OK seems simpler to me.
>
> Something like:
>
> (status ^ status_old == _F_LOG_ALL) ?
>

s/status/features/ ?

> It helps us to return errors on wrong features set during DRIVER_OK.
>

Do you mean to return errors in case of toggling features other than
_F_LOG_ALL? That's interesting actually, but it seems it
forces vhost_vdpa to track acked_features too.

Actually, it seems to me vhost_dev->acked_features will retain the bad
features even on error. I'll investigate it.

Thanks!


> Thanks
>
> >
> > Would changing the comment work? Something like "set_features after
> > _S_FEATURES_OK means vhost is trying to enable or disable _F_LOG, and
> > the device would report wrong dirty pages. SVQ handles it."
> >



^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
  2022-02-23  8:06                     ` Eugenio Perez Martin
@ 2022-02-24  3:45                         ` Jason Wang
  0 siblings, 0 replies; 182+ messages in thread
From: Jason Wang @ 2022-02-24  3:45 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Wed, Feb 23, 2022 at 4:06 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Feb 23, 2022 at 4:47 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Feb 22, 2022 at 4:06 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Tue, Feb 22, 2022 at 8:41 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > >
> > > > 在 2022/2/17 下午4:22, Eugenio Perez Martin 写道:
> > > > > On Thu, Feb 17, 2022 at 7:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >> On Wed, Feb 16, 2022 at 11:54 PM Eugenio Perez Martin
> > > > >> <eperezma@redhat.com> wrote:
> > > > >>> On Tue, Feb 8, 2022 at 9:25 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >>>>
> > > > >>>> 在 2022/2/1 下午7:45, Eugenio Perez Martin 写道:
> > > > >>>>> On Sun, Jan 30, 2022 at 7:50 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >>>>>> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > > > >>>>>>> SVQ is able to log the dirty bits by itself, so let's use it to not
> > > > >>>>>>> block migration.
> > > > >>>>>>>
> > > > >>>>>>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
> > > > >>>>>>> enabled. Even if the device supports it, the reports would be nonsense
> > > > >>>>>>> because SVQ memory is in the qemu region.
> > > > >>>>>>>
> > > > >>>>>>> The log region is still allocated. Future changes might skip that, but
> > > > >>>>>>> this series is already long enough.
> > > > >>>>>>>
> > > > >>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > >>>>>>> ---
> > > > >>>>>>>     hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
> > > > >>>>>>>     1 file changed, 20 insertions(+)
> > > > >>>>>>>
> > > > >>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > >>>>>>> index fb0a338baa..75090d65e8 100644
> > > > >>>>>>> --- a/hw/virtio/vhost-vdpa.c
> > > > >>>>>>> +++ b/hw/virtio/vhost-vdpa.c
> > > > >>>>>>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
> > > > >>>>>>>         if (ret == 0 && v->shadow_vqs_enabled) {
> > > > >>>>>>>             /* Filter only features that SVQ can offer to guest */
> > > > >>>>>>>             vhost_svq_valid_guest_features(features);
> > > > >>>>>>> +
> > > > >>>>>>> +        /* Add SVQ logging capabilities */
> > > > >>>>>>> +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
> > > > >>>>>>>         }
> > > > >>>>>>>
> > > > >>>>>>>         return ret;
> > > > >>>>>>> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
> > > > >>>>>>>
> > > > >>>>>>>         if (v->shadow_vqs_enabled) {
> > > > >>>>>>>             uint64_t dev_features, svq_features, acked_features;
> > > > >>>>>>> +        uint8_t status = 0;
> > > > >>>>>>>             bool ok;
> > > > >>>>>>>
> > > > >>>>>>> +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> > > > >>>>>>> +        if (unlikely(ret)) {
> > > > >>>>>>> +            return ret;
> > > > >>>>>>> +        }
> > > > >>>>>>> +
> > > > >>>>>>> +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > > > >>>>>>> +            /*
> > > > >>>>>>> +             * vhost is trying to enable or disable _F_LOG, and the device
> > > > >>>>>>> +             * would report wrong dirty pages. SVQ handles it.
> > > > >>>>>>> +             */
> > > > >>>>>> I fail to understand this comment, I'd think there's no way to disable
> > > > >>>>>> dirty page tracking for SVQ.
> > > > >>>>>>
> > > > >>>>> vhost_log_global_{start,stop} are called at the beginning and end of
> > > > >>>>> migration. To inform the device that it should start logging, they set
> > > > >>>>> or clear VHOST_F_LOG_ALL at vhost_dev_set_log.
> > > > >>>>
> > > > >>>> Yes, but for SVQ, we can't disable dirty page tracking, isn't it? The
> > > > >>>> only thing is to ignore or filter out the F_LOG_ALL and pretend to be
> > > > >>>> enabled and disabled.
> > > > >>>>
> > > > >>> Yes, that's what this patch does.
> > > > >>>
> > > > >>>>> While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
> > > > >>>>> vhost does not block migration. Maybe we need to look for another way
> > > > >>>>> to do this?
> > > > >>>>
> > > > >>>> I'm fine with filtering since it's much simpler, but I fail to
> > > > >>>> understand why we need to check DRIVER_OK.
> > > > >>>>
> > > > >>> Ok maybe I can make that part more clear,
> > > > >>>
> > > > >>> Since both operations use vhost_vdpa_set_features we must just filter
> > > > >>> the one that actually sets or removes VHOST_F_LOG_ALL, without
> > > > >>> affecting other features.
> > > > >>>
> > > > >>> In practice, that means to not forward the set features after
> > > > >>> DRIVER_OK. The device is not expecting them anymore.
> > > > >> I wonder what happens if we don't do this.
> > > > >>
> > > > > If we simply delete the check vhost_dev_set_features will return an
> > > > > error, failing the start of the migration. More on this below.
> > > >
> > > >
> > > > Ok.
> > > >
> > > >
> > > > >
> > > > >> So kernel had this check:
> > > > >>
> > > > >>          /*
> > > > >>           * It's not allowed to change the features after they have
> > > > >>           * been negotiated.
> > > > >>           */
> > > > >> if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_FEATURES_OK)
> > > > >>          return -EBUSY;
> > > > >>
> > > > >> So is it FEATURES_OK actually?
> > > > >>
> > > > > Yes, FEATURES_OK seems more appropriate actually so I will switch to
> > > > > it for the next version.
> > > > >
> > > > > But it should be functionally equivalent, since
> > > > > vhost.c:vhost_dev_start sets both and the setting of _F_LOG_ALL cannot
> > > > > be concurrent with it.
> > > >
> > > >
> > > > Right.
> > > >
> > > >
> > > > >
> > > > >> For this patch, I wonder if the thing we need to do is to see whether
> > > > >> it is an enable/disable of F_LOG_ALL and simply return.
> > > > >>
> > > > > Yes, that's the intention of the patch.
> > > > >
> > > > > We have 4 cases here:
> > > > > a) We're being called from vhost_dev_start, with enable_log = false
> > > > > b) We're being called from vhost_dev_start, with enable_log = true
> > > >
> > > >
> > > > And this case means we can't simply return without calling vhost-vdpa.
> > > >
> > >
> > > It calls because {FEATURES,DRIVER}_OK is still not set at that point.
> > >
> > > >
> > > > > c) We're being called from vhost_dev_set_log, with enable_log = false
> > > > > d) We're being called from vhost_dev_set_log, with enable_log = true
> > > > >
> > > > > The way to tell the difference between a/b and c/d is to check if
> > > > > {FEATURES,DRIVER}_OK is set. And, as you point out in previous mails,
> > > > > F_LOG_ALL must be filtered unconditionally since SVQ tracks dirty
> > > > > memory through the memory unmapping, so we clear the bit
> > > > > unconditionally if we detect that VHOST_SET_FEATURES will be called
> > > > > (cases a and b).
> > > > >
> > > > > Another possibility is to track if features have been set with a bool
> > > > > in vhost_vdpa or something like that. But it seems cleaner to me to
> > > > > only store that in the actual device.
> > > >
> > > >
> > > > So I suggest to make sure codes match the comment:
> > > >
> > > >          if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > > >              /*
> > > >               * vhost is trying to enable or disable _F_LOG, and the device
> > > >               * would report wrong dirty pages. SVQ handles it.
> > > >               */
> > > >              return 0;
> > > >          }
> > > >
> > > > It would be better to check whether the caller is toggling _F_LOG_ALL in
> > > > this case.
> > > >
> > >
> > > How do we detect that? We can save the feature flags and compare, but
> > > ignoring all set_features after FEATURES_OK seems simpler to me.
> >
> > Something like:
> >
> > (status ^ status_old == _F_LOG_ALL) ?
> >
>
> s/status/features/ ?

Right.

>
> > It helps us to return errors on wrong features set during DRIVER_OK.
> >
>
> Do you mean to return errors in case of toggling features other than
> _F_LOG_ALL? That's interesting actually, but it seems it forces
> vhost_vdpa to track acked_features too.

I meant we can change the check a little bit like:

if ((features ^ features_old) == BIT_ULL(VHOST_F_LOG_ALL) &&
    (status & VIRTIO_CONFIG_S_DRIVER_OK)) {
    return 0;
}

For other feature changes, we let them go down the logic as you
proposed in this patch.
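
For reference, a tiny standalone illustration of that XOR test (a toy
sketch, not QEMU code; VHOST_F_LOG_ALL's value comes from the Linux
vhost UAPI headers, and the feature words here are made up):

#include <stdint.h>
#include <stdio.h>

#define VHOST_F_LOG_ALL 26            /* bit number, linux/vhost_types.h */
#define BIT_ULL(n) (1ULL << (n))

/* True only when the single difference between the two feature words is
 * _F_LOG_ALL, i.e. the toggle that can be safely swallowed after
 * DRIVER_OK because SVQ tracks dirty pages by itself. */
static int only_log_all_toggled(uint64_t features_old, uint64_t features_new)
{
    return (features_old ^ features_new) == BIT_ULL(VHOST_F_LOG_ALL);
}

int main(void)
{
    uint64_t base = BIT_ULL(32) | BIT_ULL(33); /* already-acked features */

    /* Only _F_LOG_ALL toggled: filter it out, prints 1. */
    printf("%d\n", only_log_all_toggled(base, base | BIT_ULL(VHOST_F_LOG_ALL)));
    /* Some other bit toggled: let it hit the error path, prints 0. */
    printf("%d\n", only_log_all_toggled(base, base | BIT_ULL(24)));
    return 0;
}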

Thanks

>
> Actually, it seems to me vhost_dev->acked_features will retain the bad
> features even on error. I'll investigate it.
>
> Thanks!
>
>
> > Thanks
> >
> > >
> > > Would changing the comment work? Something like "set_features after
> > > _S_FEATURES_OK means vhost is trying to enable or disable _F_LOG, and
> > > the device would report wrong dirty pages. SVQ handles it."
> > >
>

^ permalink raw reply	[flat|nested] 182+ messages in thread

* Re: [PATCH 11/31] vhost: Add vhost_svq_valid_device_features to shadow vq
  2022-01-21 20:27 ` [PATCH 11/31] vhost: Add vhost_svq_valid_device_features to shadow vq Eugenio Pérez
  2022-01-29  8:11     ` Jason Wang
@ 2022-02-26  9:11   ` Liuxiangdong via
  2022-02-26 11:12     ` Eugenio Perez Martin
  1 sibling, 1 reply; 182+ messages in thread
From: Liuxiangdong via @ 2022-02-26  9:11 UTC (permalink / raw)
  To: eperezma
  Cc: armbru, eblake, ehabkost, eli, gdawar, hanand, jasowang,
	lingshan.zhu, lulu, lvivier, mst, parav, pbonzini, peterx,
	qemu-devel, quintela, richard.henderson, sgarzare, stefanha,
	virtualization, xiao.w.wang, Fangyi (Eric),
	yebiaoxiang

Hi, Eugenio.

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 9619c8082c..51442b3dbf 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -45,6 +45,50 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
     return &svq->hdev_kick;
 }
 
+/**
+ * Validate the transport device features that SVQ can use with the device
+ *
+ * @dev_features  The device features. If success, the acknowledged features.
+ *
+ * Returns true if SVQ can go with a subset of these, false otherwise.
+ */
+bool vhost_svq_valid_device_features(uint64_t *dev_features)
+{
+    bool r = true;
+
+    for (uint64_t b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END;
+         ++b) {
+        switch (b) {
+        case VIRTIO_F_NOTIFY_ON_EMPTY:
+        case VIRTIO_F_ANY_LAYOUT:
+            continue;



#define VIRTIO_TRANSPORT_F_START    28
#define VIRTIO_TRANSPORT_F_END      38

#define VIRTIO_F_NOTIFY_ON_EMPTY    24

This case (VIRTIO_F_NOTIFY_ON_EMPTY) may be useless.
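
To make that concrete, a tiny standalone check (constants copied from
the defines above; an illustration only, not QEMU code):

#include <stdint.h>
#include <stdio.h>

#define VIRTIO_TRANSPORT_F_START 28
#define VIRTIO_TRANSPORT_F_END   38
#define VIRTIO_F_NOTIFY_ON_EMPTY 24

int main(void)
{
    int visited = 0;

    /* Walk the same bit range the SVQ loop walks: bit 24 lies below
     * VIRTIO_TRANSPORT_F_START, so its case label is dead code. */
    for (uint64_t b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END;
         ++b) {
        if (b == VIRTIO_F_NOTIFY_ON_EMPTY) {
            visited = 1;
        }
    }
    printf("VIRTIO_F_NOTIFY_ON_EMPTY visited: %d\n", visited); /* prints 0 */
    return 0;
}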


Thanks.
Xiangdong Liu


^ permalink raw reply related	[flat|nested] 182+ messages in thread

* Re: [PATCH 11/31] vhost: Add vhost_svq_valid_device_features to shadow vq
  2022-02-26  9:11   ` Liuxiangdong via
@ 2022-02-26 11:12     ` Eugenio Perez Martin
  0 siblings, 0 replies; 182+ messages in thread
From: Eugenio Perez Martin @ 2022-02-26 11:12 UTC (permalink / raw)
  To: Liuxiangdong
  Cc: Michael Tsirkin, Jason Wang, qemu-level, Peter Xu,
	virtualization, Eli Cohen, Eric Blake, Parav Pandit, Cindy Lu,
	Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Stefano Garzarella,
	Laurent Vivier, Eduardo Habkost, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Paolo Bonzini, Zhu Lingshan

On Sat, Feb 26, 2022 at 10:31 AM Liuxiangdong <liuxiangdong5@huawei.com> wrote:
>
> Hi, Eugenio.
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 9619c8082c..51442b3dbf 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -45,6 +45,50 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
>      return &svq->hdev_kick;
>  }
> 
> +/**
> + * Validate the transport device features that SVQ can use with the device
> + *
> + * @dev_features  The device features. If success, the acknowledged features.
> + *
> + * Returns true if SVQ can go with a subset of these, false otherwise.
> + */
> +bool vhost_svq_valid_device_features(uint64_t *dev_features)
> +{
> +    bool r = true;
> +
> +    for (uint64_t b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END;
> +         ++b) {
> +        switch (b) {
> +        case VIRTIO_F_NOTIFY_ON_EMPTY:
> +        case VIRTIO_F_ANY_LAYOUT:
> +            continue;
>
>
>
> #define VIRTIO_TRANSPORT_F_START    28
> #define VIRTIO_TRANSPORT_F_END      38
>
> #define VIRTIO_F_NOTIFY_ON_EMPTY    24
>
> This case (VIRTIO_F_NOTIFY_ON_EMPTY) may be useless.
>

Hi Xiangdong Liu,

You're right, it's out of the range, so it does not make sense to
check for it. I will delete it in the next version, thank you very
much!
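
The follow-up would presumably be just dropping the dead label, along
the lines of this hypothetical hunk:

         switch (b) {
-        case VIRTIO_F_NOTIFY_ON_EMPTY:
         case VIRTIO_F_ANY_LAYOUT:
             continue;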

>
> Thanks.
> Xiangdong Liu
>



^ permalink raw reply	[flat|nested] 182+ messages in thread

end of thread, other threads:[~2022-02-26 14:42 UTC | newest]

Thread overview: 182+ messages
2022-01-21 20:27 [PATCH 00/31] vDPA shadow virtqueue Eugenio Pérez
2022-01-21 20:27 ` [PATCH 01/31] vdpa: Reorder virtio/vhost-vdpa.c functions Eugenio Pérez
2022-01-28  5:59   ` Jason Wang
2022-01-28  5:59     ` Jason Wang
2022-01-28  7:57     ` Eugenio Perez Martin
2022-02-21  7:31       ` Jason Wang
2022-02-21  7:31         ` Jason Wang
2022-02-21  7:42         ` Eugenio Perez Martin
2022-01-21 20:27 ` [PATCH 02/31] vhost: Add VhostShadowVirtqueue Eugenio Pérez
2022-01-26  8:53   ` Eugenio Perez Martin
2022-01-28  6:00   ` Jason Wang
2022-01-28  6:00     ` Jason Wang
2022-01-28  8:10     ` Eugenio Perez Martin
2022-01-21 20:27 ` [PATCH 03/31] vdpa: Add vhost_svq_get_dev_kick_notifier Eugenio Pérez
2022-01-28  6:03   ` Jason Wang
2022-01-28  6:03     ` Jason Wang
2022-01-31  9:33     ` Eugenio Perez Martin
2022-01-21 20:27 ` [PATCH 04/31] vdpa: Add vhost_svq_set_svq_kick_fd Eugenio Pérez
2022-01-28  6:29   ` Jason Wang
2022-01-28  6:29     ` Jason Wang
2022-01-31 10:18     ` Eugenio Perez Martin
2022-02-08  8:47       ` Jason Wang
2022-02-08  8:47         ` Jason Wang
2022-02-18 18:22         ` Eugenio Perez Martin
2022-01-21 20:27 ` [PATCH 05/31] vhost: Add Shadow VirtQueue kick forwarding capabilities Eugenio Pérez
2022-01-28  6:32   ` Jason Wang
2022-01-28  6:32     ` Jason Wang
2022-01-31 10:48     ` Eugenio Perez Martin
2022-01-21 20:27 ` [PATCH 06/31] vhost: Route guest->host notification through shadow virtqueue Eugenio Pérez
2022-01-28  6:56   ` Jason Wang
2022-01-28  6:56     ` Jason Wang
2022-01-31 11:33     ` Eugenio Perez Martin
2022-02-08  9:02       ` Jason Wang
2022-02-08  9:02         ` Jason Wang
2022-01-21 20:27 ` [PATCH 07/31] vhost: dd vhost_svq_get_svq_call_notifier Eugenio Pérez
2022-01-29  7:57   ` Jason Wang
2022-01-29  7:57     ` Jason Wang
2022-01-29 17:49     ` Eugenio Perez Martin
2022-01-21 20:27 ` [PATCH 08/31] vhost: Add vhost_svq_set_guest_call_notifier Eugenio Pérez
2022-01-21 20:27 ` [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call Eugenio Pérez
2022-01-29  8:05   ` Jason Wang
2022-01-29  8:05     ` Jason Wang
2022-01-31 15:34     ` Eugenio Perez Martin
2022-02-08  3:23       ` Jason Wang
2022-02-08  3:23         ` Jason Wang
2022-02-18 12:35         ` Eugenio Perez Martin
2022-02-21  7:39           ` Jason Wang
2022-02-21  7:39             ` Jason Wang
2022-02-21  8:01             ` Eugenio Perez Martin
2022-02-22  7:18               ` Jason Wang
2022-02-22  7:18                 ` Jason Wang
2022-01-21 20:27 ` [PATCH 10/31] vhost: Route host->guest notification through shadow virtqueue Eugenio Pérez
2022-01-21 20:27 ` [PATCH 11/31] vhost: Add vhost_svq_valid_device_features to shadow vq Eugenio Pérez
2022-01-29  8:11   ` Jason Wang
2022-01-29  8:11     ` Jason Wang
2022-01-31 15:49     ` Eugenio Perez Martin
2022-02-01 10:57       ` Eugenio Perez Martin
2022-02-08  3:37         ` Jason Wang
2022-02-08  3:37           ` Jason Wang
2022-02-26  9:11   ` Liuxiangdong via
2022-02-26 11:12     ` Eugenio Perez Martin
2022-01-21 20:27 ` [PATCH 12/31] vhost: Add vhost_svq_valid_guest_features " Eugenio Pérez
2022-01-21 20:27 ` [PATCH 13/31] vhost: Add vhost_svq_ack_guest_features " Eugenio Pérez
2022-01-21 20:27 ` [PATCH 14/31] virtio: Add vhost_shadow_vq_get_vring_addr Eugenio Pérez
2022-01-21 20:27 ` [PATCH 15/31] vdpa: Add vhost_svq_get_num Eugenio Pérez
2022-01-29  8:14   ` Jason Wang
2022-01-29  8:14     ` Jason Wang
2022-01-31 16:36     ` Eugenio Perez Martin
2022-01-21 20:27 ` [PATCH 16/31] vhost: pass queue index to vhost_vq_get_addr Eugenio Pérez
2022-01-29  8:20   ` Jason Wang
2022-01-29  8:20     ` Jason Wang
2022-01-31 17:44     ` Eugenio Perez Martin
2022-02-08  6:58       ` Jason Wang
2022-02-08  6:58         ` Jason Wang
2022-01-21 20:27 ` [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq Eugenio Pérez
2022-01-30  4:03   ` Jason Wang
2022-01-30  4:03     ` Jason Wang
2022-01-31 18:58     ` Eugenio Perez Martin
2022-02-08  3:57       ` Jason Wang
2022-02-08  3:57         ` Jason Wang
2022-02-17 17:13         ` Eugenio Perez Martin
2022-02-21  7:15           ` Jason Wang
2022-02-21  7:15             ` Jason Wang
2022-02-21 17:22             ` Eugenio Perez Martin
2022-02-22  3:16               ` Jason Wang
2022-02-22  3:16                 ` Jason Wang
2022-02-22  7:42                 ` Eugenio Perez Martin
2022-02-22  7:59                   ` Jason Wang
2022-02-22  7:59                     ` Jason Wang
2022-01-21 20:27 ` [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding Eugenio Pérez
2022-01-30  4:42   ` Jason Wang
2022-01-30  4:42     ` Jason Wang
2022-02-01 17:08     ` Eugenio Perez Martin
2022-02-08  8:11       ` Jason Wang
2022-02-08  8:11         ` Jason Wang
2022-02-22 19:01         ` Eugenio Perez Martin
2022-02-23  2:03           ` Jason Wang
2022-02-23  2:03             ` Jason Wang
2022-01-30  6:46   ` Jason Wang
2022-01-30  6:46     ` Jason Wang
2022-02-01 11:25     ` Eugenio Perez Martin
2022-02-08  8:15       ` Jason Wang
2022-02-08  8:15         ` Jason Wang
2022-02-17 12:48         ` Eugenio Perez Martin
2022-02-21  7:43           ` Jason Wang
2022-02-21  7:43             ` Jason Wang
2022-02-21  8:15             ` Eugenio Perez Martin
2022-02-22  7:26               ` Jason Wang
2022-02-22  7:26                 ` Jason Wang
2022-02-22  8:55                 ` Eugenio Perez Martin
2022-02-23  2:26                   ` Jason Wang
2022-02-23  2:26                     ` Jason Wang
2022-01-21 20:27 ` [PATCH 19/31] utils: Add internal DMAMap to iova-tree Eugenio Pérez
2022-01-21 20:27 ` [PATCH 20/31] util: Store DMA entries in a list Eugenio Pérez
2022-01-21 20:27 ` [PATCH 21/31] util: Add iova_tree_alloc Eugenio Pérez
2022-01-24  4:32   ` Peter Xu
2022-01-24  4:32     ` Peter Xu
2022-01-24  9:20     ` Eugenio Perez Martin
2022-01-24 11:07       ` Peter Xu
2022-01-24 11:07         ` Peter Xu
2022-01-25  9:40         ` Eugenio Perez Martin
2022-01-27  8:06           ` Peter Xu
2022-01-27  8:06             ` Peter Xu
2022-01-27  9:24             ` Eugenio Perez Martin
2022-01-28  3:57               ` Peter Xu
2022-01-28  3:57                 ` Peter Xu
2022-01-28  5:55                 ` Jason Wang
2022-01-28  5:55                   ` Jason Wang
2022-01-28  7:48                   ` Eugenio Perez Martin
2022-02-15 19:34                   ` Eugenio Pérez
2022-02-15 19:34                   ` [PATCH] util: Add iova_tree_alloc Eugenio Pérez
2022-02-16  7:25                     ` Peter Xu
2022-01-30  5:06       ` [PATCH 21/31] " Jason Wang
2022-01-30  5:06         ` Jason Wang
2022-01-21 20:27 ` [PATCH 22/31] vhost: Add VhostIOVATree Eugenio Pérez
2022-01-30  5:21   ` Jason Wang
2022-01-30  5:21     ` Jason Wang
2022-02-01 17:27     ` Eugenio Perez Martin
2022-02-08  8:17       ` Jason Wang
2022-02-08  8:17         ` Jason Wang
2022-01-21 20:27 ` [PATCH 23/31] vdpa: Add custom IOTLB translations to SVQ Eugenio Pérez
2022-01-30  5:57   ` Jason Wang
2022-01-30  5:57     ` Jason Wang
2022-01-31 19:11     ` Eugenio Perez Martin
2022-02-08  8:19       ` Jason Wang
2022-02-08  8:19         ` Jason Wang
2022-01-21 20:27 ` [PATCH 24/31] vhost: Add vhost_svq_get_last_used_idx Eugenio Pérez
2022-01-21 20:27 ` [PATCH 25/31] vdpa: Adapt vhost_vdpa_get_vring_base to SVQ Eugenio Pérez
2022-01-21 20:27 ` [PATCH 26/31] vdpa: Clear VHOST_VRING_F_LOG at vhost_vdpa_set_vring_addr in SVQ Eugenio Pérez
2022-01-21 20:27 ` [PATCH 27/31] vdpa: Never set log_base addr if SVQ is enabled Eugenio Pérez
2022-01-21 20:27 ` [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ Eugenio Pérez
2022-01-30  6:50   ` Jason Wang
2022-01-30  6:50     ` Jason Wang
2022-02-01 11:45     ` Eugenio Perez Martin
2022-02-08  8:25       ` Jason Wang
2022-02-08  8:25         ` Jason Wang
2022-02-16 15:53         ` Eugenio Perez Martin
2022-02-17  6:02           ` Jason Wang
2022-02-17  6:02             ` Jason Wang
2022-02-17  8:22             ` Eugenio Perez Martin
2022-02-22  7:41               ` Jason Wang
2022-02-22  7:41                 ` Jason Wang
2022-02-22  8:05                 ` Eugenio Perez Martin
2022-02-23  3:46                   ` Jason Wang
2022-02-23  3:46                     ` Jason Wang
2022-02-23  8:06                     ` Eugenio Perez Martin
2022-02-24  3:45                       ` Jason Wang
2022-02-24  3:45                         ` Jason Wang
2022-01-21 20:27 ` [PATCH 29/31] vdpa: Make ncs autofree Eugenio Pérez
2022-01-30  6:51   ` Jason Wang
2022-01-30  6:51     ` Jason Wang
2022-02-01 17:10     ` Eugenio Perez Martin
2022-01-21 20:27 ` [PATCH 30/31] vdpa: Move vhost_vdpa_get_iova_range to net/vhost-vdpa.c Eugenio Pérez
2022-01-30  6:53   ` Jason Wang
2022-01-30  6:53     ` Jason Wang
2022-02-01 17:11     ` Eugenio Perez Martin
2022-01-21 20:27 ` [PATCH 31/31] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
2022-01-28  6:02 ` [PATCH 00/31] vDPA shadow virtqueue Jason Wang
2022-01-28  6:02   ` Jason Wang
2022-01-31  9:15   ` Eugenio Perez Martin
2022-02-08  8:27     ` Jason Wang
2022-02-08  8:27       ` Jason Wang
