All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 00/15] vDPA shadow virtqueue
@ 2022-03-07 15:33 Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 01/15] vhost: Add VhostShadowVirtqueue Eugenio Pérez
                   ` (15 more replies)
  0 siblings, 16 replies; 60+ messages in thread
From: Eugenio Pérez @ 2022-03-07 15:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, virtualization,
	Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
is intended as a new method of tracking the memory the devices touch
during a migration process: Instead of relay on vhost device's dirty
logging capability, SVQ intercepts the VQ dataplane forwarding the
descriptors between VM and device. This way qemu is the effective
writer of guests memory, like in qemu's virtio device operation.

When SVQ is enabled qemu offers a new virtual address space to the
device to read and write into, and it maps new vrings and the guest
memory in it. SVQ also intercepts kicks and calls between the device
and the guest. Used buffers relay would cause dirty memory being
tracked.

This effectively means that vDPA device passthrough is intercepted by
qemu. While SVQ should only be enabled at migration time, the switching
from regular mode to SVQ mode is left for a future series.

It is based on the ideas of DPDK SW assisted LM, in the series of
DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
not map the shadow vq in guest's VA, but in qemu's.

For qemu to use shadow virtqueues the guest virtio driver must not use
features like event_idx.

SVQ needs to be enabled with cmdline:

-netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on

The first three patches enables notifications forwarding with
assistance of qemu. It's easy to enable only this if the relevant
cmdline part of the last patch is applied on top of these.

Next four patches implement the actual buffer forwarding. However,
address are not translated from HVA so they will need a host device with
an iommu allowing them to access all of the HVA range.

The last part of the series uses properly the host iommu, so qemu
creates a new iova address space in the device's range and translates
the buffers in it. Finally, it adds the cmdline parameter.

Some simple performance tests with netperf were done. They used a nested
guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
baseline average of ~9009.96Mbps:
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
131072  16384  16384    30.01    9061.03
131072  16384  16384    30.01    8962.94
131072  16384  16384    30.01    9005.92

To enable SVQ buffers forwarding reduce throughput to about
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
131072  16384  16384    30.01    7689.72
131072  16384  16384    30.00    7752.07
131072  16384  16384    30.01    7750.30

However, many performance improvements were left out of this series for
simplicity, so difference should shrink in the future.

Comments are welcome.

TODO on future series:
* Event, indirect, packed, and others features of virtio.
* To support different set of features between the device<->SVQ and the
  SVQ<->guest communication.
* Support of device host notifier memory regions.
* To sepparate buffers forwarding in its own AIO context, so we can
  throw more threads to that task and we don't need to stop the main
  event loop.
* Support multiqueue virtio-net vdpa.
* Proper documentation.

Changes from v4:
* Iterate iova->hva tree instead on maintain own tree so we support HVA
  overlaps.
* Fix: Errno completion at failure.
* Rename x-svq to svq, so changes to stable does not affect cmdline parameter.

Changes from v3:
* Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
* Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
v3 link:
https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/

Changes from v2:
* Less assertions and more error handling in iova tree code.
* Better documentation, both fixing errors and making @param: format
* Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
  prefix at both times.
* Fix: Fo not use VirtQueueElement->len field, track separatedly.
* Split vhost_svq_{enable,disable}_notification, so the code looks more
  like the kernel driver code.
* Small improvements.
v2 link:
https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/

Changes from v1:
* Feature set at device->SVQ is now the same as SVQ->guest.
* Size of SVQ is not max available device size anymore, but guest's
  negotiated.
* Add VHOST_FILE_UNBIND kick and call fd treatment.
* Make SVQ a public struct
* Come back to previous approach to iova-tree
* Some assertions are now fail paths. Some errors are now log_guest.
* Only mask _F_LOG feature at vdpa_set_features svq enable path.
* Refactor some errors and messages. Add missing error unwindings.
* Add memory barrier at _F_NO_NOTIFY set.
* Stop checking for features flags out of transport range.
v1 link:
https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/

Changes from v4 RFC:
* Support of allocating / freeing iova ranges in IOVA tree. Extending
  already present iova-tree for that.
* Proper validation of guest features. Now SVQ can negotiate a
  different set of features with the device when enabled.
* Support of host notifiers memory regions
* Handling of SVQ full queue in case guest's descriptors span to
  different memory regions (qemu's VA chunks).
* Flush pending used buffers at end of SVQ operation.
* QMP command now looks by NetClientState name. Other devices will need
  to implement it's way to enable vdpa.
* Rename QMP command to set, so it looks more like a way of working
* Better use of qemu error system
* Make a few assertions proper error-handling paths.
* Add more documentation
* Less coupling of virtio / vhost, that could cause friction on changes
* Addressed many other small comments and small fixes.

Changes from v3 RFC:
  * Move everything to vhost-vdpa backend. A big change, this allowed
    some cleanup but more code has been added in other places.
  * More use of glib utilities, especially to manage memory.
v3 link:
https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html

Changes from v2 RFC:
  * Adding vhost-vdpa devices support
  * Fixed some memory leaks pointed by different comments
v2 link:
https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html

Changes from v1 RFC:
  * Use QMP instead of migration to start SVQ mode.
  * Only accepting IOMMU devices, closer behavior with target devices
    (vDPA)
  * Fix invalid masking/unmasking of vhost call fd.
  * Use of proper methods for synchronization.
  * No need to modify VirtIO device code, all of the changes are
    contained in vhost code.
  * Delete superfluous code.
  * An intermediate RFC was sent with only the notifications forwarding
    changes. It can be seen in
    https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
v1 link:
https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html

Eugenio Pérez (20):
      virtio: Add VIRTIO_F_QUEUE_STATE
      virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
      virtio: Add virtio_queue_is_host_notifier_enabled
      vhost: Make vhost_virtqueue_{start,stop} public
      vhost: Add x-vhost-enable-shadow-vq qmp
      vhost: Add VhostShadowVirtqueue
      vdpa: Register vdpa devices in a list
      vhost: Route guest->host notification through shadow virtqueue
      Add vhost_svq_get_svq_call_notifier
      Add vhost_svq_set_guest_call_notifier
      vdpa: Save call_fd in vhost-vdpa
      vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
      vhost: Route host->guest notification through shadow virtqueue
      virtio: Add vhost_shadow_vq_get_vring_addr
      vdpa: Save host and guest features
      vhost: Add vhost_svq_valid_device_features to shadow vq
      vhost: Shadow virtqueue buffers forwarding
      vhost: Add VhostIOVATree
      vhost: Use a tree to store memory mappings
      vdpa: Add custom IOTLB translations to SVQ

Eugenio Pérez (15):
  vhost: Add VhostShadowVirtqueue
  vhost: Add Shadow VirtQueue kick forwarding capabilities
  vhost: Add Shadow VirtQueue call forwarding capabilities
  vhost: Add vhost_svq_valid_features to shadow vq
  virtio: Add vhost_svq_get_vring_addr
  vdpa: adapt vhost_ops callbacks to svq
  vhost: Shadow virtqueue buffers forwarding
  util: Add iova_tree_alloc_map
  util: add iova_tree_find_iova
  vhost: Add VhostIOVATree
  vdpa: Add custom IOTLB translations to SVQ
  vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
  vdpa: Never set log_base addr if SVQ is enabled
  vdpa: Expose VHOST_F_LOG_ALL on SVQ
  vdpa: Add x-svq to NetdevVhostVDPAOptions

 qapi/net.json                      |   8 +-
 hw/virtio/vhost-iova-tree.h        |  27 ++
 hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
 include/hw/virtio/vhost-vdpa.h     |   8 +
 include/qemu/iova-tree.h           |  38 +-
 hw/virtio/vhost-iova-tree.c        | 110 +++++
 hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
 hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
 net/vhost-vdpa.c                   |  48 ++-
 util/iova-tree.c                   | 169 ++++++++
 hw/virtio/meson.build              |   2 +-
 11 files changed, 1633 insertions(+), 26 deletions(-)
 create mode 100644 hw/virtio/vhost-iova-tree.h
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
 create mode 100644 hw/virtio/vhost-iova-tree.c
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.c

-- 
2.27.0




^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v5 01/15] vhost: Add VhostShadowVirtqueue
  2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
@ 2022-03-07 15:33 ` Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 02/15] vhost: Add Shadow VirtQueue kick forwarding capabilities Eugenio Pérez
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Eugenio Pérez @ 2022-03-07 15:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, virtualization,
	Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

Vhost shadow virtqueue (SVQ) is an intermediate jump for virtqueue
notifications and buffers, allowing qemu to track them. While qemu is
forwarding the buffers and virtqueue changes, it is able to commit the
memory it's being dirtied, the same way regular qemu's VirtIO devices
do.

This commit only exposes basic SVQ allocation and free. Next patches of
the series add functionality like notifications and buffers forwarding.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h | 28 ++++++++++++++
 hw/virtio/vhost-shadow-virtqueue.c | 62 ++++++++++++++++++++++++++++++
 hw/virtio/meson.build              |  2 +-
 3 files changed, 91 insertions(+), 1 deletion(-)
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.c

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
new file mode 100644
index 0000000000..f1519e3c7b
--- /dev/null
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -0,0 +1,28 @@
+/*
+ * vhost shadow virtqueue
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef VHOST_SHADOW_VIRTQUEUE_H
+#define VHOST_SHADOW_VIRTQUEUE_H
+
+#include "qemu/event_notifier.h"
+
+/* Shadow virtqueue to relay notifications */
+typedef struct VhostShadowVirtqueue {
+    /* Shadow kick notifier, sent to vhost */
+    EventNotifier hdev_kick;
+    /* Shadow call notifier, sent to vhost */
+    EventNotifier hdev_call;
+} VhostShadowVirtqueue;
+
+VhostShadowVirtqueue *vhost_svq_new(void);
+
+void vhost_svq_free(gpointer vq);
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
+
+#endif
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
new file mode 100644
index 0000000000..c1db02c53e
--- /dev/null
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -0,0 +1,62 @@
+/*
+ * vhost shadow virtqueue
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "hw/virtio/vhost-shadow-virtqueue.h"
+
+#include "qemu/error-report.h"
+
+/**
+ * Creates vhost shadow virtqueue, and instructs the vhost device to use the
+ * shadow methods and file descriptors.
+ *
+ * Returns the new virtqueue or NULL.
+ *
+ * In case of error, reason is reported through error_report.
+ */
+VhostShadowVirtqueue *vhost_svq_new(void)
+{
+    g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
+    int r;
+
+    r = event_notifier_init(&svq->hdev_kick, 0);
+    if (r != 0) {
+        error_report("Couldn't create kick event notifier: %s (%d)",
+                     g_strerror(errno), errno);
+        goto err_init_hdev_kick;
+    }
+
+    r = event_notifier_init(&svq->hdev_call, 0);
+    if (r != 0) {
+        error_report("Couldn't create call event notifier: %s (%d)",
+                     g_strerror(errno), errno);
+        goto err_init_hdev_call;
+    }
+
+    return g_steal_pointer(&svq);
+
+err_init_hdev_call:
+    event_notifier_cleanup(&svq->hdev_kick);
+
+err_init_hdev_kick:
+    return NULL;
+}
+
+/**
+ * Free the resources of the shadow virtqueue.
+ *
+ * @pvq: gpointer to SVQ so it can be used by autofree functions.
+ */
+void vhost_svq_free(gpointer pvq)
+{
+    VhostShadowVirtqueue *vq = pvq;
+    event_notifier_cleanup(&vq->hdev_kick);
+    event_notifier_cleanup(&vq->hdev_call);
+    g_free(vq);
+}
diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index 521f7d64a8..2dc87613bc 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
 
 virtio_ss = ss.source_set()
 virtio_ss.add(files('virtio.c'))
-virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c'))
+virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
 virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 02/15] vhost: Add Shadow VirtQueue kick forwarding capabilities
  2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 01/15] vhost: Add VhostShadowVirtqueue Eugenio Pérez
@ 2022-03-07 15:33 ` Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 03/15] vhost: Add Shadow VirtQueue call " Eugenio Pérez
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Eugenio Pérez @ 2022-03-07 15:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, virtualization,
	Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

At this mode no buffer forwarding will be performed in SVQ mode: Qemu
will just forward the guest's kicks to the device.

Host memory notifiers regions are left out for simplicity, and they will
not be addressed in this series.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  14 +++
 include/hw/virtio/vhost-vdpa.h     |   4 +
 hw/virtio/vhost-shadow-virtqueue.c |  56 +++++++++++
 hw/virtio/vhost-vdpa.c             | 145 ++++++++++++++++++++++++++++-
 4 files changed, 217 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index f1519e3c7b..1cbc87d5d8 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -18,8 +18,22 @@ typedef struct VhostShadowVirtqueue {
     EventNotifier hdev_kick;
     /* Shadow call notifier, sent to vhost */
     EventNotifier hdev_call;
+
+    /*
+     * Borrowed virtqueue's guest to host notifier. To borrow it in this event
+     * notifier allows to recover the VhostShadowVirtqueue from the event loop
+     * easily. If we use the VirtQueue's one, we don't have an easy way to
+     * retrieve VhostShadowVirtqueue.
+     *
+     * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
+     */
+    EventNotifier svq_kick;
 } VhostShadowVirtqueue;
 
+void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
+
+void vhost_svq_stop(VhostShadowVirtqueue *svq);
+
 VhostShadowVirtqueue *vhost_svq_new(void);
 
 void vhost_svq_free(gpointer vq);
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 3ce79a646d..009a9f3b6b 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -12,6 +12,8 @@
 #ifndef HW_VIRTIO_VHOST_VDPA_H
 #define HW_VIRTIO_VHOST_VDPA_H
 
+#include <gmodule.h>
+
 #include "hw/virtio/virtio.h"
 #include "standard-headers/linux/vhost_types.h"
 
@@ -27,6 +29,8 @@ typedef struct vhost_vdpa {
     bool iotlb_batch_begin_sent;
     MemoryListener listener;
     struct vhost_vdpa_iova_range iova_range;
+    bool shadow_vqs_enabled;
+    GPtrArray *shadow_vqs;
     struct vhost_dev *dev;
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index c1db02c53e..c96dbdf152 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -11,6 +11,60 @@
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 
 #include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+#include "linux-headers/linux/vhost.h"
+
+/**
+ * Forward guest notifications.
+ *
+ * @n: guest kick event notifier, the one that guest set to notify svq.
+ */
+static void vhost_handle_guest_kick(EventNotifier *n)
+{
+    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
+                                             svq_kick);
+    event_notifier_test_and_clear(n);
+    event_notifier_set(&svq->hdev_kick);
+}
+
+/**
+ * Set a new file descriptor for the guest to kick the SVQ and notify for avail
+ *
+ * @svq: The svq
+ * @svq_kick_fd: The svq kick fd
+ *
+ * Note that the SVQ will never close the old file descriptor.
+ */
+void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
+{
+    EventNotifier *svq_kick = &svq->svq_kick;
+    bool poll_stop = VHOST_FILE_UNBIND != event_notifier_get_fd(svq_kick);
+    bool poll_start = svq_kick_fd != VHOST_FILE_UNBIND;
+
+    if (poll_stop) {
+        event_notifier_set_handler(svq_kick, NULL);
+    }
+
+    /*
+     * event_notifier_set_handler already checks for guest's notifications if
+     * they arrive at the new file descriptor in the switch, so there is no
+     * need to explicitly check for them.
+     */
+    if (poll_start) {
+        event_notifier_init_fd(svq_kick, svq_kick_fd);
+        event_notifier_set(svq_kick);
+        event_notifier_set_handler(svq_kick, vhost_handle_guest_kick);
+    }
+}
+
+/**
+ * Stop the shadow virtqueue operation.
+ * @svq: Shadow Virtqueue
+ */
+void vhost_svq_stop(VhostShadowVirtqueue *svq)
+{
+    event_notifier_set_handler(&svq->svq_kick, NULL);
+}
 
 /**
  * Creates vhost shadow virtqueue, and instructs the vhost device to use the
@@ -39,6 +93,7 @@ VhostShadowVirtqueue *vhost_svq_new(void)
         goto err_init_hdev_call;
     }
 
+    event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND);
     return g_steal_pointer(&svq);
 
 err_init_hdev_call:
@@ -56,6 +111,7 @@ err_init_hdev_kick:
 void vhost_svq_free(gpointer pvq)
 {
     VhostShadowVirtqueue *vq = pvq;
+    vhost_svq_stop(vq);
     event_notifier_cleanup(&vq->hdev_kick);
     event_notifier_cleanup(&vq->hdev_call);
     g_free(vq);
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 04ea43704f..1dd799b3ef 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -17,12 +17,14 @@
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-backend.h"
 #include "hw/virtio/virtio-net.h"
+#include "hw/virtio/vhost-shadow-virtqueue.h"
 #include "hw/virtio/vhost-vdpa.h"
 #include "exec/address-spaces.h"
 #include "qemu/main-loop.h"
 #include "cpu.h"
 #include "trace.h"
 #include "qemu-common.h"
+#include "qapi/error.h"
 
 /*
  * Return one past the end of the end of section. Be careful with uint64_t
@@ -342,6 +344,30 @@ static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
     return v->index != 0;
 }
 
+static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
+                               Error **errp)
+{
+    g_autoptr(GPtrArray) shadow_vqs = NULL;
+
+    if (!v->shadow_vqs_enabled) {
+        return 0;
+    }
+
+    shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
+    for (unsigned n = 0; n < hdev->nvqs; ++n) {
+        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new();
+
+        if (unlikely(!svq)) {
+            error_setg(errp, "Cannot create svq %u", n);
+            return -1;
+        }
+        g_ptr_array_add(shadow_vqs, g_steal_pointer(&svq));
+    }
+
+    v->shadow_vqs = g_steal_pointer(&shadow_vqs);
+    return 0;
+}
+
 static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
 {
     struct vhost_vdpa *v;
@@ -364,6 +390,10 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
     dev->opaque =  opaque ;
     v->listener = vhost_vdpa_memory_listener;
     v->msg_type = VHOST_IOTLB_MSG_V2;
+    ret = vhost_vdpa_init_svq(dev, v, errp);
+    if (ret) {
+        goto err;
+    }
 
     vhost_vdpa_get_iova_range(v);
 
@@ -375,6 +405,10 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
                                VIRTIO_CONFIG_S_DRIVER);
 
     return 0;
+
+err:
+    ram_block_discard_disable(false);
+    return ret;
 }
 
 static void vhost_vdpa_host_notifier_uninit(struct vhost_dev *dev,
@@ -444,8 +478,14 @@ err:
 
 static void vhost_vdpa_host_notifiers_init(struct vhost_dev *dev)
 {
+    struct vhost_vdpa *v = dev->opaque;
     int i;
 
+    if (v->shadow_vqs_enabled) {
+        /* FIXME SVQ is not compatible with host notifiers mr */
+        return;
+    }
+
     for (i = dev->vq_index; i < dev->vq_index + dev->nvqs; i++) {
         if (vhost_vdpa_host_notifier_init(dev, i)) {
             goto err;
@@ -459,6 +499,21 @@ err:
     return;
 }
 
+static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    size_t idx;
+
+    if (!v->shadow_vqs) {
+        return;
+    }
+
+    for (idx = 0; idx < v->shadow_vqs->len; ++idx) {
+        vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, idx));
+    }
+    g_ptr_array_free(v->shadow_vqs, true);
+}
+
 static int vhost_vdpa_cleanup(struct vhost_dev *dev)
 {
     struct vhost_vdpa *v;
@@ -467,6 +522,7 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
     trace_vhost_vdpa_cleanup(dev, v);
     vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
     memory_listener_unregister(&v->listener);
+    vhost_vdpa_svq_cleanup(dev);
 
     dev->opaque = NULL;
     ram_block_discard_disable(false);
@@ -558,11 +614,26 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
     return ret;
 }
 
+static void vhost_vdpa_reset_svq(struct vhost_vdpa *v)
+{
+    if (!v->shadow_vqs_enabled) {
+        return;
+    }
+
+    for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
+        vhost_svq_stop(svq);
+    }
+}
+
 static int vhost_vdpa_reset_device(struct vhost_dev *dev)
 {
+    struct vhost_vdpa *v = dev->opaque;
     int ret;
     uint8_t status = 0;
 
+    vhost_vdpa_reset_svq(v);
+
     ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
     trace_vhost_vdpa_reset_device(dev, status);
     return ret;
@@ -646,13 +717,75 @@ static int vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
     return ret;
  }
 
+static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
+                                         struct vhost_vring_file *file)
+{
+    trace_vhost_vdpa_set_vring_kick(dev, file->index, file->fd);
+    return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
+}
+
+/**
+ * Set the shadow virtqueue descriptors to the device
+ *
+ * @dev: The vhost device model
+ * @svq: The shadow virtqueue
+ * @idx: The index of the virtqueue in the vhost device
+ * @errp: Error
+ */
+static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
+                                 VhostShadowVirtqueue *svq,
+                                 unsigned idx,
+                                 Error **errp)
+{
+    struct vhost_vring_file file = {
+        .index = dev->vq_index + idx,
+    };
+    const EventNotifier *event_notifier = &svq->hdev_kick;
+    int r;
+
+    file.fd = event_notifier_get_fd(event_notifier);
+    r = vhost_vdpa_set_vring_dev_kick(dev, &file);
+    if (unlikely(r != 0)) {
+        error_setg_errno(errp, -r, "Can't set device kick fd");
+    }
+
+    return r == 0;
+}
+
+static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    Error *err = NULL;
+    unsigned i;
+
+    if (!v->shadow_vqs) {
+        return true;
+    }
+
+    for (i = 0; i < v->shadow_vqs->len; ++i) {
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
+        bool ok = vhost_vdpa_svq_setup(dev, svq, i, &err);
+        if (unlikely(!ok)) {
+            error_reportf_err(err, "Cannot setup SVQ %u: ", i);
+            return false;
+        }
+    }
+
+    return true;
+}
+
 static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 {
     struct vhost_vdpa *v = dev->opaque;
+    bool ok;
     trace_vhost_vdpa_dev_start(dev, started);
 
     if (started) {
         vhost_vdpa_host_notifiers_init(dev);
+        ok = vhost_vdpa_svqs_start(dev);
+        if (unlikely(!ok)) {
+            return -1;
+        }
         vhost_vdpa_set_vring_ready(dev);
     } else {
         vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
@@ -724,8 +857,16 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
 static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
                                        struct vhost_vring_file *file)
 {
-    trace_vhost_vdpa_set_vring_kick(dev, file->index, file->fd);
-    return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
+    struct vhost_vdpa *v = dev->opaque;
+    int vdpa_idx = file->index - dev->vq_index;
+
+    if (v->shadow_vqs_enabled) {
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
+        vhost_svq_set_svq_kick_fd(svq, file->fd);
+        return 0;
+    } else {
+        return vhost_vdpa_set_vring_dev_kick(dev, file);
+    }
 }
 
 static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 03/15] vhost: Add Shadow VirtQueue call forwarding capabilities
  2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 01/15] vhost: Add VhostShadowVirtqueue Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 02/15] vhost: Add Shadow VirtQueue kick forwarding capabilities Eugenio Pérez
@ 2022-03-07 15:33 ` Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 04/15] vhost: Add vhost_svq_valid_features to shadow vq Eugenio Pérez
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Eugenio Pérez @ 2022-03-07 15:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, virtualization,
	Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

This will make qemu aware of the device used buffers, allowing it to
write the guest memory with its contents if needed.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  4 ++++
 hw/virtio/vhost-shadow-virtqueue.c | 38 ++++++++++++++++++++++++++++++
 hw/virtio/vhost-vdpa.c             | 31 ++++++++++++++++++++++--
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 1cbc87d5d8..cbc5213579 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -28,9 +28,13 @@ typedef struct VhostShadowVirtqueue {
      * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
      */
     EventNotifier svq_kick;
+
+    /* Guest's call notifier, where the SVQ calls guest. */
+    EventNotifier svq_call;
 } VhostShadowVirtqueue;
 
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
+void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
 
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index c96dbdf152..5c1e09be5d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -27,6 +27,42 @@ static void vhost_handle_guest_kick(EventNotifier *n)
     event_notifier_set(&svq->hdev_kick);
 }
 
+/**
+ * Forward vhost notifications
+ *
+ * @n: hdev call event notifier, the one that device set to notify svq.
+ */
+static void vhost_svq_handle_call(EventNotifier *n)
+{
+    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
+                                             hdev_call);
+    event_notifier_test_and_clear(n);
+    event_notifier_set(&svq->svq_call);
+}
+
+/**
+ * Set the call notifier for the SVQ to call the guest
+ *
+ * @svq: Shadow virtqueue
+ * @call_fd: call notifier
+ *
+ * Called on BQL context.
+ */
+void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd)
+{
+    if (call_fd == VHOST_FILE_UNBIND) {
+        /*
+         * Fail event_notifier_set if called handling device call.
+         *
+         * SVQ still needs device notifications, since it needs to keep
+         * forwarding used buffers even with the unbind.
+         */
+        memset(&svq->svq_call, 0, sizeof(svq->svq_call));
+    } else {
+        event_notifier_init_fd(&svq->svq_call, call_fd);
+    }
+}
+
 /**
  * Set a new file descriptor for the guest to kick the SVQ and notify for avail
  *
@@ -94,6 +130,7 @@ VhostShadowVirtqueue *vhost_svq_new(void)
     }
 
     event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND);
+    event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     return g_steal_pointer(&svq);
 
 err_init_hdev_call:
@@ -113,6 +150,7 @@ void vhost_svq_free(gpointer pvq)
     VhostShadowVirtqueue *vq = pvq;
     vhost_svq_stop(vq);
     event_notifier_cleanup(&vq->hdev_kick);
+    event_notifier_set_handler(&vq->hdev_call, NULL);
     event_notifier_cleanup(&vq->hdev_call);
     g_free(vq);
 }
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 1dd799b3ef..d5865a5d77 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -724,6 +724,13 @@ static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
     return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
 }
 
+static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
+                                         struct vhost_vring_file *file)
+{
+    trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
+    return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
+}
+
 /**
  * Set the shadow virtqueue descriptors to the device
  *
@@ -731,6 +738,9 @@ static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
  * @svq: The shadow virtqueue
  * @idx: The index of the virtqueue in the vhost device
  * @errp: Error
+ *
+ * Note that this function does not rewind kick file descriptor if cannot set
+ * call one.
  */
 static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
                                  VhostShadowVirtqueue *svq,
@@ -747,6 +757,14 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
     r = vhost_vdpa_set_vring_dev_kick(dev, &file);
     if (unlikely(r != 0)) {
         error_setg_errno(errp, -r, "Can't set device kick fd");
+        return false;
+    }
+
+    event_notifier = &svq->hdev_call;
+    file.fd = event_notifier_get_fd(event_notifier);
+    r = vhost_vdpa_set_vring_dev_call(dev, &file);
+    if (unlikely(r != 0)) {
+        error_setg_errno(errp, -r, "Can't set device call fd");
     }
 
     return r == 0;
@@ -872,8 +890,17 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
 static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
                                        struct vhost_vring_file *file)
 {
-    trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
-    return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
+    struct vhost_vdpa *v = dev->opaque;
+
+    if (v->shadow_vqs_enabled) {
+        int vdpa_idx = file->index - dev->vq_index;
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
+
+        vhost_svq_set_svq_call_fd(svq, file->fd);
+        return 0;
+    } else {
+        return vhost_vdpa_set_vring_dev_call(dev, file);
+    }
 }
 
 static int vhost_vdpa_get_features(struct vhost_dev *dev,
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 04/15] vhost: Add vhost_svq_valid_features to shadow vq
  2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
                   ` (2 preceding siblings ...)
  2022-03-07 15:33 ` [PATCH v5 03/15] vhost: Add Shadow VirtQueue call " Eugenio Pérez
@ 2022-03-07 15:33 ` Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 05/15] virtio: Add vhost_svq_get_vring_addr Eugenio Pérez
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Eugenio Pérez @ 2022-03-07 15:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, virtualization,
	Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

This allows SVQ to negotiate features with the guest and the device. For
the device, SVQ is a driver. While this function bypasses all
non-transport features, it needs to disable the features that SVQ does
not support when forwarding buffers. This includes packed vq layout,
indirect descriptors or event idx.

Future changes can add support to offer more features to the guest,
since the use of VirtQueue gives this for free. This is left out at the
moment for simplicity.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  2 ++
 hw/virtio/vhost-shadow-virtqueue.c | 44 ++++++++++++++++++++++++++++++
 hw/virtio/vhost-vdpa.c             | 15 ++++++++++
 3 files changed, 61 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index cbc5213579..9e12f77201 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -33,6 +33,8 @@ typedef struct VhostShadowVirtqueue {
     EventNotifier svq_call;
 } VhostShadowVirtqueue;
 
+bool vhost_svq_valid_features(uint64_t features, Error **errp);
+
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
 
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 5c1e09be5d..280736e30d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -11,9 +11,53 @@
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 
 #include "qemu/error-report.h"
+#include "qapi/error.h"
 #include "qemu/main-loop.h"
 #include "linux-headers/linux/vhost.h"
 
+/**
+ * Validate the transport device features that both guests can use with the SVQ
+ * and SVQs can use with the device.
+ *
+ * @dev_features: The features
+ * @errp: Error pointer
+ */
+bool vhost_svq_valid_features(uint64_t features, Error **errp)
+{
+    bool ok = true;
+    uint64_t svq_features = features;
+
+    for (uint64_t b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END;
+         ++b) {
+        switch (b) {
+        case VIRTIO_F_ANY_LAYOUT:
+            continue;
+
+        case VIRTIO_F_ACCESS_PLATFORM:
+            /* SVQ trust in the host's IOMMU to translate addresses */
+        case VIRTIO_F_VERSION_1:
+            /* SVQ trust that the guest vring is little endian */
+            if (!(svq_features & BIT_ULL(b))) {
+                set_bit(b, &svq_features);
+                ok = false;
+            }
+            continue;
+
+        default:
+            if (svq_features & BIT_ULL(b)) {
+                clear_bit(b, &svq_features);
+                ok = false;
+            }
+        }
+    }
+
+    if (!ok) {
+        error_setg(errp, "SVQ Invalid device feature flags, offer: 0x%"PRIx64
+                         ", ok: 0x%"PRIx64, features, svq_features);
+    }
+    return ok;
+}
+
 /**
  * Forward guest notifications.
  *
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index d5865a5d77..77ad56e06c 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -348,11 +348,26 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
                                Error **errp)
 {
     g_autoptr(GPtrArray) shadow_vqs = NULL;
+    uint64_t dev_features, svq_features;
+    int r;
+    bool ok;
 
     if (!v->shadow_vqs_enabled) {
         return 0;
     }
 
+    r = hdev->vhost_ops->vhost_get_features(hdev, &dev_features);
+    if (r != 0) {
+        error_setg_errno(errp, -r, "Can't get vdpa device features");
+        return r;
+    }
+
+    svq_features = dev_features;
+    ok = vhost_svq_valid_features(svq_features, errp);
+    if (unlikely(!ok)) {
+        return -1;
+    }
+
     shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
     for (unsigned n = 0; n < hdev->nvqs; ++n) {
         g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new();
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 05/15] virtio: Add vhost_svq_get_vring_addr
  2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
                   ` (3 preceding siblings ...)
  2022-03-07 15:33 ` [PATCH v5 04/15] vhost: Add vhost_svq_valid_features to shadow vq Eugenio Pérez
@ 2022-03-07 15:33 ` Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 06/15] vdpa: adapt vhost_ops callbacks to svq Eugenio Pérez
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Eugenio Pérez @ 2022-03-07 15:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, virtualization,
	Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

It reports the shadow virtqueue address from qemu virtual address space.

Since this will be different from the guest's vaddr, but the device can
access it, SVQ takes special care about its alignment & lack of garbage
data. It assumes that IOMMU will work in host_page_size ranges for that.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  9 +++++++++
 hw/virtio/vhost-shadow-virtqueue.c | 29 +++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 9e12f77201..82cea1c3fa 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -11,9 +11,14 @@
 #define VHOST_SHADOW_VIRTQUEUE_H
 
 #include "qemu/event_notifier.h"
+#include "hw/virtio/virtio.h"
+#include "standard-headers/linux/vhost_types.h"
 
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
+    /* Shadow vring */
+    struct vring vring;
+
     /* Shadow kick notifier, sent to vhost */
     EventNotifier hdev_kick;
     /* Shadow call notifier, sent to vhost */
@@ -37,6 +42,10 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp);
 
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
+void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
+                              struct vhost_vring_addr *addr);
+size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
+size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
 
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 280736e30d..b44759e1a4 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -107,6 +107,35 @@ void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd)
     }
 }
 
+/**
+ * Get the shadow vq vring address.
+ * @svq: Shadow virtqueue
+ * @addr: Destination to store address
+ */
+void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
+                              struct vhost_vring_addr *addr)
+{
+    addr->desc_user_addr = (uint64_t)svq->vring.desc;
+    addr->avail_user_addr = (uint64_t)svq->vring.avail;
+    addr->used_user_addr = (uint64_t)svq->vring.used;
+}
+
+size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq)
+{
+    size_t desc_size = sizeof(vring_desc_t) * svq->vring.num;
+    size_t avail_size = offsetof(vring_avail_t, ring) +
+                                             sizeof(uint16_t) * svq->vring.num;
+
+    return ROUND_UP(desc_size + avail_size, qemu_real_host_page_size);
+}
+
+size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq)
+{
+    size_t used_size = offsetof(vring_used_t, ring) +
+                                    sizeof(vring_used_elem_t) * svq->vring.num;
+    return ROUND_UP(used_size, qemu_real_host_page_size);
+}
+
 /**
  * Set a new file descriptor for the guest to kick the SVQ and notify for avail
  *
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 06/15] vdpa: adapt vhost_ops callbacks to svq
  2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
                   ` (4 preceding siblings ...)
  2022-03-07 15:33 ` [PATCH v5 05/15] virtio: Add vhost_svq_get_vring_addr Eugenio Pérez
@ 2022-03-07 15:33 ` Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 07/15] vhost: Shadow virtqueue buffers forwarding Eugenio Pérez
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Eugenio Pérez @ 2022-03-07 15:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, virtualization,
	Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

First half of the buffers forwarding part, preparing vhost-vdpa
callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
this is effectively dead code at the moment, but it helps to reduce
patch size.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 48 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 41 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 77ad56e06c..6a7575f13e 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -732,6 +732,13 @@ static int vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
     return ret;
  }
 
+static int vhost_vdpa_set_dev_vring_base(struct vhost_dev *dev,
+                                         struct vhost_vring_state *ring)
+{
+    trace_vhost_vdpa_set_vring_base(dev, ring->index, ring->num);
+    return vhost_vdpa_call(dev, VHOST_SET_VRING_BASE, ring);
+}
+
 static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
                                          struct vhost_vring_file *file)
 {
@@ -746,6 +753,18 @@ static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
     return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
 }
 
+static int vhost_vdpa_set_vring_dev_addr(struct vhost_dev *dev,
+                                         struct vhost_vring_addr *addr)
+{
+    trace_vhost_vdpa_set_vring_addr(dev, addr->index, addr->flags,
+                                addr->desc_user_addr, addr->used_user_addr,
+                                addr->avail_user_addr,
+                                addr->log_guest_addr);
+
+    return vhost_vdpa_call(dev, VHOST_SET_VRING_ADDR, addr);
+
+}
+
 /**
  * Set the shadow virtqueue descriptors to the device
  *
@@ -856,11 +875,17 @@ static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
 static int vhost_vdpa_set_vring_addr(struct vhost_dev *dev,
                                        struct vhost_vring_addr *addr)
 {
-    trace_vhost_vdpa_set_vring_addr(dev, addr->index, addr->flags,
-                                    addr->desc_user_addr, addr->used_user_addr,
-                                    addr->avail_user_addr,
-                                    addr->log_guest_addr);
-    return vhost_vdpa_call(dev, VHOST_SET_VRING_ADDR, addr);
+    struct vhost_vdpa *v = dev->opaque;
+
+    if (v->shadow_vqs_enabled) {
+        /*
+         * Device vring addr was set at device start. SVQ base is handled by
+         * VirtQueue code.
+         */
+        return 0;
+    }
+
+    return vhost_vdpa_set_vring_dev_addr(dev, addr);
 }
 
 static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
@@ -873,8 +898,17 @@ static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
 static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
                                        struct vhost_vring_state *ring)
 {
-    trace_vhost_vdpa_set_vring_base(dev, ring->index, ring->num);
-    return vhost_vdpa_call(dev, VHOST_SET_VRING_BASE, ring);
+    struct vhost_vdpa *v = dev->opaque;
+
+    if (v->shadow_vqs_enabled) {
+        /*
+         * Device vring base was set at device start. SVQ base is handled by
+         * VirtQueue code.
+         */
+        return 0;
+    }
+
+    return vhost_vdpa_set_dev_vring_base(dev, ring);
 }
 
 static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 07/15] vhost: Shadow virtqueue buffers forwarding
  2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
                   ` (5 preceding siblings ...)
  2022-03-07 15:33 ` [PATCH v5 06/15] vdpa: adapt vhost_ops callbacks to svq Eugenio Pérez
@ 2022-03-07 15:33 ` Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 08/15] util: Add iova_tree_alloc_map Eugenio Pérez
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Eugenio Pérez @ 2022-03-07 15:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, virtualization,
	Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

Initial version of shadow virtqueue that actually forward buffers. There
is no iommu support at the moment, and that will be addressed in future
patches of this series. Since all vhost-vdpa devices use forced IOMMU,
this means that SVQ is not usable at this point of the series on any
device.

For simplicity it only supports modern devices, that expects vring
in little endian, with split ring and no event idx or indirect
descriptors. Support for them will not be added in this series.

It reuses the VirtQueue code for the device part. The driver part is
based on Linux's virtio_ring driver, but with stripped functionality
and optimizations so it's easier to review.

However, forwarding buffers have some particular pieces: One of the most
unexpected ones is that a guest's buffer can expand through more than
one descriptor in SVQ. While this is handled gracefully by qemu's
emulated virtio devices, it may cause unexpected SVQ queue full. This
patch also solves it by checking for this condition at both guest's
kicks and device's calls. The code may be more elegant in the future if
SVQ code runs in its own iocontext.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  26 +++
 hw/virtio/vhost-shadow-virtqueue.c | 353 ++++++++++++++++++++++++++++-
 hw/virtio/vhost-vdpa.c             | 159 ++++++++++++-
 3 files changed, 526 insertions(+), 12 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 82cea1c3fa..38b3b91ca7 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -36,6 +36,30 @@ typedef struct VhostShadowVirtqueue {
 
     /* Guest's call notifier, where the SVQ calls guest. */
     EventNotifier svq_call;
+
+    /* Virtio queue shadowing */
+    VirtQueue *vq;
+
+    /* Virtio device */
+    VirtIODevice *vdev;
+
+    /* Map for use the guest's descriptors */
+    VirtQueueElement **ring_id_maps;
+
+    /* Next VirtQueue element that guest made available */
+    VirtQueueElement *next_guest_avail_elem;
+
+    /* Next head to expose to the device */
+    uint16_t shadow_avail_idx;
+
+    /* Next free descriptor */
+    uint16_t free_head;
+
+    /* Last seen used idx */
+    uint16_t shadow_used_idx;
+
+    /* Next head to consume from the device */
+    uint16_t last_used_idx;
 } VhostShadowVirtqueue;
 
 bool vhost_svq_valid_features(uint64_t features, Error **errp);
@@ -47,6 +71,8 @@ void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
 size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
 size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
 
+void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
+                     VirtQueue *vq);
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
 VhostShadowVirtqueue *vhost_svq_new(void);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index b44759e1a4..5543a50222 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -13,6 +13,7 @@
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "qemu/main-loop.h"
+#include "qemu/log.h"
 #include "linux-headers/linux/vhost.h"
 
 /**
@@ -59,29 +60,310 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp)
 }
 
 /**
- * Forward guest notifications.
+ * Number of descriptors that the SVQ can make available from the guest.
+ *
+ * @svq: The svq
+ */
+static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
+{
+    return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx);
+}
+
+static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
+                                    const struct iovec *iovec,
+                                    size_t num, bool more_descs, bool write)
+{
+    uint16_t i = svq->free_head, last = svq->free_head;
+    unsigned n;
+    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
+    vring_desc_t *descs = svq->vring.desc;
+
+    if (num == 0) {
+        return;
+    }
+
+    for (n = 0; n < num; n++) {
+        if (more_descs || (n + 1 < num)) {
+            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
+        } else {
+            descs[i].flags = flags;
+        }
+        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
+        descs[i].len = cpu_to_le32(iovec[n].iov_len);
+
+        last = i;
+        i = cpu_to_le16(descs[i].next);
+    }
+
+    svq->free_head = le16_to_cpu(descs[last].next);
+}
+
+static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
+                                VirtQueueElement *elem,
+                                unsigned *head)
+{
+    unsigned avail_idx;
+    vring_avail_t *avail = svq->vring.avail;
+
+    *head = svq->free_head;
+
+    /* We need some descriptors here */
+    if (unlikely(!elem->out_num && !elem->in_num)) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+            "Guest provided element with no descriptors");
+        return false;
+    }
+
+    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
+                            elem->in_num > 0, false);
+    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
+
+    /*
+     * Put the entry in the available array (but don't update avail->idx until
+     * they do sync).
+     */
+    avail_idx = svq->shadow_avail_idx & (svq->vring.num - 1);
+    avail->ring[avail_idx] = cpu_to_le16(*head);
+    svq->shadow_avail_idx++;
+
+    /* Update the avail index after write the descriptor */
+    smp_wmb();
+    avail->idx = cpu_to_le16(svq->shadow_avail_idx);
+
+    return true;
+}
+
+static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
+{
+    unsigned qemu_head;
+    bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
+    if (unlikely(!ok)) {
+        return false;
+    }
+
+    svq->ring_id_maps[qemu_head] = elem;
+    return true;
+}
+
+static void vhost_svq_kick(VhostShadowVirtqueue *svq)
+{
+    /*
+     * We need to expose the available array entries before checking the used
+     * flags
+     */
+    smp_mb();
+    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
+        return;
+    }
+
+    event_notifier_set(&svq->hdev_kick);
+}
+
+/**
+ * Forward available buffers.
+ *
+ * @svq: Shadow VirtQueue
+ *
+ * Note that this function does not guarantee that all guest's available
+ * buffers are available to the device in SVQ avail ring. The guest may have
+ * exposed a GPA / GIOVA contiguous buffer, but it may not be contiguous in
+ * qemu vaddr.
+ *
+ * If that happens, guest's kick notifications will be disabled until the
+ * device uses some buffers.
+ */
+static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
+{
+    /* Clear event notifier */
+    event_notifier_test_and_clear(&svq->svq_kick);
+
+    /* Forward to the device as many available buffers as possible */
+    do {
+        virtio_queue_set_notification(svq->vq, false);
+
+        while (true) {
+            VirtQueueElement *elem;
+            bool ok;
+
+            if (svq->next_guest_avail_elem) {
+                elem = g_steal_pointer(&svq->next_guest_avail_elem);
+            } else {
+                elem = virtqueue_pop(svq->vq, sizeof(*elem));
+            }
+
+            if (!elem) {
+                break;
+            }
+
+            if (elem->out_num + elem->in_num >
+                vhost_svq_available_slots(svq)) {
+                /*
+                 * This condition is possible since a contiguous buffer in GPA
+                 * does not imply a contiguous buffer in qemu's VA
+                 * scatter-gather segments. If that happens, the buffer exposed
+                 * to the device needs to be a chain of descriptors at this
+                 * moment.
+                 *
+                 * SVQ cannot hold more available buffers if we are here:
+                 * queue the current guest descriptor and ignore further kicks
+                 * until some elements are used.
+                 */
+                svq->next_guest_avail_elem = elem;
+                return;
+            }
+
+            ok = vhost_svq_add(svq, elem);
+            if (unlikely(!ok)) {
+                /* VQ is broken, just return and ignore any other kicks */
+                return;
+            }
+            vhost_svq_kick(svq);
+        }
+
+        virtio_queue_set_notification(svq->vq, true);
+    } while (!virtio_queue_empty(svq->vq));
+}
+
+/**
+ * Handle guest's kick.
  *
  * @n: guest kick event notifier, the one that guest set to notify svq.
  */
-static void vhost_handle_guest_kick(EventNotifier *n)
+static void vhost_handle_guest_kick_notifier(EventNotifier *n)
 {
     VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
                                              svq_kick);
     event_notifier_test_and_clear(n);
-    event_notifier_set(&svq->hdev_kick);
+    vhost_handle_guest_kick(svq);
+}
+
+static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
+{
+    if (svq->last_used_idx != svq->shadow_used_idx) {
+        return true;
+    }
+
+    svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
+
+    return svq->last_used_idx != svq->shadow_used_idx;
 }
 
 /**
- * Forward vhost notifications
+ * Enable vhost device calls after disable them.
+ *
+ * @svq: The svq
+ *
+ * It returns false if there are pending used buffers from the vhost device,
+ * avoiding the possible races between SVQ checking for more work and enabling
+ * callbacks. True if SVQ used vring has no more pending buffers.
+ */
+static bool vhost_svq_enable_notification(VhostShadowVirtqueue *svq)
+{
+    svq->vring.avail->flags &= ~cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
+    /* Make sure the flag is written before the read of used_idx */
+    smp_mb();
+    return !vhost_svq_more_used(svq);
+}
+
+static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
+{
+    svq->vring.avail->flags |= cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
+}
+
+static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
+                                           uint32_t *len)
+{
+    vring_desc_t *descs = svq->vring.desc;
+    const vring_used_t *used = svq->vring.used;
+    vring_used_elem_t used_elem;
+    uint16_t last_used;
+
+    if (!vhost_svq_more_used(svq)) {
+        return NULL;
+    }
+
+    /* Only get used array entries after they have been exposed by dev */
+    smp_rmb();
+    last_used = svq->last_used_idx & (svq->vring.num - 1);
+    used_elem.id = le32_to_cpu(used->ring[last_used].id);
+    used_elem.len = le32_to_cpu(used->ring[last_used].len);
+
+    svq->last_used_idx++;
+    if (unlikely(used_elem.id >= svq->vring.num)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "Device %s says index %u is used",
+                      svq->vdev->name, used_elem.id);
+        return NULL;
+    }
+
+    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+            "Device %s says index %u is used, but it was not available",
+            svq->vdev->name, used_elem.id);
+        return NULL;
+    }
+
+    descs[used_elem.id].next = svq->free_head;
+    svq->free_head = used_elem.id;
+
+    *len = used_elem.len;
+    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
+}
+
+static void vhost_svq_flush(VhostShadowVirtqueue *svq,
+                            bool check_for_avail_queue)
+{
+    VirtQueue *vq = svq->vq;
+
+    /* Forward as many used buffers as possible. */
+    do {
+        unsigned i = 0;
+
+        vhost_svq_disable_notification(svq);
+        while (true) {
+            uint32_t len;
+            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq, &len);
+            if (!elem) {
+                break;
+            }
+
+            if (unlikely(i >= svq->vring.num)) {
+                qemu_log_mask(LOG_GUEST_ERROR,
+                         "More than %u used buffers obtained in a %u size SVQ",
+                         i, svq->vring.num);
+                virtqueue_fill(vq, elem, len, i);
+                virtqueue_flush(vq, i);
+                return;
+            }
+            virtqueue_fill(vq, elem, len, i++);
+        }
+
+        virtqueue_flush(vq, i);
+        event_notifier_set(&svq->svq_call);
+
+        if (check_for_avail_queue && svq->next_guest_avail_elem) {
+            /*
+             * Avail ring was full when vhost_svq_flush was called, so it's a
+             * good moment to make more descriptors available if possible.
+             */
+            vhost_handle_guest_kick(svq);
+        }
+    } while (!vhost_svq_enable_notification(svq));
+}
+
+/**
+ * Forward used buffers.
  *
  * @n: hdev call event notifier, the one that device set to notify svq.
+ *
+ * Note that we are not making any buffers available in the loop, there is no
+ * way that it runs more than virtqueue size times.
  */
 static void vhost_svq_handle_call(EventNotifier *n)
 {
     VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
                                              hdev_call);
     event_notifier_test_and_clear(n);
-    event_notifier_set(&svq->svq_call);
+    vhost_svq_flush(svq, true);
 }
 
 /**
@@ -162,7 +444,41 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
     if (poll_start) {
         event_notifier_init_fd(svq_kick, svq_kick_fd);
         event_notifier_set(svq_kick);
-        event_notifier_set_handler(svq_kick, vhost_handle_guest_kick);
+        event_notifier_set_handler(svq_kick, vhost_handle_guest_kick_notifier);
+    }
+}
+
+/**
+ * Start the shadow virtqueue operation.
+ *
+ * @svq: Shadow Virtqueue
+ * @vdev: VirtIO device
+ * @vq: Virtqueue to shadow
+ */
+void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
+                     VirtQueue *vq)
+{
+    size_t desc_size, driver_size, device_size;
+
+    svq->next_guest_avail_elem = NULL;
+    svq->shadow_avail_idx = 0;
+    svq->shadow_used_idx = 0;
+    svq->last_used_idx = 0;
+    svq->vdev = vdev;
+    svq->vq = vq;
+
+    svq->vring.num = virtio_queue_get_num(vdev, virtio_get_queue_index(vq));
+    driver_size = vhost_svq_driver_area_size(svq);
+    device_size = vhost_svq_device_area_size(svq);
+    svq->vring.desc = qemu_memalign(qemu_real_host_page_size, driver_size);
+    desc_size = sizeof(vring_desc_t) * svq->vring.num;
+    svq->vring.avail = (void *)((char *)svq->vring.desc + desc_size);
+    memset(svq->vring.desc, 0, driver_size);
+    svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
+    memset(svq->vring.used, 0, device_size);
+    svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num);
+    for (unsigned i = 0; i < svq->vring.num - 1; i++) {
+        svq->vring.desc[i].next = cpu_to_le16(i + 1);
     }
 }
 
@@ -173,6 +489,31 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
 void vhost_svq_stop(VhostShadowVirtqueue *svq)
 {
     event_notifier_set_handler(&svq->svq_kick, NULL);
+    g_autofree VirtQueueElement *next_avail_elem = NULL;
+
+    if (!svq->vq) {
+        return;
+    }
+
+    /* Send all pending used descriptors to guest */
+    vhost_svq_flush(svq, false);
+
+    for (unsigned i = 0; i < svq->vring.num; ++i) {
+        g_autofree VirtQueueElement *elem = NULL;
+        elem = g_steal_pointer(&svq->ring_id_maps[i]);
+        if (elem) {
+            virtqueue_detach_element(svq->vq, elem, 0);
+        }
+    }
+
+    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
+    if (next_avail_elem) {
+        virtqueue_detach_element(svq->vq, next_avail_elem, 0);
+    }
+    svq->vq = NULL;
+    g_free(svq->ring_id_maps);
+    qemu_vfree(svq->vring.desc);
+    qemu_vfree(svq->vring.used);
 }
 
 /**
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 6a7575f13e..a9dc7f0fce 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -776,10 +776,10 @@ static int vhost_vdpa_set_vring_dev_addr(struct vhost_dev *dev,
  * Note that this function does not rewind kick file descriptor if cannot set
  * call one.
  */
-static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
-                                 VhostShadowVirtqueue *svq,
-                                 unsigned idx,
-                                 Error **errp)
+static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
+                                  VhostShadowVirtqueue *svq,
+                                  unsigned idx,
+                                  Error **errp)
 {
     struct vhost_vring_file file = {
         .index = dev->vq_index + idx,
@@ -791,7 +791,7 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
     r = vhost_vdpa_set_vring_dev_kick(dev, &file);
     if (unlikely(r != 0)) {
         error_setg_errno(errp, -r, "Can't set device kick fd");
-        return false;
+        return r;
     }
 
     event_notifier = &svq->hdev_call;
@@ -801,6 +801,96 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
         error_setg_errno(errp, -r, "Can't set device call fd");
     }
 
+    return r;
+}
+
+/**
+ * Unmap a SVQ area in the device
+ */
+static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
+                                      hwaddr size)
+{
+    int r;
+
+    size = ROUND_UP(size, qemu_real_host_page_size);
+    r = vhost_vdpa_dma_unmap(v, iova, size);
+    return r == 0;
+}
+
+static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
+                                       const VhostShadowVirtqueue *svq)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    struct vhost_vring_addr svq_addr;
+    size_t device_size = vhost_svq_device_area_size(svq);
+    size_t driver_size = vhost_svq_driver_area_size(svq);
+    bool ok;
+
+    vhost_svq_get_vring_addr(svq, &svq_addr);
+
+    ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
+    if (unlikely(!ok)) {
+        return false;
+    }
+
+    return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
+}
+
+/**
+ * Map the shadow virtqueue rings in the device
+ *
+ * @dev: The vhost device
+ * @svq: The shadow virtqueue
+ * @addr: Assigned IOVA addresses
+ * @errp: Error pointer
+ */
+static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
+                                     const VhostShadowVirtqueue *svq,
+                                     struct vhost_vring_addr *addr,
+                                     Error **errp)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    size_t device_size = vhost_svq_device_area_size(svq);
+    size_t driver_size = vhost_svq_driver_area_size(svq);
+    int r;
+
+    ERRP_GUARD();
+    vhost_svq_get_vring_addr(svq, addr);
+
+    r = vhost_vdpa_dma_map(v, addr->desc_user_addr, driver_size,
+                           (void *)addr->desc_user_addr, true);
+    if (unlikely(r != 0)) {
+        error_setg_errno(errp, -r, "Cannot create vq driver region: ");
+        return false;
+    }
+
+    r = vhost_vdpa_dma_map(v, addr->used_user_addr, device_size,
+                           (void *)addr->used_user_addr, false);
+    if (unlikely(r != 0)) {
+        error_setg_errno(errp, -r, "Cannot create vq device region: ");
+    }
+
+    return r == 0;
+}
+
+static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
+                                 VhostShadowVirtqueue *svq,
+                                 unsigned idx,
+                                 Error **errp)
+{
+    uint16_t vq_index = dev->vq_index + idx;
+    struct vhost_vring_state s = {
+        .index = vq_index,
+    };
+    int r;
+
+    r = vhost_vdpa_set_dev_vring_base(dev, &s);
+    if (unlikely(r)) {
+        error_setg_errno(errp, -r, "Cannot set vring base");
+        return false;
+    }
+
+    r = vhost_vdpa_svq_set_fds(dev, svq, idx, errp);
     return r == 0;
 }
 
@@ -815,10 +905,63 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
     }
 
     for (i = 0; i < v->shadow_vqs->len; ++i) {
+        VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
         VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
+        struct vhost_vring_addr addr = {
+            .index = i,
+        };
+        int r;
         bool ok = vhost_vdpa_svq_setup(dev, svq, i, &err);
         if (unlikely(!ok)) {
-            error_reportf_err(err, "Cannot setup SVQ %u: ", i);
+            goto err;
+        }
+
+        vhost_svq_start(svq, dev->vdev, vq);
+        ok = vhost_vdpa_svq_map_rings(dev, svq, &addr, &err);
+        if (unlikely(!ok)) {
+            goto err_map;
+        }
+
+        /* Override vring GPA set by vhost subsystem */
+        r = vhost_vdpa_set_vring_dev_addr(dev, &addr);
+        if (unlikely(r != 0)) {
+            error_setg_errno(&err, -r, "Cannot set device address");
+            goto err_set_addr;
+        }
+    }
+
+    return true;
+
+err_set_addr:
+    vhost_vdpa_svq_unmap_rings(dev, g_ptr_array_index(v->shadow_vqs, i));
+
+err_map:
+    vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, i));
+
+err:
+    error_reportf_err(err, "Cannot setup SVQ %u: ", i);
+    for (unsigned j = 0; j < i; ++j) {
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, j);
+        vhost_vdpa_svq_unmap_rings(dev, svq);
+        vhost_svq_stop(svq);
+    }
+
+    return false;
+}
+
+static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
+{
+    struct vhost_vdpa *v = dev->opaque;
+
+    if (!v->shadow_vqs) {
+        return true;
+    }
+
+    for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
+                                                      i);
+        bool ok = vhost_vdpa_svq_unmap_rings(dev, svq);
+        if (unlikely(!ok)) {
             return false;
         }
     }
@@ -840,6 +983,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         }
         vhost_vdpa_set_vring_ready(dev);
     } else {
+        ok = vhost_vdpa_svqs_stop(dev);
+        if (unlikely(!ok)) {
+            return -1;
+        }
         vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
     }
 
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 08/15] util: Add iova_tree_alloc_map
  2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
                   ` (6 preceding siblings ...)
  2022-03-07 15:33 ` [PATCH v5 07/15] vhost: Shadow virtqueue buffers forwarding Eugenio Pérez
@ 2022-03-07 15:33 ` Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 09/15] util: add iova_tree_find_iova Eugenio Pérez
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Eugenio Pérez @ 2022-03-07 15:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, virtualization,
	Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

This iova tree function allows it to look for a hole in allocated
regions and return a totally new translation for a given translated
address.

It's usage is mainly to allow devices to access qemu address space,
remapping guest's one into a new iova space where qemu can add chunks of
addresses.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 include/qemu/iova-tree.h |  18 ++++++
 util/iova-tree.c         | 135 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 153 insertions(+)

diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
index 8249edd764..d066400f09 100644
--- a/include/qemu/iova-tree.h
+++ b/include/qemu/iova-tree.h
@@ -29,6 +29,7 @@
 #define  IOVA_OK           (0)
 #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
 #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
+#define  IOVA_ERR_NOMEM    (-3) /* Cannot allocate */
 
 typedef struct IOVATree IOVATree;
 typedef struct DMAMap {
@@ -119,6 +120,23 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
  */
 void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
 
+/**
+ * iova_tree_alloc_map:
+ *
+ * @tree: the iova tree to allocate from
+ * @map: the new map (as translated addr & size) to allocate in the iova region
+ * @iova_begin: the minimum address of the allocation
+ * @iova_end: the maximum addressable direction of the allocation
+ *
+ * Allocates a new region of a given size, between iova_min and iova_max.
+ *
+ * Return: Same as iova_tree_insert, but cannot overlap and can return error if
+ * iova tree is out of free contiguous range. The caller gets the assigned iova
+ * in map->iova.
+ */
+int iova_tree_alloc_map(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
+                        hwaddr iova_end);
+
 /**
  * iova_tree_destroy:
  *
diff --git a/util/iova-tree.c b/util/iova-tree.c
index 23ea35b7a4..3160c50d3b 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -16,6 +16,39 @@ struct IOVATree {
     GTree *tree;
 };
 
+/* Args to pass to iova_tree_alloc foreach function. */
+struct IOVATreeAllocArgs {
+    /* Size of the desired allocation */
+    size_t new_size;
+
+    /* The minimum address allowed in the allocation */
+    hwaddr iova_begin;
+
+    /* Map at the left of the hole, can be NULL if "this" is first one */
+    const DMAMap *prev;
+
+    /* Map at the right of the hole, can be NULL if "prev" is the last one */
+    const DMAMap *this;
+
+    /* If found, we fill in the IOVA here */
+    hwaddr iova_result;
+
+    /* Whether have we found a valid IOVA */
+    bool iova_found;
+};
+
+/**
+ * Iterate args to the next hole
+ *
+ * @args: The alloc arguments
+ * @next: The next mapping in the tree. Can be NULL to signal the last one
+ */
+static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
+                                         const DMAMap *next) {
+    args->prev = args->this;
+    args->this = next;
+}
+
 static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
 {
     const DMAMap *m1 = a, *m2 = b;
@@ -107,6 +140,108 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
     return IOVA_OK;
 }
 
+/**
+ * Try to find an unallocated IOVA range between prev and this elements.
+ *
+ * @args: Arguments to allocation
+ *
+ * Cases:
+ *
+ * (1) !prev, !this: No entries allocated, always succeed
+ *
+ * (2) !prev, this: We're iterating at the 1st element.
+ *
+ * (3) prev, !this: We're iterating at the last element.
+ *
+ * (4) prev, this: this is the most common case, we'll try to find a hole
+ * between "prev" and "this" mapping.
+ *
+ * Note that this function assumes the last valid iova is HWADDR_MAX, but it
+ * searches linearly so it's easy to discard the result if it's not the case.
+ */
+static void iova_tree_alloc_map_in_hole(struct IOVATreeAllocArgs *args)
+{
+    const DMAMap *prev = args->prev, *this = args->this;
+    uint64_t hole_start, hole_last;
+
+    if (this && this->iova + this->size < args->iova_begin) {
+        return;
+    }
+
+    hole_start = MAX(prev ? prev->iova + prev->size + 1 : 0, args->iova_begin);
+    hole_last = this ? this->iova : HWADDR_MAX;
+
+    if (hole_last - hole_start > args->new_size) {
+        args->iova_result = hole_start;
+        args->iova_found = true;
+    }
+}
+
+/**
+ * Foreach dma node in the tree, compare if there is a hole with its previous
+ * node (or minimum iova address allowed) and the node.
+ *
+ * @key: Node iterating
+ * @value: Node iterating
+ * @pargs: Struct to communicate with the outside world
+ *
+ * Return: false to keep iterating, true if needs break.
+ */
+static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
+                                         gpointer pargs)
+{
+    struct IOVATreeAllocArgs *args = pargs;
+    DMAMap *node = value;
+
+    assert(key == value);
+
+    iova_tree_alloc_args_iterate(args, node);
+    iova_tree_alloc_map_in_hole(args);
+    return args->iova_found;
+}
+
+int iova_tree_alloc_map(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
+                        hwaddr iova_last)
+{
+    struct IOVATreeAllocArgs args = {
+        .new_size = map->size,
+        .iova_begin = iova_begin,
+    };
+
+    if (unlikely(iova_last < iova_begin)) {
+        return IOVA_ERR_INVALID;
+    }
+
+    /*
+     * Find a valid hole for the mapping
+     *
+     * Assuming low iova_begin, so no need to do a binary search to
+     * locate the first node.
+     *
+     * TODO: Replace all this with g_tree_node_first/next/last when available
+     * (from glib since 2.68). To do it with g_tree_foreach complicates the
+     * code a lot.
+     *
+     */
+    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
+    if (!args.iova_found) {
+        /*
+         * Either tree is empty or the last hole is still not checked.
+         * g_tree_foreach does not compare (last, iova_last] range, so we check
+         * it here.
+         */
+        iova_tree_alloc_args_iterate(&args, NULL);
+        iova_tree_alloc_map_in_hole(&args);
+    }
+
+    if (!args.iova_found || args.iova_result + map->size > iova_last) {
+        return IOVA_ERR_NOMEM;
+    }
+
+    map->iova = args.iova_result;
+    return iova_tree_insert(tree, map);
+}
+
 void iova_tree_destroy(IOVATree *tree)
 {
     g_tree_destroy(tree->tree);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 09/15] util: add iova_tree_find_iova
  2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
                   ` (7 preceding siblings ...)
  2022-03-07 15:33 ` [PATCH v5 08/15] util: Add iova_tree_alloc_map Eugenio Pérez
@ 2022-03-07 15:33 ` Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 10/15] vhost: Add VhostIOVATree Eugenio Pérez
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Eugenio Pérez @ 2022-03-07 15:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, virtualization,
	Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

This function does the reverse operation of iova_tree_find: To look for
a mapping that match a translated address so we can do the reverse.

This have linear complexity instead of logarithmic, but it supports
overlapping HVA. Future developments could reduce it.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/qemu/iova-tree.h | 20 +++++++++++++++++++-
 util/iova-tree.c         | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
index d066400f09..c938fb0793 100644
--- a/include/qemu/iova-tree.h
+++ b/include/qemu/iova-tree.h
@@ -83,7 +83,7 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map);
  * @tree: the iova tree to search from
  * @map: the mapping to search
  *
- * Search for a mapping in the iova tree that overlaps with the
+ * Search for a mapping in the iova tree that iova overlaps with the
  * mapping range specified.  Only the first found mapping will be
  * returned.
  *
@@ -95,6 +95,24 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map);
  */
 const DMAMap *iova_tree_find(const IOVATree *tree, const DMAMap *map);
 
+/**
+ * iova_tree_find_iova:
+ *
+ * @tree: the iova tree to search from
+ * @map: the mapping to search
+ *
+ * Search for a mapping in the iova tree that translated_addr overlaps with the
+ * mapping range specified.  Only the first found mapping will be
+ * returned.
+ *
+ * Return: DMAMap pointer if found, or NULL if not found.  Note that
+ * the returned DMAMap pointer is maintained internally.  User should
+ * only read the content but never modify or free the content.  Also,
+ * user is responsible to make sure the pointer is valid (say, no
+ * concurrent deletion in progress).
+ */
+const DMAMap *iova_tree_find_iova(const IOVATree *tree, const DMAMap *map);
+
 /**
  * iova_tree_find_address:
  *
diff --git a/util/iova-tree.c b/util/iova-tree.c
index 3160c50d3b..f015598977 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -37,6 +37,11 @@ struct IOVATreeAllocArgs {
     bool iova_found;
 };
 
+typedef struct IOVATreeFindIOVAArgs {
+    const DMAMap *needle;
+    const DMAMap *result;
+} IOVATreeFindIOVAArgs;
+
 /**
  * Iterate args to the next hole
  *
@@ -80,6 +85,35 @@ const DMAMap *iova_tree_find(const IOVATree *tree, const DMAMap *map)
     return g_tree_lookup(tree->tree, map);
 }
 
+static gboolean iova_tree_find_address_iterator(gpointer key, gpointer value,
+                                                gpointer data)
+{
+    const DMAMap *map = key;
+    IOVATreeFindIOVAArgs *args = data;
+    const DMAMap *needle;
+
+    g_assert(key == value);
+
+    needle = args->needle;
+    if (map->translated_addr + map->size < needle->translated_addr ||
+        needle->translated_addr + needle->size < map->translated_addr) {
+        return false;
+    }
+
+    args->result = map;
+    return true;
+}
+
+const DMAMap *iova_tree_find_iova(const IOVATree *tree, const DMAMap *map)
+{
+    IOVATreeFindIOVAArgs args = {
+        .needle = map,
+    };
+
+    g_tree_foreach(tree->tree, iova_tree_find_address_iterator, &args);
+    return args.result;
+}
+
 const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova)
 {
     const DMAMap map = { .iova = iova, .size = 0 };
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 10/15] vhost: Add VhostIOVATree
  2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
                   ` (8 preceding siblings ...)
  2022-03-07 15:33 ` [PATCH v5 09/15] util: add iova_tree_find_iova Eugenio Pérez
@ 2022-03-07 15:33 ` Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 11/15] vdpa: Add custom IOTLB translations to SVQ Eugenio Pérez
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Eugenio Pérez @ 2022-03-07 15:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, virtualization,
	Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

This tree is able to look for a translated address from an IOVA address.

At first glance it is similar to util/iova-tree. However, SVQ working on
devices with limited IOVA space need more capabilities, like allocating
IOVA chunks or performing reverse translations (qemu addresses to iova).

The allocation capability, as "assign a free IOVA address to this chunk
of memory in qemu's address space" allows shadow virtqueue to create a
new address space that is not restricted by guest's addressable one, so
we can allocate shadow vqs vrings outside of it.

It duplicates the tree so it can search efficiently in both directions,
and it will signal overlap if iova or the translated address is present
in any tree.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-iova-tree.h |  27 +++++++++
 hw/virtio/vhost-iova-tree.c | 110 ++++++++++++++++++++++++++++++++++++
 hw/virtio/meson.build       |   2 +-
 3 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 hw/virtio/vhost-iova-tree.h
 create mode 100644 hw/virtio/vhost-iova-tree.c

diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
new file mode 100644
index 0000000000..6a4f24e0f9
--- /dev/null
+++ b/hw/virtio/vhost-iova-tree.h
@@ -0,0 +1,27 @@
+/*
+ * vhost software live migration iova tree
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_VIRTIO_VHOST_IOVA_TREE_H
+#define HW_VIRTIO_VHOST_IOVA_TREE_H
+
+#include "qemu/iova-tree.h"
+#include "exec/memory.h"
+
+typedef struct VhostIOVATree VhostIOVATree;
+
+VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
+void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
+
+const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
+                                        const DMAMap *map);
+int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
+void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map);
+
+#endif
diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
new file mode 100644
index 0000000000..55fed1fefb
--- /dev/null
+++ b/hw/virtio/vhost-iova-tree.c
@@ -0,0 +1,110 @@
+/*
+ * vhost software live migration iova tree
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/iova-tree.h"
+#include "vhost-iova-tree.h"
+
+#define iova_min_addr qemu_real_host_page_size
+
+/**
+ * VhostIOVATree, able to:
+ * - Translate iova address
+ * - Reverse translate iova address (from translated to iova)
+ * - Allocate IOVA regions for translated range (linear operation)
+ */
+struct VhostIOVATree {
+    /* First addressable iova address in the device */
+    uint64_t iova_first;
+
+    /* Last addressable iova address in the device */
+    uint64_t iova_last;
+
+    /* IOVA address to qemu memory maps. */
+    IOVATree *iova_taddr_map;
+};
+
+/**
+ * Create a new IOVA tree
+ *
+ * Returns the new IOVA tree
+ */
+VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
+{
+    VhostIOVATree *tree = g_new(VhostIOVATree, 1);
+
+    /* Some devices do not like 0 addresses */
+    tree->iova_first = MAX(iova_first, iova_min_addr);
+    tree->iova_last = iova_last;
+
+    tree->iova_taddr_map = iova_tree_new();
+    return tree;
+}
+
+/**
+ * Delete an iova tree
+ */
+void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
+{
+    iova_tree_destroy(iova_tree->iova_taddr_map);
+    g_free(iova_tree);
+}
+
+/**
+ * Find the IOVA address stored from a memory address
+ *
+ * @tree: The iova tree
+ * @map: The map with the memory address
+ *
+ * Return the stored mapping, or NULL if not found.
+ */
+const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
+                                        const DMAMap *map)
+{
+    return iova_tree_find_iova(tree->iova_taddr_map, map);
+}
+
+/**
+ * Allocate a new mapping
+ *
+ * @tree: The iova tree
+ * @map: The iova map
+ *
+ * Returns:
+ * - IOVA_OK if the map fits in the container
+ * - IOVA_ERR_INVALID if the map does not make sense (like size overflow)
+ * - IOVA_ERR_NOMEM if tree cannot allocate more space.
+ *
+ * It returns assignated iova in map->iova if return value is VHOST_DMA_MAP_OK.
+ */
+int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map)
+{
+    /* Some vhost devices do not like addr 0. Skip first page */
+    hwaddr iova_first = tree->iova_first ?: qemu_real_host_page_size;
+
+    if (map->translated_addr + map->size < map->translated_addr ||
+        map->perm == IOMMU_NONE) {
+        return IOVA_ERR_INVALID;
+    }
+
+    /* Allocate a node in IOVA address */
+    return iova_tree_alloc_map(tree->iova_taddr_map, map, iova_first,
+                               tree->iova_last);
+}
+
+/**
+ * Remove existing mappings from iova tree
+ *
+ * @iova_tree: The vhost iova tree
+ * @map: The map to remove
+ */
+void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map)
+{
+    iova_tree_remove(iova_tree->iova_taddr_map, map);
+}
diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index 2dc87613bc..6047670804 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
 
 virtio_ss = ss.source_set()
 virtio_ss.add(files('virtio.c'))
-virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
+virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c', 'vhost-iova-tree.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
 virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 11/15] vdpa: Add custom IOTLB translations to SVQ
  2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
                   ` (9 preceding siblings ...)
  2022-03-07 15:33 ` [PATCH v5 10/15] vhost: Add VhostIOVATree Eugenio Pérez
@ 2022-03-07 15:33 ` Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 12/15] vdpa: Adapt vhost_vdpa_get_vring_base " Eugenio Pérez
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Eugenio Pérez @ 2022-03-07 15:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, virtualization,
	Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

Use translations added in VhostIOVATree in SVQ.

Only introduce usage here, not allocation and deallocation. As with
previous patches, we use the dead code paths of shadow_vqs_enabled to
avoid commiting too many changes at once. These are impossible to take
at the moment.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |   6 +-
 include/hw/virtio/vhost-vdpa.h     |   3 +
 hw/virtio/vhost-shadow-virtqueue.c |  75 +++++++++++++++++-
 hw/virtio/vhost-vdpa.c             | 122 ++++++++++++++++++++++++-----
 4 files changed, 181 insertions(+), 25 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 38b3b91ca7..e5e24c536d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -13,6 +13,7 @@
 #include "qemu/event_notifier.h"
 #include "hw/virtio/virtio.h"
 #include "standard-headers/linux/vhost_types.h"
+#include "hw/virtio/vhost-iova-tree.h"
 
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
@@ -43,6 +44,9 @@ typedef struct VhostShadowVirtqueue {
     /* Virtio device */
     VirtIODevice *vdev;
 
+    /* IOVA mapping */
+    VhostIOVATree *iova_tree;
+
     /* Map for use the guest's descriptors */
     VirtQueueElement **ring_id_maps;
 
@@ -75,7 +79,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
                      VirtQueue *vq);
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
-VhostShadowVirtqueue *vhost_svq_new(void);
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree);
 
 void vhost_svq_free(gpointer vq);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 009a9f3b6b..ee8e939ad0 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -14,6 +14,7 @@
 
 #include <gmodule.h>
 
+#include "hw/virtio/vhost-iova-tree.h"
 #include "hw/virtio/virtio.h"
 #include "standard-headers/linux/vhost_types.h"
 
@@ -30,6 +31,8 @@ typedef struct vhost_vdpa {
     MemoryListener listener;
     struct vhost_vdpa_iova_range iova_range;
     bool shadow_vqs_enabled;
+    /* IOVA mapping used by the Shadow Virtqueue */
+    VhostIOVATree *iova_tree;
     GPtrArray *shadow_vqs;
     struct vhost_dev *dev;
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 5543a50222..71a1d2f6bb 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -69,7 +69,58 @@ static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
     return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx);
 }
 
+/**
+ * Translate addresses between the qemu's virtual address and the SVQ IOVA
+ *
+ * @svq: Shadow VirtQueue
+ * @vaddr: Translated IOVA addresses
+ * @iovec: Source qemu's VA addresses
+ * @num: Length of iovec and minimum length of vaddr
+ */
+static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
+                                     void **addrs, const struct iovec *iovec,
+                                     size_t num)
+{
+    if (num == 0) {
+        return true;
+    }
+
+    for (size_t i = 0; i < num; ++i) {
+        DMAMap needle = {
+            .translated_addr = (hwaddr)iovec[i].iov_base,
+            .size = iovec[i].iov_len,
+        };
+        size_t off;
+
+        const DMAMap *map = vhost_iova_tree_find_iova(svq->iova_tree, &needle);
+        /*
+         * Map cannot be NULL since iova map contains all guest space and
+         * qemu already has a physical address mapped
+         */
+        if (unlikely(!map)) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "Invalid address 0x%"HWADDR_PRIx" given by guest",
+                          needle.translated_addr);
+            return false;
+        }
+
+        off = needle.translated_addr - map->translated_addr;
+        addrs[i] = (void *)(map->iova + off);
+
+        if (unlikely(int128_gt(int128_add(needle.translated_addr,
+                                          iovec[i].iov_len),
+                               map->translated_addr + map->size))) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "Guest buffer expands over iova range");
+            return false;
+        }
+    }
+
+    return true;
+}
+
 static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
+                                    void * const *sg,
                                     const struct iovec *iovec,
                                     size_t num, bool more_descs, bool write)
 {
@@ -88,7 +139,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
         } else {
             descs[i].flags = flags;
         }
-        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
+        descs[i].addr = cpu_to_le64((hwaddr)sg[n]);
         descs[i].len = cpu_to_le32(iovec[n].iov_len);
 
         last = i;
@@ -104,6 +155,8 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
 {
     unsigned avail_idx;
     vring_avail_t *avail = svq->vring.avail;
+    bool ok;
+    g_autofree void **sgs = g_new(void *, MAX(elem->out_num, elem->in_num));
 
     *head = svq->free_head;
 
@@ -114,9 +167,20 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
         return false;
     }
 
-    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
+    ok = vhost_svq_translate_addr(svq, sgs, elem->out_sg, elem->out_num);
+    if (unlikely(!ok)) {
+        return false;
+    }
+    vhost_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
                             elem->in_num > 0, false);
-    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
+
+
+    ok = vhost_svq_translate_addr(svq, sgs, elem->in_sg, elem->in_num);
+    if (unlikely(!ok)) {
+        return false;
+    }
+
+    vhost_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, true);
 
     /*
      * Put the entry in the available array (but don't update avail->idx until
@@ -520,11 +584,13 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
  * Creates vhost shadow virtqueue, and instructs the vhost device to use the
  * shadow methods and file descriptors.
  *
+ * @iova_tree: Tree to perform descriptors translations
+ *
  * Returns the new virtqueue or NULL.
  *
  * In case of error, reason is reported through error_report.
  */
-VhostShadowVirtqueue *vhost_svq_new(void)
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
 {
     g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
     int r;
@@ -545,6 +611,7 @@ VhostShadowVirtqueue *vhost_svq_new(void)
 
     event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND);
     event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
+    svq->iova_tree = iova_tree;
     return g_steal_pointer(&svq);
 
 err_init_hdev_call:
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index a9dc7f0fce..8630d624f6 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -209,6 +209,21 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
                                          vaddr, section->readonly);
 
     llsize = int128_sub(llend, int128_make64(iova));
+    if (v->shadow_vqs_enabled) {
+        DMAMap mem_region = {
+            .translated_addr = (hwaddr)vaddr,
+            .size = int128_get64(llsize) - 1,
+            .perm = IOMMU_ACCESS_FLAG(true, section->readonly),
+        };
+
+        int r = vhost_iova_tree_map_alloc(v->iova_tree, &mem_region);
+        if (unlikely(r != IOVA_OK)) {
+            error_report("Can't allocate a mapping (%d)", r);
+            goto fail;
+        }
+
+        iova = mem_region.iova;
+    }
 
     vhost_vdpa_iotlb_batch_begin_once(v);
     ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
@@ -261,6 +276,20 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
 
     llsize = int128_sub(llend, int128_make64(iova));
 
+    if (v->shadow_vqs_enabled) {
+        const DMAMap *result;
+        const void *vaddr = memory_region_get_ram_ptr(section->mr) +
+            section->offset_within_region +
+            (iova - section->offset_within_address_space);
+        DMAMap mem_region = {
+            .translated_addr = (hwaddr)vaddr,
+            .size = int128_get64(llsize) - 1,
+        };
+
+        result = vhost_iova_tree_find_iova(v->iova_tree, &mem_region);
+        iova = result->iova;
+        vhost_iova_tree_remove(v->iova_tree, &mem_region);
+    }
     vhost_vdpa_iotlb_batch_begin_once(v);
     ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
     if (ret) {
@@ -370,7 +399,7 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
 
     shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
     for (unsigned n = 0; n < hdev->nvqs; ++n) {
-        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new();
+        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree);
 
         if (unlikely(!svq)) {
             error_setg(errp, "Cannot create svq %u", n);
@@ -807,33 +836,70 @@ static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
 /**
  * Unmap a SVQ area in the device
  */
-static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
-                                      hwaddr size)
+static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v,
+                                      const DMAMap *needle)
 {
+    const DMAMap *result = vhost_iova_tree_find_iova(v->iova_tree, needle);
+    hwaddr size;
     int r;
 
-    size = ROUND_UP(size, qemu_real_host_page_size);
-    r = vhost_vdpa_dma_unmap(v, iova, size);
+    if (unlikely(!result)) {
+        error_report("Unable to find SVQ address to unmap");
+        return false;
+    }
+
+    size = ROUND_UP(result->size, qemu_real_host_page_size);
+    r = vhost_vdpa_dma_unmap(v, result->iova, size);
     return r == 0;
 }
 
 static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
                                        const VhostShadowVirtqueue *svq)
 {
+    DMAMap needle = {};
     struct vhost_vdpa *v = dev->opaque;
     struct vhost_vring_addr svq_addr;
-    size_t device_size = vhost_svq_device_area_size(svq);
-    size_t driver_size = vhost_svq_driver_area_size(svq);
     bool ok;
 
     vhost_svq_get_vring_addr(svq, &svq_addr);
 
-    ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
+    needle.translated_addr = svq_addr.desc_user_addr;
+    ok = vhost_vdpa_svq_unmap_ring(v, &needle);
     if (unlikely(!ok)) {
         return false;
     }
 
-    return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
+    needle.translated_addr = svq_addr.used_user_addr;
+    return vhost_vdpa_svq_unmap_ring(v, &needle);
+}
+
+/**
+ * Map the SVQ area in the device
+ *
+ * @v: Vhost-vdpa device
+ * @needle: The area to search iova
+ * @errorp: Error pointer
+ */
+static bool vhost_vdpa_svq_map_ring(struct vhost_vdpa *v, DMAMap *needle,
+                                    Error **errp)
+{
+    int r;
+
+    r = vhost_iova_tree_map_alloc(v->iova_tree, needle);
+    if (unlikely(r != IOVA_OK)) {
+        error_setg(errp, "Cannot allocate iova (%d)", r);
+        return false;
+    }
+
+    r = vhost_vdpa_dma_map(v, needle->iova, needle->size + 1,
+                           (void *)needle->translated_addr,
+                           needle->perm == IOMMU_RO);
+    if (unlikely(r != 0)) {
+        error_setg_errno(errp, -r, "Cannot map region to device");
+        vhost_iova_tree_remove(v->iova_tree, needle);
+    }
+
+    return r == 0;
 }
 
 /**
@@ -849,28 +915,44 @@ static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
                                      struct vhost_vring_addr *addr,
                                      Error **errp)
 {
+    DMAMap device_region, driver_region;
+    struct vhost_vring_addr svq_addr;
     struct vhost_vdpa *v = dev->opaque;
     size_t device_size = vhost_svq_device_area_size(svq);
     size_t driver_size = vhost_svq_driver_area_size(svq);
-    int r;
+    size_t avail_offset;
+    bool ok;
 
     ERRP_GUARD();
-    vhost_svq_get_vring_addr(svq, addr);
+    vhost_svq_get_vring_addr(svq, &svq_addr);
 
-    r = vhost_vdpa_dma_map(v, addr->desc_user_addr, driver_size,
-                           (void *)addr->desc_user_addr, true);
-    if (unlikely(r != 0)) {
-        error_setg_errno(errp, -r, "Cannot create vq driver region: ");
+    driver_region = (DMAMap) {
+        .translated_addr = svq_addr.desc_user_addr,
+        .size = driver_size - 1,
+        .perm = IOMMU_RO,
+    };
+    ok = vhost_vdpa_svq_map_ring(v, &driver_region, errp);
+    if (unlikely(!ok)) {
+        error_prepend(errp, "Cannot create vq driver region: ");
         return false;
     }
+    addr->desc_user_addr = driver_region.iova;
+    avail_offset = svq_addr.avail_user_addr - svq_addr.desc_user_addr;
+    addr->avail_user_addr = driver_region.iova + avail_offset;
 
-    r = vhost_vdpa_dma_map(v, addr->used_user_addr, device_size,
-                           (void *)addr->used_user_addr, false);
-    if (unlikely(r != 0)) {
-        error_setg_errno(errp, -r, "Cannot create vq device region: ");
+    device_region = (DMAMap) {
+        .translated_addr = svq_addr.used_user_addr,
+        .size = device_size - 1,
+        .perm = IOMMU_RW,
+    };
+    ok = vhost_vdpa_svq_map_ring(v, &device_region, errp);
+    if (unlikely(!ok)) {
+        error_prepend(errp, "Cannot create vq device region: ");
+        vhost_vdpa_svq_unmap_ring(v, &driver_region);
     }
+    addr->used_user_addr = device_region.iova;
 
-    return r == 0;
+    return ok;
 }
 
 static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 12/15] vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
  2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
                   ` (10 preceding siblings ...)
  2022-03-07 15:33 ` [PATCH v5 11/15] vdpa: Add custom IOTLB translations to SVQ Eugenio Pérez
@ 2022-03-07 15:33 ` Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 13/15] vdpa: Never set log_base addr if SVQ is enabled Eugenio Pérez
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Eugenio Pérez @ 2022-03-07 15:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, virtualization,
	Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

This is needed to achieve migration, so the destination can restore its
index.

Setting base as last used idx, so destination will see as available all
the entries that the device did not use, including the in-flight
processing ones.

This is ok for networking, but other kinds of devices might have
problems with these retransmissions.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 8630d624f6..69a4bfd0d4 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1143,8 +1143,25 @@ static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
 static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
                                        struct vhost_vring_state *ring)
 {
+    struct vhost_vdpa *v = dev->opaque;
     int ret;
 
+    if (v->shadow_vqs_enabled) {
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
+                                                      ring->index);
+
+        /*
+         * Setting base as last used idx, so destination will see as available
+         * all the entries that the device did not use, including the in-flight
+         * processing ones.
+         *
+         * TODO: This is ok for networking, but other kinds of devices might
+         * have problems with these retransmissions.
+         */
+        ring->num = svq->last_used_idx;
+        return 0;
+    }
+
     ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
     trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
     return ret;
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 13/15] vdpa: Never set log_base addr if SVQ is enabled
  2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
                   ` (11 preceding siblings ...)
  2022-03-07 15:33 ` [PATCH v5 12/15] vdpa: Adapt vhost_vdpa_get_vring_base " Eugenio Pérez
@ 2022-03-07 15:33 ` Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 14/15] vdpa: Expose VHOST_F_LOG_ALL on SVQ Eugenio Pérez
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Eugenio Pérez @ 2022-03-07 15:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, virtualization,
	Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

Setting the log address would make the device start reporting invalid
dirty memory because the SVQ vrings are located in qemu's memory.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 69a4bfd0d4..5470566ce2 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1092,7 +1092,8 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
                                      struct vhost_log *log)
 {
-    if (vhost_vdpa_one_time_request(dev)) {
+    struct vhost_vdpa *v = dev->opaque;
+    if (v->shadow_vqs_enabled || vhost_vdpa_one_time_request(dev)) {
         return 0;
     }
 
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 14/15] vdpa: Expose VHOST_F_LOG_ALL on SVQ
  2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
                   ` (12 preceding siblings ...)
  2022-03-07 15:33 ` [PATCH v5 13/15] vdpa: Never set log_base addr if SVQ is enabled Eugenio Pérez
@ 2022-03-07 15:33 ` Eugenio Pérez
  2022-03-07 15:33 ` [PATCH v5 15/15] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
  2022-03-08  6:03   ` Jason Wang
  15 siblings, 0 replies; 60+ messages in thread
From: Eugenio Pérez @ 2022-03-07 15:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, virtualization,
	Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

SVQ is able to log the dirty bits by itself, so let's use it to not
block migration.

Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
enabled. Even if the device supports it, the reports would be nonsense
because SVQ memory is in the qemu region.

The log region is still allocated. Future changes might skip that, but
this series is already long enough.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/vhost-vdpa.h |  1 +
 hw/virtio/vhost-vdpa.c         | 39 ++++++++++++++++++++++++++++++----
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index ee8e939ad0..a29dbb3f53 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -30,6 +30,7 @@ typedef struct vhost_vdpa {
     bool iotlb_batch_begin_sent;
     MemoryListener listener;
     struct vhost_vdpa_iova_range iova_range;
+    uint64_t acked_features;
     bool shadow_vqs_enabled;
     /* IOVA mapping used by the Shadow Virtqueue */
     VhostIOVATree *iova_tree;
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 5470566ce2..8dedab4450 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -373,6 +373,16 @@ static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
     return v->index != 0;
 }
 
+static int vhost_vdpa_get_dev_features(struct vhost_dev *dev,
+                                       uint64_t *features)
+{
+    int ret;
+
+    ret = vhost_vdpa_call(dev, VHOST_GET_FEATURES, features);
+    trace_vhost_vdpa_get_features(dev, *features);
+    return ret;
+}
+
 static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
                                Error **errp)
 {
@@ -385,7 +395,7 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
         return 0;
     }
 
-    r = hdev->vhost_ops->vhost_get_features(hdev, &dev_features);
+    r = vhost_vdpa_get_dev_features(hdev, &dev_features);
     if (r != 0) {
         error_setg_errno(errp, -r, "Can't get vdpa device features");
         return r;
@@ -609,12 +619,29 @@ static int vhost_vdpa_set_mem_table(struct vhost_dev *dev,
 static int vhost_vdpa_set_features(struct vhost_dev *dev,
                                    uint64_t features)
 {
+    struct vhost_vdpa *v = dev->opaque;
     int ret;
 
     if (vhost_vdpa_one_time_request(dev)) {
         return 0;
     }
 
+    if (v->shadow_vqs_enabled) {
+        if ((v->acked_features ^ features) == BIT_ULL(VHOST_F_LOG_ALL)) {
+            /*
+             * QEMU is just trying to enable or disable logging. SVQ handles
+             * this sepparately, so no need to forward this.
+             */
+            v->acked_features = features;
+            return 0;
+        }
+
+        v->acked_features = features;
+
+        /* We must not ack _F_LOG if SVQ is enabled */
+        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
+    }
+
     trace_vhost_vdpa_set_features(dev, features);
     ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
     if (ret) {
@@ -1202,10 +1229,14 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
 static int vhost_vdpa_get_features(struct vhost_dev *dev,
                                      uint64_t *features)
 {
-    int ret;
+    struct vhost_vdpa *v = dev->opaque;
+    int ret = vhost_vdpa_get_dev_features(dev, features);
+
+    if (ret == 0 && v->shadow_vqs_enabled) {
+        /* Add SVQ logging capabilities */
+        *features |= BIT_ULL(VHOST_F_LOG_ALL);
+    }
 
-    ret = vhost_vdpa_call(dev, VHOST_GET_FEATURES, features);
-    trace_vhost_vdpa_get_features(dev, *features);
     return ret;
 }
 
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 15/15] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
                   ` (13 preceding siblings ...)
  2022-03-07 15:33 ` [PATCH v5 14/15] vdpa: Expose VHOST_F_LOG_ALL on SVQ Eugenio Pérez
@ 2022-03-07 15:33 ` Eugenio Pérez
  2022-03-08  7:11     ` Michael S. Tsirkin
  2022-03-08  9:29   ` Markus Armbruster
  2022-03-08  6:03   ` Jason Wang
  15 siblings, 2 replies; 60+ messages in thread
From: Eugenio Pérez @ 2022-03-07 15:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, virtualization,
	Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

Finally offering the possibility to enable SVQ from the command line.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 qapi/net.json    |  8 +++++++-
 net/vhost-vdpa.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/qapi/net.json b/qapi/net.json
index 7fab2e7cd8..d626fa441c 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -445,12 +445,18 @@
 # @queues: number of queues to be created for multiqueue vhost-vdpa
 #          (default: 1)
 #
+# @svq: Start device with (experimental) shadow virtqueue. (Since 7.0)
+#
+# Features:
+# @unstable: Member @svq is experimental.
+#
 # Since: 5.1
 ##
 { 'struct': 'NetdevVhostVDPAOptions',
   'data': {
     '*vhostdev':     'str',
-    '*queues':       'int' } }
+    '*queues':       'int',
+    '*svq':          {'type': 'bool', 'features' : [ 'unstable'] } } }
 
 ##
 # @NetClientDriver:
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 1e9fe47c03..c827921654 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -127,7 +127,11 @@ err_init:
 static void vhost_vdpa_cleanup(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    struct vhost_dev *dev = s->vhost_vdpa.dev;
 
+    if (dev && dev->vq_index + dev->nvqs == dev->vq_index_end) {
+        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
+    }
     if (s->vhost_net) {
         vhost_net_cleanup(s->vhost_net);
         g_free(s->vhost_net);
@@ -187,13 +191,23 @@ static NetClientInfo net_vhost_vdpa_info = {
         .check_peer_type = vhost_vdpa_check_peer_type,
 };
 
+static int vhost_vdpa_get_iova_range(int fd,
+                                     struct vhost_vdpa_iova_range *iova_range)
+{
+    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
+
+    return ret < 0 ? -errno : 0;
+}
+
 static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
-                                           const char *device,
-                                           const char *name,
-                                           int vdpa_device_fd,
-                                           int queue_pair_index,
-                                           int nvqs,
-                                           bool is_datapath)
+                                       const char *device,
+                                       const char *name,
+                                       int vdpa_device_fd,
+                                       int queue_pair_index,
+                                       int nvqs,
+                                       bool is_datapath,
+                                       bool svq,
+                                       VhostIOVATree *iova_tree)
 {
     NetClientState *nc = NULL;
     VhostVDPAState *s;
@@ -211,6 +225,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
     s->vhost_vdpa.device_fd = vdpa_device_fd;
     s->vhost_vdpa.index = queue_pair_index;
+    s->vhost_vdpa.shadow_vqs_enabled = svq;
+    s->vhost_vdpa.iova_tree = iova_tree;
     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
     if (ret) {
         qemu_del_net_client(nc);
@@ -266,6 +282,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     g_autofree NetClientState **ncs = NULL;
     NetClientState *nc;
     int queue_pairs, i, has_cvq = 0;
+    g_autoptr(VhostIOVATree) iova_tree = NULL;
 
     assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
     opts = &netdev->u.vhost_vdpa;
@@ -285,29 +302,44 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
         qemu_close(vdpa_device_fd);
         return queue_pairs;
     }
+    if (opts->svq) {
+        struct vhost_vdpa_iova_range iova_range;
+
+        if (has_cvq) {
+            error_setg(errp, "vdpa svq does not work with cvq");
+            goto err_svq;
+        }
+        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
+        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
+    }
 
     ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
 
     for (i = 0; i < queue_pairs; i++) {
         ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                     vdpa_device_fd, i, 2, true);
+                                     vdpa_device_fd, i, 2, true, opts->svq,
+                                     iova_tree);
         if (!ncs[i])
             goto err;
     }
 
     if (has_cvq) {
         nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                 vdpa_device_fd, i, 1, false);
+                                 vdpa_device_fd, i, 1, false, opts->svq,
+                                 iova_tree);
         if (!nc)
             goto err;
     }
 
+    iova_tree = NULL;
     return 0;
 
 err:
     if (i) {
         qemu_del_net_client(ncs[0]);
     }
+
+err_svq:
     qemu_close(vdpa_device_fd);
 
     return -1;
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
@ 2022-03-08  6:03   ` Jason Wang
  2022-03-07 15:33 ` [PATCH v5 02/15] vhost: Add Shadow VirtQueue kick forwarding capabilities Eugenio Pérez
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Jason Wang @ 2022-03-08  6:03 UTC (permalink / raw)
  To: qemu-devel, Michael S. Tsirkin
  Cc: Michael S. Tsirkin, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Harpreet Singh Anand, Lingshan


在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> is intended as a new method of tracking the memory the devices touch
> during a migration process: Instead of relay on vhost device's dirty
> logging capability, SVQ intercepts the VQ dataplane forwarding the
> descriptors between VM and device. This way qemu is the effective
> writer of guests memory, like in qemu's virtio device operation.
>
> When SVQ is enabled qemu offers a new virtual address space to the
> device to read and write into, and it maps new vrings and the guest
> memory in it. SVQ also intercepts kicks and calls between the device
> and the guest. Used buffers relay would cause dirty memory being
> tracked.
>
> This effectively means that vDPA device passthrough is intercepted by
> qemu. While SVQ should only be enabled at migration time, the switching
> from regular mode to SVQ mode is left for a future series.
>
> It is based on the ideas of DPDK SW assisted LM, in the series of
> DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> not map the shadow vq in guest's VA, but in qemu's.
>
> For qemu to use shadow virtqueues the guest virtio driver must not use
> features like event_idx.
>
> SVQ needs to be enabled with cmdline:
>
> -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
>
> The first three patches enables notifications forwarding with
> assistance of qemu. It's easy to enable only this if the relevant
> cmdline part of the last patch is applied on top of these.
>
> Next four patches implement the actual buffer forwarding. However,
> address are not translated from HVA so they will need a host device with
> an iommu allowing them to access all of the HVA range.
>
> The last part of the series uses properly the host iommu, so qemu
> creates a new iova address space in the device's range and translates
> the buffers in it. Finally, it adds the cmdline parameter.
>
> Some simple performance tests with netperf were done. They used a nested
> guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> baseline average of ~9009.96Mbps:
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 131072  16384  16384    30.01    9061.03
> 131072  16384  16384    30.01    8962.94
> 131072  16384  16384    30.01    9005.92
>
> To enable SVQ buffers forwarding reduce throughput to about
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 131072  16384  16384    30.01    7689.72
> 131072  16384  16384    30.00    7752.07
> 131072  16384  16384    30.01    7750.30
>
> However, many performance improvements were left out of this series for
> simplicity, so difference should shrink in the future.
>
> Comments are welcome.


Hi Michael:

What do you think of this series? It looks good to me as a start. The 
feature could only be enabled as a dedicated parameter. If you're ok, 
I'd try to make it for 7.0.

Thanks


>
> TODO on future series:
> * Event, indirect, packed, and others features of virtio.
> * To support different set of features between the device<->SVQ and the
>    SVQ<->guest communication.
> * Support of device host notifier memory regions.
> * To sepparate buffers forwarding in its own AIO context, so we can
>    throw more threads to that task and we don't need to stop the main
>    event loop.
> * Support multiqueue virtio-net vdpa.
> * Proper documentation.
>
> Changes from v4:
> * Iterate iova->hva tree instead on maintain own tree so we support HVA
>    overlaps.
> * Fix: Errno completion at failure.
> * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
>
> Changes from v3:
> * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> v3 link:
> https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
>
> Changes from v2:
> * Less assertions and more error handling in iova tree code.
> * Better documentation, both fixing errors and making @param: format
> * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
>    prefix at both times.
> * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> * Split vhost_svq_{enable,disable}_notification, so the code looks more
>    like the kernel driver code.
> * Small improvements.
> v2 link:
> https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
>
> Changes from v1:
> * Feature set at device->SVQ is now the same as SVQ->guest.
> * Size of SVQ is not max available device size anymore, but guest's
>    negotiated.
> * Add VHOST_FILE_UNBIND kick and call fd treatment.
> * Make SVQ a public struct
> * Come back to previous approach to iova-tree
> * Some assertions are now fail paths. Some errors are now log_guest.
> * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> * Refactor some errors and messages. Add missing error unwindings.
> * Add memory barrier at _F_NO_NOTIFY set.
> * Stop checking for features flags out of transport range.
> v1 link:
> https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
>
> Changes from v4 RFC:
> * Support of allocating / freeing iova ranges in IOVA tree. Extending
>    already present iova-tree for that.
> * Proper validation of guest features. Now SVQ can negotiate a
>    different set of features with the device when enabled.
> * Support of host notifiers memory regions
> * Handling of SVQ full queue in case guest's descriptors span to
>    different memory regions (qemu's VA chunks).
> * Flush pending used buffers at end of SVQ operation.
> * QMP command now looks by NetClientState name. Other devices will need
>    to implement it's way to enable vdpa.
> * Rename QMP command to set, so it looks more like a way of working
> * Better use of qemu error system
> * Make a few assertions proper error-handling paths.
> * Add more documentation
> * Less coupling of virtio / vhost, that could cause friction on changes
> * Addressed many other small comments and small fixes.
>
> Changes from v3 RFC:
>    * Move everything to vhost-vdpa backend. A big change, this allowed
>      some cleanup but more code has been added in other places.
>    * More use of glib utilities, especially to manage memory.
> v3 link:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
>
> Changes from v2 RFC:
>    * Adding vhost-vdpa devices support
>    * Fixed some memory leaks pointed by different comments
> v2 link:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
>
> Changes from v1 RFC:
>    * Use QMP instead of migration to start SVQ mode.
>    * Only accepting IOMMU devices, closer behavior with target devices
>      (vDPA)
>    * Fix invalid masking/unmasking of vhost call fd.
>    * Use of proper methods for synchronization.
>    * No need to modify VirtIO device code, all of the changes are
>      contained in vhost code.
>    * Delete superfluous code.
>    * An intermediate RFC was sent with only the notifications forwarding
>      changes. It can be seen in
>      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> v1 link:
> https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
>
> Eugenio Pérez (20):
>        virtio: Add VIRTIO_F_QUEUE_STATE
>        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
>        virtio: Add virtio_queue_is_host_notifier_enabled
>        vhost: Make vhost_virtqueue_{start,stop} public
>        vhost: Add x-vhost-enable-shadow-vq qmp
>        vhost: Add VhostShadowVirtqueue
>        vdpa: Register vdpa devices in a list
>        vhost: Route guest->host notification through shadow virtqueue
>        Add vhost_svq_get_svq_call_notifier
>        Add vhost_svq_set_guest_call_notifier
>        vdpa: Save call_fd in vhost-vdpa
>        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
>        vhost: Route host->guest notification through shadow virtqueue
>        virtio: Add vhost_shadow_vq_get_vring_addr
>        vdpa: Save host and guest features
>        vhost: Add vhost_svq_valid_device_features to shadow vq
>        vhost: Shadow virtqueue buffers forwarding
>        vhost: Add VhostIOVATree
>        vhost: Use a tree to store memory mappings
>        vdpa: Add custom IOTLB translations to SVQ
>
> Eugenio Pérez (15):
>    vhost: Add VhostShadowVirtqueue
>    vhost: Add Shadow VirtQueue kick forwarding capabilities
>    vhost: Add Shadow VirtQueue call forwarding capabilities
>    vhost: Add vhost_svq_valid_features to shadow vq
>    virtio: Add vhost_svq_get_vring_addr
>    vdpa: adapt vhost_ops callbacks to svq
>    vhost: Shadow virtqueue buffers forwarding
>    util: Add iova_tree_alloc_map
>    util: add iova_tree_find_iova
>    vhost: Add VhostIOVATree
>    vdpa: Add custom IOTLB translations to SVQ
>    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
>    vdpa: Never set log_base addr if SVQ is enabled
>    vdpa: Expose VHOST_F_LOG_ALL on SVQ
>    vdpa: Add x-svq to NetdevVhostVDPAOptions
>
>   qapi/net.json                      |   8 +-
>   hw/virtio/vhost-iova-tree.h        |  27 ++
>   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
>   include/hw/virtio/vhost-vdpa.h     |   8 +
>   include/qemu/iova-tree.h           |  38 +-
>   hw/virtio/vhost-iova-tree.c        | 110 +++++
>   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
>   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
>   net/vhost-vdpa.c                   |  48 ++-
>   util/iova-tree.c                   | 169 ++++++++
>   hw/virtio/meson.build              |   2 +-
>   11 files changed, 1633 insertions(+), 26 deletions(-)
>   create mode 100644 hw/virtio/vhost-iova-tree.h
>   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
>   create mode 100644 hw/virtio/vhost-iova-tree.c
>   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
@ 2022-03-08  6:03   ` Jason Wang
  0 siblings, 0 replies; 60+ messages in thread
From: Jason Wang @ 2022-03-08  6:03 UTC (permalink / raw)
  To: qemu-devel, Michael S. Tsirkin
  Cc: Michael S. Tsirkin, Peter Xu, virtualization, Eli Cohen,
	Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Stefano Garzarella, Laurent Vivier, Parav Pandit,
	Richard Henderson, Gautam Dawar, Xiao W Wang, Stefan Hajnoczi,
	Juan Quintela, Harpreet Singh Anand, Lingshan


在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> is intended as a new method of tracking the memory the devices touch
> during a migration process: Instead of relay on vhost device's dirty
> logging capability, SVQ intercepts the VQ dataplane forwarding the
> descriptors between VM and device. This way qemu is the effective
> writer of guests memory, like in qemu's virtio device operation.
>
> When SVQ is enabled qemu offers a new virtual address space to the
> device to read and write into, and it maps new vrings and the guest
> memory in it. SVQ also intercepts kicks and calls between the device
> and the guest. Used buffers relay would cause dirty memory being
> tracked.
>
> This effectively means that vDPA device passthrough is intercepted by
> qemu. While SVQ should only be enabled at migration time, the switching
> from regular mode to SVQ mode is left for a future series.
>
> It is based on the ideas of DPDK SW assisted LM, in the series of
> DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> not map the shadow vq in guest's VA, but in qemu's.
>
> For qemu to use shadow virtqueues the guest virtio driver must not use
> features like event_idx.
>
> SVQ needs to be enabled with cmdline:
>
> -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
>
> The first three patches enables notifications forwarding with
> assistance of qemu. It's easy to enable only this if the relevant
> cmdline part of the last patch is applied on top of these.
>
> Next four patches implement the actual buffer forwarding. However,
> address are not translated from HVA so they will need a host device with
> an iommu allowing them to access all of the HVA range.
>
> The last part of the series uses properly the host iommu, so qemu
> creates a new iova address space in the device's range and translates
> the buffers in it. Finally, it adds the cmdline parameter.
>
> Some simple performance tests with netperf were done. They used a nested
> guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> baseline average of ~9009.96Mbps:
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 131072  16384  16384    30.01    9061.03
> 131072  16384  16384    30.01    8962.94
> 131072  16384  16384    30.01    9005.92
>
> To enable SVQ buffers forwarding reduce throughput to about
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 131072  16384  16384    30.01    7689.72
> 131072  16384  16384    30.00    7752.07
> 131072  16384  16384    30.01    7750.30
>
> However, many performance improvements were left out of this series for
> simplicity, so difference should shrink in the future.
>
> Comments are welcome.


Hi Michael:

What do you think of this series? It looks good to me as a start. The 
feature could only be enabled as a dedicated parameter. If you're ok, 
I'd try to make it for 7.0.

Thanks


>
> TODO on future series:
> * Event, indirect, packed, and others features of virtio.
> * To support different set of features between the device<->SVQ and the
>    SVQ<->guest communication.
> * Support of device host notifier memory regions.
> * To sepparate buffers forwarding in its own AIO context, so we can
>    throw more threads to that task and we don't need to stop the main
>    event loop.
> * Support multiqueue virtio-net vdpa.
> * Proper documentation.
>
> Changes from v4:
> * Iterate iova->hva tree instead on maintain own tree so we support HVA
>    overlaps.
> * Fix: Errno completion at failure.
> * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
>
> Changes from v3:
> * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> v3 link:
> https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
>
> Changes from v2:
> * Less assertions and more error handling in iova tree code.
> * Better documentation, both fixing errors and making @param: format
> * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
>    prefix at both times.
> * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> * Split vhost_svq_{enable,disable}_notification, so the code looks more
>    like the kernel driver code.
> * Small improvements.
> v2 link:
> https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
>
> Changes from v1:
> * Feature set at device->SVQ is now the same as SVQ->guest.
> * Size of SVQ is not max available device size anymore, but guest's
>    negotiated.
> * Add VHOST_FILE_UNBIND kick and call fd treatment.
> * Make SVQ a public struct
> * Come back to previous approach to iova-tree
> * Some assertions are now fail paths. Some errors are now log_guest.
> * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> * Refactor some errors and messages. Add missing error unwindings.
> * Add memory barrier at _F_NO_NOTIFY set.
> * Stop checking for features flags out of transport range.
> v1 link:
> https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
>
> Changes from v4 RFC:
> * Support of allocating / freeing iova ranges in IOVA tree. Extending
>    already present iova-tree for that.
> * Proper validation of guest features. Now SVQ can negotiate a
>    different set of features with the device when enabled.
> * Support of host notifiers memory regions
> * Handling of SVQ full queue in case guest's descriptors span to
>    different memory regions (qemu's VA chunks).
> * Flush pending used buffers at end of SVQ operation.
> * QMP command now looks by NetClientState name. Other devices will need
>    to implement it's way to enable vdpa.
> * Rename QMP command to set, so it looks more like a way of working
> * Better use of qemu error system
> * Make a few assertions proper error-handling paths.
> * Add more documentation
> * Less coupling of virtio / vhost, that could cause friction on changes
> * Addressed many other small comments and small fixes.
>
> Changes from v3 RFC:
>    * Move everything to vhost-vdpa backend. A big change, this allowed
>      some cleanup but more code has been added in other places.
>    * More use of glib utilities, especially to manage memory.
> v3 link:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
>
> Changes from v2 RFC:
>    * Adding vhost-vdpa devices support
>    * Fixed some memory leaks pointed by different comments
> v2 link:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
>
> Changes from v1 RFC:
>    * Use QMP instead of migration to start SVQ mode.
>    * Only accepting IOMMU devices, closer behavior with target devices
>      (vDPA)
>    * Fix invalid masking/unmasking of vhost call fd.
>    * Use of proper methods for synchronization.
>    * No need to modify VirtIO device code, all of the changes are
>      contained in vhost code.
>    * Delete superfluous code.
>    * An intermediate RFC was sent with only the notifications forwarding
>      changes. It can be seen in
>      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> v1 link:
> https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
>
> Eugenio Pérez (20):
>        virtio: Add VIRTIO_F_QUEUE_STATE
>        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
>        virtio: Add virtio_queue_is_host_notifier_enabled
>        vhost: Make vhost_virtqueue_{start,stop} public
>        vhost: Add x-vhost-enable-shadow-vq qmp
>        vhost: Add VhostShadowVirtqueue
>        vdpa: Register vdpa devices in a list
>        vhost: Route guest->host notification through shadow virtqueue
>        Add vhost_svq_get_svq_call_notifier
>        Add vhost_svq_set_guest_call_notifier
>        vdpa: Save call_fd in vhost-vdpa
>        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
>        vhost: Route host->guest notification through shadow virtqueue
>        virtio: Add vhost_shadow_vq_get_vring_addr
>        vdpa: Save host and guest features
>        vhost: Add vhost_svq_valid_device_features to shadow vq
>        vhost: Shadow virtqueue buffers forwarding
>        vhost: Add VhostIOVATree
>        vhost: Use a tree to store memory mappings
>        vdpa: Add custom IOTLB translations to SVQ
>
> Eugenio Pérez (15):
>    vhost: Add VhostShadowVirtqueue
>    vhost: Add Shadow VirtQueue kick forwarding capabilities
>    vhost: Add Shadow VirtQueue call forwarding capabilities
>    vhost: Add vhost_svq_valid_features to shadow vq
>    virtio: Add vhost_svq_get_vring_addr
>    vdpa: adapt vhost_ops callbacks to svq
>    vhost: Shadow virtqueue buffers forwarding
>    util: Add iova_tree_alloc_map
>    util: add iova_tree_find_iova
>    vhost: Add VhostIOVATree
>    vdpa: Add custom IOTLB translations to SVQ
>    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
>    vdpa: Never set log_base addr if SVQ is enabled
>    vdpa: Expose VHOST_F_LOG_ALL on SVQ
>    vdpa: Add x-svq to NetdevVhostVDPAOptions
>
>   qapi/net.json                      |   8 +-
>   hw/virtio/vhost-iova-tree.h        |  27 ++
>   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
>   include/hw/virtio/vhost-vdpa.h     |   8 +
>   include/qemu/iova-tree.h           |  38 +-
>   hw/virtio/vhost-iova-tree.c        | 110 +++++
>   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
>   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
>   net/vhost-vdpa.c                   |  48 ++-
>   util/iova-tree.c                   | 169 ++++++++
>   hw/virtio/meson.build              |   2 +-
>   11 files changed, 1633 insertions(+), 26 deletions(-)
>   create mode 100644 hw/virtio/vhost-iova-tree.h
>   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
>   create mode 100644 hw/virtio/vhost-iova-tree.c
>   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
>



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-08  6:03   ` Jason Wang
@ 2022-03-08  7:11     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08  7:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> 
> 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > is intended as a new method of tracking the memory the devices touch
> > during a migration process: Instead of relay on vhost device's dirty
> > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > descriptors between VM and device. This way qemu is the effective
> > writer of guests memory, like in qemu's virtio device operation.
> > 
> > When SVQ is enabled qemu offers a new virtual address space to the
> > device to read and write into, and it maps new vrings and the guest
> > memory in it. SVQ also intercepts kicks and calls between the device
> > and the guest. Used buffers relay would cause dirty memory being
> > tracked.
> > 
> > This effectively means that vDPA device passthrough is intercepted by
> > qemu. While SVQ should only be enabled at migration time, the switching
> > from regular mode to SVQ mode is left for a future series.
> > 
> > It is based on the ideas of DPDK SW assisted LM, in the series of
> > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > not map the shadow vq in guest's VA, but in qemu's.
> > 
> > For qemu to use shadow virtqueues the guest virtio driver must not use
> > features like event_idx.
> > 
> > SVQ needs to be enabled with cmdline:
> > 
> > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on

A stable API for an incomplete feature is a problem imho.


> > 
> > The first three patches enables notifications forwarding with
> > assistance of qemu. It's easy to enable only this if the relevant
> > cmdline part of the last patch is applied on top of these.
> > 
> > Next four patches implement the actual buffer forwarding. However,
> > address are not translated from HVA so they will need a host device with
> > an iommu allowing them to access all of the HVA range.
> > 
> > The last part of the series uses properly the host iommu, so qemu
> > creates a new iova address space in the device's range and translates
> > the buffers in it. Finally, it adds the cmdline parameter.
> > 
> > Some simple performance tests with netperf were done. They used a nested
> > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > baseline average of ~9009.96Mbps:
> > Recv   Send    Send
> > Socket Socket  Message  Elapsed
> > Size   Size    Size     Time     Throughput
> > bytes  bytes   bytes    secs.    10^6bits/sec
> > 131072  16384  16384    30.01    9061.03
> > 131072  16384  16384    30.01    8962.94
> > 131072  16384  16384    30.01    9005.92
> > 
> > To enable SVQ buffers forwarding reduce throughput to about
> > Recv   Send    Send
> > Socket Socket  Message  Elapsed
> > Size   Size    Size     Time     Throughput
> > bytes  bytes   bytes    secs.    10^6bits/sec
> > 131072  16384  16384    30.01    7689.72
> > 131072  16384  16384    30.00    7752.07
> > 131072  16384  16384    30.01    7750.30
> > 
> > However, many performance improvements were left out of this series for
> > simplicity, so difference should shrink in the future.
> > 
> > Comments are welcome.
> 
> 
> Hi Michael:
> 
> What do you think of this series? It looks good to me as a start. The
> feature could only be enabled as a dedicated parameter. If you're ok, I'd
> try to make it for 7.0.
> 
> Thanks

Well that's cutting it awfully close, and it's not really useful
at the current stage, is it?

The IOVA trick does not feel complete either.

> 
> > 
> > TODO on future series:
> > * Event, indirect, packed, and others features of virtio.
> > * To support different set of features between the device<->SVQ and the
> >    SVQ<->guest communication.
> > * Support of device host notifier memory regions.
> > * To sepparate buffers forwarding in its own AIO context, so we can
> >    throw more threads to that task and we don't need to stop the main
> >    event loop.
> > * Support multiqueue virtio-net vdpa.
> > * Proper documentation.
> > 
> > Changes from v4:
> > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> >    overlaps.
> > * Fix: Errno completion at failure.
> > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > 
> > Changes from v3:
> > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > v3 link:
> > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > 
> > Changes from v2:
> > * Less assertions and more error handling in iova tree code.
> > * Better documentation, both fixing errors and making @param: format
> > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> >    prefix at both times.
> > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> >    like the kernel driver code.
> > * Small improvements.
> > v2 link:
> > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > 
> > Changes from v1:
> > * Feature set at device->SVQ is now the same as SVQ->guest.
> > * Size of SVQ is not max available device size anymore, but guest's
> >    negotiated.
> > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > * Make SVQ a public struct
> > * Come back to previous approach to iova-tree
> > * Some assertions are now fail paths. Some errors are now log_guest.
> > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > * Refactor some errors and messages. Add missing error unwindings.
> > * Add memory barrier at _F_NO_NOTIFY set.
> > * Stop checking for features flags out of transport range.
> > v1 link:
> > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > 
> > Changes from v4 RFC:
> > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> >    already present iova-tree for that.
> > * Proper validation of guest features. Now SVQ can negotiate a
> >    different set of features with the device when enabled.
> > * Support of host notifiers memory regions
> > * Handling of SVQ full queue in case guest's descriptors span to
> >    different memory regions (qemu's VA chunks).
> > * Flush pending used buffers at end of SVQ operation.
> > * QMP command now looks by NetClientState name. Other devices will need
> >    to implement it's way to enable vdpa.
> > * Rename QMP command to set, so it looks more like a way of working
> > * Better use of qemu error system
> > * Make a few assertions proper error-handling paths.
> > * Add more documentation
> > * Less coupling of virtio / vhost, that could cause friction on changes
> > * Addressed many other small comments and small fixes.
> > 
> > Changes from v3 RFC:
> >    * Move everything to vhost-vdpa backend. A big change, this allowed
> >      some cleanup but more code has been added in other places.
> >    * More use of glib utilities, especially to manage memory.
> > v3 link:
> > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > 
> > Changes from v2 RFC:
> >    * Adding vhost-vdpa devices support
> >    * Fixed some memory leaks pointed by different comments
> > v2 link:
> > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > 
> > Changes from v1 RFC:
> >    * Use QMP instead of migration to start SVQ mode.
> >    * Only accepting IOMMU devices, closer behavior with target devices
> >      (vDPA)
> >    * Fix invalid masking/unmasking of vhost call fd.
> >    * Use of proper methods for synchronization.
> >    * No need to modify VirtIO device code, all of the changes are
> >      contained in vhost code.
> >    * Delete superfluous code.
> >    * An intermediate RFC was sent with only the notifications forwarding
> >      changes. It can be seen in
> >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > v1 link:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > 
> > Eugenio Pérez (20):
> >        virtio: Add VIRTIO_F_QUEUE_STATE
> >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> >        virtio: Add virtio_queue_is_host_notifier_enabled
> >        vhost: Make vhost_virtqueue_{start,stop} public
> >        vhost: Add x-vhost-enable-shadow-vq qmp
> >        vhost: Add VhostShadowVirtqueue
> >        vdpa: Register vdpa devices in a list
> >        vhost: Route guest->host notification through shadow virtqueue
> >        Add vhost_svq_get_svq_call_notifier
> >        Add vhost_svq_set_guest_call_notifier
> >        vdpa: Save call_fd in vhost-vdpa
> >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> >        vhost: Route host->guest notification through shadow virtqueue
> >        virtio: Add vhost_shadow_vq_get_vring_addr
> >        vdpa: Save host and guest features
> >        vhost: Add vhost_svq_valid_device_features to shadow vq
> >        vhost: Shadow virtqueue buffers forwarding
> >        vhost: Add VhostIOVATree
> >        vhost: Use a tree to store memory mappings
> >        vdpa: Add custom IOTLB translations to SVQ
> > 
> > Eugenio Pérez (15):
> >    vhost: Add VhostShadowVirtqueue
> >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> >    vhost: Add Shadow VirtQueue call forwarding capabilities
> >    vhost: Add vhost_svq_valid_features to shadow vq
> >    virtio: Add vhost_svq_get_vring_addr
> >    vdpa: adapt vhost_ops callbacks to svq
> >    vhost: Shadow virtqueue buffers forwarding
> >    util: Add iova_tree_alloc_map
> >    util: add iova_tree_find_iova
> >    vhost: Add VhostIOVATree
> >    vdpa: Add custom IOTLB translations to SVQ
> >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> >    vdpa: Never set log_base addr if SVQ is enabled
> >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > 
> >   qapi/net.json                      |   8 +-
> >   hw/virtio/vhost-iova-tree.h        |  27 ++
> >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> >   include/hw/virtio/vhost-vdpa.h     |   8 +
> >   include/qemu/iova-tree.h           |  38 +-
> >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> >   net/vhost-vdpa.c                   |  48 ++-
> >   util/iova-tree.c                   | 169 ++++++++
> >   hw/virtio/meson.build              |   2 +-
> >   11 files changed, 1633 insertions(+), 26 deletions(-)
> >   create mode 100644 hw/virtio/vhost-iova-tree.h
> >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> >   create mode 100644 hw/virtio/vhost-iova-tree.c
> >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
@ 2022-03-08  7:11     ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08  7:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Peter Xu, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Stefano Garzarella, Laurent Vivier, Parav Pandit,
	Richard Henderson, Gautam Dawar, Xiao W Wang, Stefan Hajnoczi,
	Juan Quintela, Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> 
> 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > is intended as a new method of tracking the memory the devices touch
> > during a migration process: Instead of relay on vhost device's dirty
> > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > descriptors between VM and device. This way qemu is the effective
> > writer of guests memory, like in qemu's virtio device operation.
> > 
> > When SVQ is enabled qemu offers a new virtual address space to the
> > device to read and write into, and it maps new vrings and the guest
> > memory in it. SVQ also intercepts kicks and calls between the device
> > and the guest. Used buffers relay would cause dirty memory being
> > tracked.
> > 
> > This effectively means that vDPA device passthrough is intercepted by
> > qemu. While SVQ should only be enabled at migration time, the switching
> > from regular mode to SVQ mode is left for a future series.
> > 
> > It is based on the ideas of DPDK SW assisted LM, in the series of
> > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > not map the shadow vq in guest's VA, but in qemu's.
> > 
> > For qemu to use shadow virtqueues the guest virtio driver must not use
> > features like event_idx.
> > 
> > SVQ needs to be enabled with cmdline:
> > 
> > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on

A stable API for an incomplete feature is a problem imho.


> > 
> > The first three patches enables notifications forwarding with
> > assistance of qemu. It's easy to enable only this if the relevant
> > cmdline part of the last patch is applied on top of these.
> > 
> > Next four patches implement the actual buffer forwarding. However,
> > address are not translated from HVA so they will need a host device with
> > an iommu allowing them to access all of the HVA range.
> > 
> > The last part of the series uses properly the host iommu, so qemu
> > creates a new iova address space in the device's range and translates
> > the buffers in it. Finally, it adds the cmdline parameter.
> > 
> > Some simple performance tests with netperf were done. They used a nested
> > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > baseline average of ~9009.96Mbps:
> > Recv   Send    Send
> > Socket Socket  Message  Elapsed
> > Size   Size    Size     Time     Throughput
> > bytes  bytes   bytes    secs.    10^6bits/sec
> > 131072  16384  16384    30.01    9061.03
> > 131072  16384  16384    30.01    8962.94
> > 131072  16384  16384    30.01    9005.92
> > 
> > To enable SVQ buffers forwarding reduce throughput to about
> > Recv   Send    Send
> > Socket Socket  Message  Elapsed
> > Size   Size    Size     Time     Throughput
> > bytes  bytes   bytes    secs.    10^6bits/sec
> > 131072  16384  16384    30.01    7689.72
> > 131072  16384  16384    30.00    7752.07
> > 131072  16384  16384    30.01    7750.30
> > 
> > However, many performance improvements were left out of this series for
> > simplicity, so difference should shrink in the future.
> > 
> > Comments are welcome.
> 
> 
> Hi Michael:
> 
> What do you think of this series? It looks good to me as a start. The
> feature could only be enabled as a dedicated parameter. If you're ok, I'd
> try to make it for 7.0.
> 
> Thanks

Well that's cutting it awfully close, and it's not really useful
at the current stage, is it?

The IOVA trick does not feel complete either.

> 
> > 
> > TODO on future series:
> > * Event, indirect, packed, and others features of virtio.
> > * To support different set of features between the device<->SVQ and the
> >    SVQ<->guest communication.
> > * Support of device host notifier memory regions.
> > * To sepparate buffers forwarding in its own AIO context, so we can
> >    throw more threads to that task and we don't need to stop the main
> >    event loop.
> > * Support multiqueue virtio-net vdpa.
> > * Proper documentation.
> > 
> > Changes from v4:
> > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> >    overlaps.
> > * Fix: Errno completion at failure.
> > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > 
> > Changes from v3:
> > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > v3 link:
> > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > 
> > Changes from v2:
> > * Less assertions and more error handling in iova tree code.
> > * Better documentation, both fixing errors and making @param: format
> > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> >    prefix at both times.
> > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> >    like the kernel driver code.
> > * Small improvements.
> > v2 link:
> > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > 
> > Changes from v1:
> > * Feature set at device->SVQ is now the same as SVQ->guest.
> > * Size of SVQ is not max available device size anymore, but guest's
> >    negotiated.
> > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > * Make SVQ a public struct
> > * Come back to previous approach to iova-tree
> > * Some assertions are now fail paths. Some errors are now log_guest.
> > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > * Refactor some errors and messages. Add missing error unwindings.
> > * Add memory barrier at _F_NO_NOTIFY set.
> > * Stop checking for features flags out of transport range.
> > v1 link:
> > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > 
> > Changes from v4 RFC:
> > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> >    already present iova-tree for that.
> > * Proper validation of guest features. Now SVQ can negotiate a
> >    different set of features with the device when enabled.
> > * Support of host notifiers memory regions
> > * Handling of SVQ full queue in case guest's descriptors span to
> >    different memory regions (qemu's VA chunks).
> > * Flush pending used buffers at end of SVQ operation.
> > * QMP command now looks by NetClientState name. Other devices will need
> >    to implement it's way to enable vdpa.
> > * Rename QMP command to set, so it looks more like a way of working
> > * Better use of qemu error system
> > * Make a few assertions proper error-handling paths.
> > * Add more documentation
> > * Less coupling of virtio / vhost, that could cause friction on changes
> > * Addressed many other small comments and small fixes.
> > 
> > Changes from v3 RFC:
> >    * Move everything to vhost-vdpa backend. A big change, this allowed
> >      some cleanup but more code has been added in other places.
> >    * More use of glib utilities, especially to manage memory.
> > v3 link:
> > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > 
> > Changes from v2 RFC:
> >    * Adding vhost-vdpa devices support
> >    * Fixed some memory leaks pointed by different comments
> > v2 link:
> > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > 
> > Changes from v1 RFC:
> >    * Use QMP instead of migration to start SVQ mode.
> >    * Only accepting IOMMU devices, closer behavior with target devices
> >      (vDPA)
> >    * Fix invalid masking/unmasking of vhost call fd.
> >    * Use of proper methods for synchronization.
> >    * No need to modify VirtIO device code, all of the changes are
> >      contained in vhost code.
> >    * Delete superfluous code.
> >    * An intermediate RFC was sent with only the notifications forwarding
> >      changes. It can be seen in
> >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > v1 link:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > 
> > Eugenio Pérez (20):
> >        virtio: Add VIRTIO_F_QUEUE_STATE
> >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> >        virtio: Add virtio_queue_is_host_notifier_enabled
> >        vhost: Make vhost_virtqueue_{start,stop} public
> >        vhost: Add x-vhost-enable-shadow-vq qmp
> >        vhost: Add VhostShadowVirtqueue
> >        vdpa: Register vdpa devices in a list
> >        vhost: Route guest->host notification through shadow virtqueue
> >        Add vhost_svq_get_svq_call_notifier
> >        Add vhost_svq_set_guest_call_notifier
> >        vdpa: Save call_fd in vhost-vdpa
> >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> >        vhost: Route host->guest notification through shadow virtqueue
> >        virtio: Add vhost_shadow_vq_get_vring_addr
> >        vdpa: Save host and guest features
> >        vhost: Add vhost_svq_valid_device_features to shadow vq
> >        vhost: Shadow virtqueue buffers forwarding
> >        vhost: Add VhostIOVATree
> >        vhost: Use a tree to store memory mappings
> >        vdpa: Add custom IOTLB translations to SVQ
> > 
> > Eugenio Pérez (15):
> >    vhost: Add VhostShadowVirtqueue
> >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> >    vhost: Add Shadow VirtQueue call forwarding capabilities
> >    vhost: Add vhost_svq_valid_features to shadow vq
> >    virtio: Add vhost_svq_get_vring_addr
> >    vdpa: adapt vhost_ops callbacks to svq
> >    vhost: Shadow virtqueue buffers forwarding
> >    util: Add iova_tree_alloc_map
> >    util: add iova_tree_find_iova
> >    vhost: Add VhostIOVATree
> >    vdpa: Add custom IOTLB translations to SVQ
> >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> >    vdpa: Never set log_base addr if SVQ is enabled
> >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > 
> >   qapi/net.json                      |   8 +-
> >   hw/virtio/vhost-iova-tree.h        |  27 ++
> >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> >   include/hw/virtio/vhost-vdpa.h     |   8 +
> >   include/qemu/iova-tree.h           |  38 +-
> >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> >   net/vhost-vdpa.c                   |  48 ++-
> >   util/iova-tree.c                   | 169 ++++++++
> >   hw/virtio/meson.build              |   2 +-
> >   11 files changed, 1633 insertions(+), 26 deletions(-)
> >   create mode 100644 hw/virtio/vhost-iova-tree.h
> >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> >   create mode 100644 hw/virtio/vhost-iova-tree.c
> >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > 



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 15/15] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-03-07 15:33 ` [PATCH v5 15/15] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
@ 2022-03-08  7:11     ` Michael S. Tsirkin
  2022-03-08  9:29   ` Markus Armbruster
  1 sibling, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08  7:11 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: qemu-devel, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Laurent Vivier,
	Parav Pandit, Richard Henderson, Gautam Dawar, Xiao W Wang,
	Stefan Hajnoczi, Harpreet Singh Anand, Lingshan

On Mon, Mar 07, 2022 at 04:33:34PM +0100, Eugenio Pérez wrote:
> Finally offering the possibility to enable SVQ from the command line.
> 
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  qapi/net.json    |  8 +++++++-
>  net/vhost-vdpa.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
>  2 files changed, 47 insertions(+), 9 deletions(-)
> 
> diff --git a/qapi/net.json b/qapi/net.json
> index 7fab2e7cd8..d626fa441c 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -445,12 +445,18 @@
>  # @queues: number of queues to be created for multiqueue vhost-vdpa
>  #          (default: 1)
>  #
> +# @svq: Start device with (experimental) shadow virtqueue. (Since 7.0)
> +#
> +# Features:
> +# @unstable: Member @svq is experimental.
> +#
>  # Since: 5.1
>  ##
>  { 'struct': 'NetdevVhostVDPAOptions',
>    'data': {
>      '*vhostdev':     'str',
> -    '*queues':       'int' } }
> +    '*queues':       'int',
> +    '*svq':          {'type': 'bool', 'features' : [ 'unstable'] } } }
>  
>  ##
>  # @NetClientDriver:

I think this should be x-svq same as other unstable features.

> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 1e9fe47c03..c827921654 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -127,7 +127,11 @@ err_init:
>  static void vhost_vdpa_cleanup(NetClientState *nc)
>  {
>      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    struct vhost_dev *dev = s->vhost_vdpa.dev;
>  
> +    if (dev && dev->vq_index + dev->nvqs == dev->vq_index_end) {
> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> +    }
>      if (s->vhost_net) {
>          vhost_net_cleanup(s->vhost_net);
>          g_free(s->vhost_net);
> @@ -187,13 +191,23 @@ static NetClientInfo net_vhost_vdpa_info = {
>          .check_peer_type = vhost_vdpa_check_peer_type,
>  };
>  
> +static int vhost_vdpa_get_iova_range(int fd,
> +                                     struct vhost_vdpa_iova_range *iova_range)
> +{
> +    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
> +
> +    return ret < 0 ? -errno : 0;
> +}
> +
>  static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> -                                           const char *device,
> -                                           const char *name,
> -                                           int vdpa_device_fd,
> -                                           int queue_pair_index,
> -                                           int nvqs,
> -                                           bool is_datapath)
> +                                       const char *device,
> +                                       const char *name,
> +                                       int vdpa_device_fd,
> +                                       int queue_pair_index,
> +                                       int nvqs,
> +                                       bool is_datapath,
> +                                       bool svq,
> +                                       VhostIOVATree *iova_tree)
>  {
>      NetClientState *nc = NULL;
>      VhostVDPAState *s;
> @@ -211,6 +225,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>  
>      s->vhost_vdpa.device_fd = vdpa_device_fd;
>      s->vhost_vdpa.index = queue_pair_index;
> +    s->vhost_vdpa.shadow_vqs_enabled = svq;
> +    s->vhost_vdpa.iova_tree = iova_tree;
>      ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
>      if (ret) {
>          qemu_del_net_client(nc);
> @@ -266,6 +282,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>      g_autofree NetClientState **ncs = NULL;
>      NetClientState *nc;
>      int queue_pairs, i, has_cvq = 0;
> +    g_autoptr(VhostIOVATree) iova_tree = NULL;
>  
>      assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>      opts = &netdev->u.vhost_vdpa;
> @@ -285,29 +302,44 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>          qemu_close(vdpa_device_fd);
>          return queue_pairs;
>      }
> +    if (opts->svq) {
> +        struct vhost_vdpa_iova_range iova_range;
> +
> +        if (has_cvq) {
> +            error_setg(errp, "vdpa svq does not work with cvq");
> +            goto err_svq;
> +        }
> +        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> +        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> +    }
>  
>      ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
>  
>      for (i = 0; i < queue_pairs; i++) {
>          ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> -                                     vdpa_device_fd, i, 2, true);
> +                                     vdpa_device_fd, i, 2, true, opts->svq,
> +                                     iova_tree);
>          if (!ncs[i])
>              goto err;
>      }
>  
>      if (has_cvq) {
>          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> -                                 vdpa_device_fd, i, 1, false);
> +                                 vdpa_device_fd, i, 1, false, opts->svq,
> +                                 iova_tree);
>          if (!nc)
>              goto err;
>      }
>  
> +    iova_tree = NULL;
>      return 0;
>  
>  err:
>      if (i) {
>          qemu_del_net_client(ncs[0]);
>      }
> +
> +err_svq:
>      qemu_close(vdpa_device_fd);
>  
>      return -1;
> -- 
> 2.27.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 15/15] vdpa: Add x-svq to NetdevVhostVDPAOptions
@ 2022-03-08  7:11     ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08  7:11 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Jason Wang, qemu-devel, Peter Xu, virtualization, Eli Cohen,
	Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

On Mon, Mar 07, 2022 at 04:33:34PM +0100, Eugenio Pérez wrote:
> Finally offering the possibility to enable SVQ from the command line.
> 
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  qapi/net.json    |  8 +++++++-
>  net/vhost-vdpa.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
>  2 files changed, 47 insertions(+), 9 deletions(-)
> 
> diff --git a/qapi/net.json b/qapi/net.json
> index 7fab2e7cd8..d626fa441c 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -445,12 +445,18 @@
>  # @queues: number of queues to be created for multiqueue vhost-vdpa
>  #          (default: 1)
>  #
> +# @svq: Start device with (experimental) shadow virtqueue. (Since 7.0)
> +#
> +# Features:
> +# @unstable: Member @svq is experimental.
> +#
>  # Since: 5.1
>  ##
>  { 'struct': 'NetdevVhostVDPAOptions',
>    'data': {
>      '*vhostdev':     'str',
> -    '*queues':       'int' } }
> +    '*queues':       'int',
> +    '*svq':          {'type': 'bool', 'features' : [ 'unstable'] } } }
>  
>  ##
>  # @NetClientDriver:

I think this should be x-svq same as other unstable features.

> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 1e9fe47c03..c827921654 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -127,7 +127,11 @@ err_init:
>  static void vhost_vdpa_cleanup(NetClientState *nc)
>  {
>      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    struct vhost_dev *dev = s->vhost_vdpa.dev;
>  
> +    if (dev && dev->vq_index + dev->nvqs == dev->vq_index_end) {
> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> +    }
>      if (s->vhost_net) {
>          vhost_net_cleanup(s->vhost_net);
>          g_free(s->vhost_net);
> @@ -187,13 +191,23 @@ static NetClientInfo net_vhost_vdpa_info = {
>          .check_peer_type = vhost_vdpa_check_peer_type,
>  };
>  
> +static int vhost_vdpa_get_iova_range(int fd,
> +                                     struct vhost_vdpa_iova_range *iova_range)
> +{
> +    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
> +
> +    return ret < 0 ? -errno : 0;
> +}
> +
>  static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> -                                           const char *device,
> -                                           const char *name,
> -                                           int vdpa_device_fd,
> -                                           int queue_pair_index,
> -                                           int nvqs,
> -                                           bool is_datapath)
> +                                       const char *device,
> +                                       const char *name,
> +                                       int vdpa_device_fd,
> +                                       int queue_pair_index,
> +                                       int nvqs,
> +                                       bool is_datapath,
> +                                       bool svq,
> +                                       VhostIOVATree *iova_tree)
>  {
>      NetClientState *nc = NULL;
>      VhostVDPAState *s;
> @@ -211,6 +225,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>  
>      s->vhost_vdpa.device_fd = vdpa_device_fd;
>      s->vhost_vdpa.index = queue_pair_index;
> +    s->vhost_vdpa.shadow_vqs_enabled = svq;
> +    s->vhost_vdpa.iova_tree = iova_tree;
>      ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
>      if (ret) {
>          qemu_del_net_client(nc);
> @@ -266,6 +282,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>      g_autofree NetClientState **ncs = NULL;
>      NetClientState *nc;
>      int queue_pairs, i, has_cvq = 0;
> +    g_autoptr(VhostIOVATree) iova_tree = NULL;
>  
>      assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>      opts = &netdev->u.vhost_vdpa;
> @@ -285,29 +302,44 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>          qemu_close(vdpa_device_fd);
>          return queue_pairs;
>      }
> +    if (opts->svq) {
> +        struct vhost_vdpa_iova_range iova_range;
> +
> +        if (has_cvq) {
> +            error_setg(errp, "vdpa svq does not work with cvq");
> +            goto err_svq;
> +        }
> +        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> +        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> +    }
>  
>      ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
>  
>      for (i = 0; i < queue_pairs; i++) {
>          ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> -                                     vdpa_device_fd, i, 2, true);
> +                                     vdpa_device_fd, i, 2, true, opts->svq,
> +                                     iova_tree);
>          if (!ncs[i])
>              goto err;
>      }
>  
>      if (has_cvq) {
>          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> -                                 vdpa_device_fd, i, 1, false);
> +                                 vdpa_device_fd, i, 1, false, opts->svq,
> +                                 iova_tree);
>          if (!nc)
>              goto err;
>      }
>  
> +    iova_tree = NULL;
>      return 0;
>  
>  err:
>      if (i) {
>          qemu_del_net_client(ncs[0]);
>      }
> +
> +err_svq:
>      qemu_close(vdpa_device_fd);
>  
>      return -1;
> -- 
> 2.27.0



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-08  7:11     ` Michael S. Tsirkin
@ 2022-03-08  7:14       ` Jason Wang
  -1 siblings, 0 replies; 60+ messages in thread
From: Jason Wang @ 2022-03-08  7:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Harpreet Singh Anand, Lingshan

On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> >
> > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > is intended as a new method of tracking the memory the devices touch
> > > during a migration process: Instead of relay on vhost device's dirty
> > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > descriptors between VM and device. This way qemu is the effective
> > > writer of guests memory, like in qemu's virtio device operation.
> > >
> > > When SVQ is enabled qemu offers a new virtual address space to the
> > > device to read and write into, and it maps new vrings and the guest
> > > memory in it. SVQ also intercepts kicks and calls between the device
> > > and the guest. Used buffers relay would cause dirty memory being
> > > tracked.
> > >
> > > This effectively means that vDPA device passthrough is intercepted by
> > > qemu. While SVQ should only be enabled at migration time, the switching
> > > from regular mode to SVQ mode is left for a future series.
> > >
> > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > not map the shadow vq in guest's VA, but in qemu's.
> > >
> > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > features like event_idx.
> > >
> > > SVQ needs to be enabled with cmdline:
> > >
> > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
>
> A stable API for an incomplete feature is a problem imho.

It should be "x-svq".

>
>
> > >
> > > The first three patches enables notifications forwarding with
> > > assistance of qemu. It's easy to enable only this if the relevant
> > > cmdline part of the last patch is applied on top of these.
> > >
> > > Next four patches implement the actual buffer forwarding. However,
> > > address are not translated from HVA so they will need a host device with
> > > an iommu allowing them to access all of the HVA range.
> > >
> > > The last part of the series uses properly the host iommu, so qemu
> > > creates a new iova address space in the device's range and translates
> > > the buffers in it. Finally, it adds the cmdline parameter.
> > >
> > > Some simple performance tests with netperf were done. They used a nested
> > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > baseline average of ~9009.96Mbps:
> > > Recv   Send    Send
> > > Socket Socket  Message  Elapsed
> > > Size   Size    Size     Time     Throughput
> > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > 131072  16384  16384    30.01    9061.03
> > > 131072  16384  16384    30.01    8962.94
> > > 131072  16384  16384    30.01    9005.92
> > >
> > > To enable SVQ buffers forwarding reduce throughput to about
> > > Recv   Send    Send
> > > Socket Socket  Message  Elapsed
> > > Size   Size    Size     Time     Throughput
> > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > 131072  16384  16384    30.01    7689.72
> > > 131072  16384  16384    30.00    7752.07
> > > 131072  16384  16384    30.01    7750.30
> > >
> > > However, many performance improvements were left out of this series for
> > > simplicity, so difference should shrink in the future.
> > >
> > > Comments are welcome.
> >
> >
> > Hi Michael:
> >
> > What do you think of this series? It looks good to me as a start. The
> > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > try to make it for 7.0.
> >
> > Thanks
>
> Well that's cutting it awfully close, and it's not really useful
> at the current stage, is it?

This allows vDPA to be migrated when using "x-svq=on". But anyhow it's
experimental.

>
> The IOVA trick does not feel complete either.

I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
for shadow vq) did. So we won't suffer from the issues of DPDK.

Thanks

>
> >
> > >
> > > TODO on future series:
> > > * Event, indirect, packed, and others features of virtio.
> > > * To support different set of features between the device<->SVQ and the
> > >    SVQ<->guest communication.
> > > * Support of device host notifier memory regions.
> > > * To sepparate buffers forwarding in its own AIO context, so we can
> > >    throw more threads to that task and we don't need to stop the main
> > >    event loop.
> > > * Support multiqueue virtio-net vdpa.
> > > * Proper documentation.
> > >
> > > Changes from v4:
> > > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> > >    overlaps.
> > > * Fix: Errno completion at failure.
> > > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > >
> > > Changes from v3:
> > > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > > v3 link:
> > > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > >
> > > Changes from v2:
> > > * Less assertions and more error handling in iova tree code.
> > > * Better documentation, both fixing errors and making @param: format
> > > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> > >    prefix at both times.
> > > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> > >    like the kernel driver code.
> > > * Small improvements.
> > > v2 link:
> > > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > >
> > > Changes from v1:
> > > * Feature set at device->SVQ is now the same as SVQ->guest.
> > > * Size of SVQ is not max available device size anymore, but guest's
> > >    negotiated.
> > > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > > * Make SVQ a public struct
> > > * Come back to previous approach to iova-tree
> > > * Some assertions are now fail paths. Some errors are now log_guest.
> > > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > > * Refactor some errors and messages. Add missing error unwindings.
> > > * Add memory barrier at _F_NO_NOTIFY set.
> > > * Stop checking for features flags out of transport range.
> > > v1 link:
> > > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > >
> > > Changes from v4 RFC:
> > > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> > >    already present iova-tree for that.
> > > * Proper validation of guest features. Now SVQ can negotiate a
> > >    different set of features with the device when enabled.
> > > * Support of host notifiers memory regions
> > > * Handling of SVQ full queue in case guest's descriptors span to
> > >    different memory regions (qemu's VA chunks).
> > > * Flush pending used buffers at end of SVQ operation.
> > > * QMP command now looks by NetClientState name. Other devices will need
> > >    to implement it's way to enable vdpa.
> > > * Rename QMP command to set, so it looks more like a way of working
> > > * Better use of qemu error system
> > > * Make a few assertions proper error-handling paths.
> > > * Add more documentation
> > > * Less coupling of virtio / vhost, that could cause friction on changes
> > > * Addressed many other small comments and small fixes.
> > >
> > > Changes from v3 RFC:
> > >    * Move everything to vhost-vdpa backend. A big change, this allowed
> > >      some cleanup but more code has been added in other places.
> > >    * More use of glib utilities, especially to manage memory.
> > > v3 link:
> > > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > >
> > > Changes from v2 RFC:
> > >    * Adding vhost-vdpa devices support
> > >    * Fixed some memory leaks pointed by different comments
> > > v2 link:
> > > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > >
> > > Changes from v1 RFC:
> > >    * Use QMP instead of migration to start SVQ mode.
> > >    * Only accepting IOMMU devices, closer behavior with target devices
> > >      (vDPA)
> > >    * Fix invalid masking/unmasking of vhost call fd.
> > >    * Use of proper methods for synchronization.
> > >    * No need to modify VirtIO device code, all of the changes are
> > >      contained in vhost code.
> > >    * Delete superfluous code.
> > >    * An intermediate RFC was sent with only the notifications forwarding
> > >      changes. It can be seen in
> > >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > > v1 link:
> > > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > >
> > > Eugenio Pérez (20):
> > >        virtio: Add VIRTIO_F_QUEUE_STATE
> > >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> > >        virtio: Add virtio_queue_is_host_notifier_enabled
> > >        vhost: Make vhost_virtqueue_{start,stop} public
> > >        vhost: Add x-vhost-enable-shadow-vq qmp
> > >        vhost: Add VhostShadowVirtqueue
> > >        vdpa: Register vdpa devices in a list
> > >        vhost: Route guest->host notification through shadow virtqueue
> > >        Add vhost_svq_get_svq_call_notifier
> > >        Add vhost_svq_set_guest_call_notifier
> > >        vdpa: Save call_fd in vhost-vdpa
> > >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> > >        vhost: Route host->guest notification through shadow virtqueue
> > >        virtio: Add vhost_shadow_vq_get_vring_addr
> > >        vdpa: Save host and guest features
> > >        vhost: Add vhost_svq_valid_device_features to shadow vq
> > >        vhost: Shadow virtqueue buffers forwarding
> > >        vhost: Add VhostIOVATree
> > >        vhost: Use a tree to store memory mappings
> > >        vdpa: Add custom IOTLB translations to SVQ
> > >
> > > Eugenio Pérez (15):
> > >    vhost: Add VhostShadowVirtqueue
> > >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> > >    vhost: Add Shadow VirtQueue call forwarding capabilities
> > >    vhost: Add vhost_svq_valid_features to shadow vq
> > >    virtio: Add vhost_svq_get_vring_addr
> > >    vdpa: adapt vhost_ops callbacks to svq
> > >    vhost: Shadow virtqueue buffers forwarding
> > >    util: Add iova_tree_alloc_map
> > >    util: add iova_tree_find_iova
> > >    vhost: Add VhostIOVATree
> > >    vdpa: Add custom IOTLB translations to SVQ
> > >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> > >    vdpa: Never set log_base addr if SVQ is enabled
> > >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > >
> > >   qapi/net.json                      |   8 +-
> > >   hw/virtio/vhost-iova-tree.h        |  27 ++
> > >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> > >   include/hw/virtio/vhost-vdpa.h     |   8 +
> > >   include/qemu/iova-tree.h           |  38 +-
> > >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> > >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> > >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> > >   net/vhost-vdpa.c                   |  48 ++-
> > >   util/iova-tree.c                   | 169 ++++++++
> > >   hw/virtio/meson.build              |   2 +-
> > >   11 files changed, 1633 insertions(+), 26 deletions(-)
> > >   create mode 100644 hw/virtio/vhost-iova-tree.h
> > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> > >   create mode 100644 hw/virtio/vhost-iova-tree.c
> > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > >
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
@ 2022-03-08  7:14       ` Jason Wang
  0 siblings, 0 replies; 60+ messages in thread
From: Jason Wang @ 2022-03-08  7:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Peter Xu, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Stefano Garzarella, Laurent Vivier, Parav Pandit,
	Richard Henderson, Gautam Dawar, Xiao W Wang, Stefan Hajnoczi,
	Juan Quintela, Harpreet Singh Anand, Lingshan

On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> >
> > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > is intended as a new method of tracking the memory the devices touch
> > > during a migration process: Instead of relay on vhost device's dirty
> > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > descriptors between VM and device. This way qemu is the effective
> > > writer of guests memory, like in qemu's virtio device operation.
> > >
> > > When SVQ is enabled qemu offers a new virtual address space to the
> > > device to read and write into, and it maps new vrings and the guest
> > > memory in it. SVQ also intercepts kicks and calls between the device
> > > and the guest. Used buffers relay would cause dirty memory being
> > > tracked.
> > >
> > > This effectively means that vDPA device passthrough is intercepted by
> > > qemu. While SVQ should only be enabled at migration time, the switching
> > > from regular mode to SVQ mode is left for a future series.
> > >
> > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > not map the shadow vq in guest's VA, but in qemu's.
> > >
> > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > features like event_idx.
> > >
> > > SVQ needs to be enabled with cmdline:
> > >
> > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
>
> A stable API for an incomplete feature is a problem imho.

It should be "x-svq".

>
>
> > >
> > > The first three patches enables notifications forwarding with
> > > assistance of qemu. It's easy to enable only this if the relevant
> > > cmdline part of the last patch is applied on top of these.
> > >
> > > Next four patches implement the actual buffer forwarding. However,
> > > address are not translated from HVA so they will need a host device with
> > > an iommu allowing them to access all of the HVA range.
> > >
> > > The last part of the series uses properly the host iommu, so qemu
> > > creates a new iova address space in the device's range and translates
> > > the buffers in it. Finally, it adds the cmdline parameter.
> > >
> > > Some simple performance tests with netperf were done. They used a nested
> > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > baseline average of ~9009.96Mbps:
> > > Recv   Send    Send
> > > Socket Socket  Message  Elapsed
> > > Size   Size    Size     Time     Throughput
> > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > 131072  16384  16384    30.01    9061.03
> > > 131072  16384  16384    30.01    8962.94
> > > 131072  16384  16384    30.01    9005.92
> > >
> > > To enable SVQ buffers forwarding reduce throughput to about
> > > Recv   Send    Send
> > > Socket Socket  Message  Elapsed
> > > Size   Size    Size     Time     Throughput
> > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > 131072  16384  16384    30.01    7689.72
> > > 131072  16384  16384    30.00    7752.07
> > > 131072  16384  16384    30.01    7750.30
> > >
> > > However, many performance improvements were left out of this series for
> > > simplicity, so difference should shrink in the future.
> > >
> > > Comments are welcome.
> >
> >
> > Hi Michael:
> >
> > What do you think of this series? It looks good to me as a start. The
> > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > try to make it for 7.0.
> >
> > Thanks
>
> Well that's cutting it awfully close, and it's not really useful
> at the current stage, is it?

This allows vDPA to be migrated when using "x-svq=on". But anyhow it's
experimental.

>
> The IOVA trick does not feel complete either.

I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
for shadow vq) did. So we won't suffer from the issues of DPDK.

Thanks

>
> >
> > >
> > > TODO on future series:
> > > * Event, indirect, packed, and others features of virtio.
> > > * To support different set of features between the device<->SVQ and the
> > >    SVQ<->guest communication.
> > > * Support of device host notifier memory regions.
> > > * To sepparate buffers forwarding in its own AIO context, so we can
> > >    throw more threads to that task and we don't need to stop the main
> > >    event loop.
> > > * Support multiqueue virtio-net vdpa.
> > > * Proper documentation.
> > >
> > > Changes from v4:
> > > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> > >    overlaps.
> > > * Fix: Errno completion at failure.
> > > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > >
> > > Changes from v3:
> > > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > > v3 link:
> > > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > >
> > > Changes from v2:
> > > * Less assertions and more error handling in iova tree code.
> > > * Better documentation, both fixing errors and making @param: format
> > > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> > >    prefix at both times.
> > > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> > >    like the kernel driver code.
> > > * Small improvements.
> > > v2 link:
> > > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > >
> > > Changes from v1:
> > > * Feature set at device->SVQ is now the same as SVQ->guest.
> > > * Size of SVQ is not max available device size anymore, but guest's
> > >    negotiated.
> > > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > > * Make SVQ a public struct
> > > * Come back to previous approach to iova-tree
> > > * Some assertions are now fail paths. Some errors are now log_guest.
> > > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > > * Refactor some errors and messages. Add missing error unwindings.
> > > * Add memory barrier at _F_NO_NOTIFY set.
> > > * Stop checking for features flags out of transport range.
> > > v1 link:
> > > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > >
> > > Changes from v4 RFC:
> > > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> > >    already present iova-tree for that.
> > > * Proper validation of guest features. Now SVQ can negotiate a
> > >    different set of features with the device when enabled.
> > > * Support of host notifiers memory regions
> > > * Handling of SVQ full queue in case guest's descriptors span to
> > >    different memory regions (qemu's VA chunks).
> > > * Flush pending used buffers at end of SVQ operation.
> > > * QMP command now looks by NetClientState name. Other devices will need
> > >    to implement it's way to enable vdpa.
> > > * Rename QMP command to set, so it looks more like a way of working
> > > * Better use of qemu error system
> > > * Make a few assertions proper error-handling paths.
> > > * Add more documentation
> > > * Less coupling of virtio / vhost, that could cause friction on changes
> > > * Addressed many other small comments and small fixes.
> > >
> > > Changes from v3 RFC:
> > >    * Move everything to vhost-vdpa backend. A big change, this allowed
> > >      some cleanup but more code has been added in other places.
> > >    * More use of glib utilities, especially to manage memory.
> > > v3 link:
> > > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > >
> > > Changes from v2 RFC:
> > >    * Adding vhost-vdpa devices support
> > >    * Fixed some memory leaks pointed by different comments
> > > v2 link:
> > > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > >
> > > Changes from v1 RFC:
> > >    * Use QMP instead of migration to start SVQ mode.
> > >    * Only accepting IOMMU devices, closer behavior with target devices
> > >      (vDPA)
> > >    * Fix invalid masking/unmasking of vhost call fd.
> > >    * Use of proper methods for synchronization.
> > >    * No need to modify VirtIO device code, all of the changes are
> > >      contained in vhost code.
> > >    * Delete superfluous code.
> > >    * An intermediate RFC was sent with only the notifications forwarding
> > >      changes. It can be seen in
> > >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > > v1 link:
> > > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > >
> > > Eugenio Pérez (20):
> > >        virtio: Add VIRTIO_F_QUEUE_STATE
> > >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> > >        virtio: Add virtio_queue_is_host_notifier_enabled
> > >        vhost: Make vhost_virtqueue_{start,stop} public
> > >        vhost: Add x-vhost-enable-shadow-vq qmp
> > >        vhost: Add VhostShadowVirtqueue
> > >        vdpa: Register vdpa devices in a list
> > >        vhost: Route guest->host notification through shadow virtqueue
> > >        Add vhost_svq_get_svq_call_notifier
> > >        Add vhost_svq_set_guest_call_notifier
> > >        vdpa: Save call_fd in vhost-vdpa
> > >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> > >        vhost: Route host->guest notification through shadow virtqueue
> > >        virtio: Add vhost_shadow_vq_get_vring_addr
> > >        vdpa: Save host and guest features
> > >        vhost: Add vhost_svq_valid_device_features to shadow vq
> > >        vhost: Shadow virtqueue buffers forwarding
> > >        vhost: Add VhostIOVATree
> > >        vhost: Use a tree to store memory mappings
> > >        vdpa: Add custom IOTLB translations to SVQ
> > >
> > > Eugenio Pérez (15):
> > >    vhost: Add VhostShadowVirtqueue
> > >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> > >    vhost: Add Shadow VirtQueue call forwarding capabilities
> > >    vhost: Add vhost_svq_valid_features to shadow vq
> > >    virtio: Add vhost_svq_get_vring_addr
> > >    vdpa: adapt vhost_ops callbacks to svq
> > >    vhost: Shadow virtqueue buffers forwarding
> > >    util: Add iova_tree_alloc_map
> > >    util: add iova_tree_find_iova
> > >    vhost: Add VhostIOVATree
> > >    vdpa: Add custom IOTLB translations to SVQ
> > >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> > >    vdpa: Never set log_base addr if SVQ is enabled
> > >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > >
> > >   qapi/net.json                      |   8 +-
> > >   hw/virtio/vhost-iova-tree.h        |  27 ++
> > >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> > >   include/hw/virtio/vhost-vdpa.h     |   8 +
> > >   include/qemu/iova-tree.h           |  38 +-
> > >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> > >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> > >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> > >   net/vhost-vdpa.c                   |  48 ++-
> > >   util/iova-tree.c                   | 169 ++++++++
> > >   hw/virtio/meson.build              |   2 +-
> > >   11 files changed, 1633 insertions(+), 26 deletions(-)
> > >   create mode 100644 hw/virtio/vhost-iova-tree.h
> > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> > >   create mode 100644 hw/virtio/vhost-iova-tree.c
> > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > >
>



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-08  7:14       ` Jason Wang
@ 2022-03-08  7:27         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08  7:27 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > >
> > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > is intended as a new method of tracking the memory the devices touch
> > > > during a migration process: Instead of relay on vhost device's dirty
> > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > descriptors between VM and device. This way qemu is the effective
> > > > writer of guests memory, like in qemu's virtio device operation.
> > > >
> > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > device to read and write into, and it maps new vrings and the guest
> > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > and the guest. Used buffers relay would cause dirty memory being
> > > > tracked.
> > > >
> > > > This effectively means that vDPA device passthrough is intercepted by
> > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > from regular mode to SVQ mode is left for a future series.
> > > >
> > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > not map the shadow vq in guest's VA, but in qemu's.
> > > >
> > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > features like event_idx.
> > > >
> > > > SVQ needs to be enabled with cmdline:
> > > >
> > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> >
> > A stable API for an incomplete feature is a problem imho.
> 
> It should be "x-svq".


Well look at patch 15.

> >
> >
> > > >
> > > > The first three patches enables notifications forwarding with
> > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > cmdline part of the last patch is applied on top of these.
> > > >
> > > > Next four patches implement the actual buffer forwarding. However,
> > > > address are not translated from HVA so they will need a host device with
> > > > an iommu allowing them to access all of the HVA range.
> > > >
> > > > The last part of the series uses properly the host iommu, so qemu
> > > > creates a new iova address space in the device's range and translates
> > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > >
> > > > Some simple performance tests with netperf were done. They used a nested
> > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > baseline average of ~9009.96Mbps:
> > > > Recv   Send    Send
> > > > Socket Socket  Message  Elapsed
> > > > Size   Size    Size     Time     Throughput
> > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > 131072  16384  16384    30.01    9061.03
> > > > 131072  16384  16384    30.01    8962.94
> > > > 131072  16384  16384    30.01    9005.92
> > > >
> > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > Recv   Send    Send
> > > > Socket Socket  Message  Elapsed
> > > > Size   Size    Size     Time     Throughput
> > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > 131072  16384  16384    30.01    7689.72
> > > > 131072  16384  16384    30.00    7752.07
> > > > 131072  16384  16384    30.01    7750.30
> > > >
> > > > However, many performance improvements were left out of this series for
> > > > simplicity, so difference should shrink in the future.
> > > >
> > > > Comments are welcome.
> > >
> > >
> > > Hi Michael:
> > >
> > > What do you think of this series? It looks good to me as a start. The
> > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > try to make it for 7.0.
> > >
> > > Thanks
> >
> > Well that's cutting it awfully close, and it's not really useful
> > at the current stage, is it?
> 
> This allows vDPA to be migrated when using "x-svq=on".
> But anyhow it's
> experimental.

it's less experimental than incomplete. It seems pretty clearly not
the way it will work down the road, we don't want svq involved
at all times.

> >
> > The IOVA trick does not feel complete either.
> 
> I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> for shadow vq) did. So we won't suffer from the issues of DPDK.
> 
> Thanks

Maybe I misundrstand how this all works.
I refer to all the iova_tree_alloc_map things.

> >
> > >
> > > >
> > > > TODO on future series:
> > > > * Event, indirect, packed, and others features of virtio.
> > > > * To support different set of features between the device<->SVQ and the
> > > >    SVQ<->guest communication.
> > > > * Support of device host notifier memory regions.
> > > > * To sepparate buffers forwarding in its own AIO context, so we can
> > > >    throw more threads to that task and we don't need to stop the main
> > > >    event loop.
> > > > * Support multiqueue virtio-net vdpa.
> > > > * Proper documentation.
> > > >
> > > > Changes from v4:
> > > > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> > > >    overlaps.
> > > > * Fix: Errno completion at failure.
> > > > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > > >
> > > > Changes from v3:
> > > > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > > > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > > > v3 link:
> > > > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > > >
> > > > Changes from v2:
> > > > * Less assertions and more error handling in iova tree code.
> > > > * Better documentation, both fixing errors and making @param: format
> > > > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> > > >    prefix at both times.
> > > > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > > > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> > > >    like the kernel driver code.
> > > > * Small improvements.
> > > > v2 link:
> > > > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > > >
> > > > Changes from v1:
> > > > * Feature set at device->SVQ is now the same as SVQ->guest.
> > > > * Size of SVQ is not max available device size anymore, but guest's
> > > >    negotiated.
> > > > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > > > * Make SVQ a public struct
> > > > * Come back to previous approach to iova-tree
> > > > * Some assertions are now fail paths. Some errors are now log_guest.
> > > > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > > > * Refactor some errors and messages. Add missing error unwindings.
> > > > * Add memory barrier at _F_NO_NOTIFY set.
> > > > * Stop checking for features flags out of transport range.
> > > > v1 link:
> > > > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > > >
> > > > Changes from v4 RFC:
> > > > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> > > >    already present iova-tree for that.
> > > > * Proper validation of guest features. Now SVQ can negotiate a
> > > >    different set of features with the device when enabled.
> > > > * Support of host notifiers memory regions
> > > > * Handling of SVQ full queue in case guest's descriptors span to
> > > >    different memory regions (qemu's VA chunks).
> > > > * Flush pending used buffers at end of SVQ operation.
> > > > * QMP command now looks by NetClientState name. Other devices will need
> > > >    to implement it's way to enable vdpa.
> > > > * Rename QMP command to set, so it looks more like a way of working
> > > > * Better use of qemu error system
> > > > * Make a few assertions proper error-handling paths.
> > > > * Add more documentation
> > > > * Less coupling of virtio / vhost, that could cause friction on changes
> > > > * Addressed many other small comments and small fixes.
> > > >
> > > > Changes from v3 RFC:
> > > >    * Move everything to vhost-vdpa backend. A big change, this allowed
> > > >      some cleanup but more code has been added in other places.
> > > >    * More use of glib utilities, especially to manage memory.
> > > > v3 link:
> > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > > >
> > > > Changes from v2 RFC:
> > > >    * Adding vhost-vdpa devices support
> > > >    * Fixed some memory leaks pointed by different comments
> > > > v2 link:
> > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > > >
> > > > Changes from v1 RFC:
> > > >    * Use QMP instead of migration to start SVQ mode.
> > > >    * Only accepting IOMMU devices, closer behavior with target devices
> > > >      (vDPA)
> > > >    * Fix invalid masking/unmasking of vhost call fd.
> > > >    * Use of proper methods for synchronization.
> > > >    * No need to modify VirtIO device code, all of the changes are
> > > >      contained in vhost code.
> > > >    * Delete superfluous code.
> > > >    * An intermediate RFC was sent with only the notifications forwarding
> > > >      changes. It can be seen in
> > > >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > > > v1 link:
> > > > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > > >
> > > > Eugenio Pérez (20):
> > > >        virtio: Add VIRTIO_F_QUEUE_STATE
> > > >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> > > >        virtio: Add virtio_queue_is_host_notifier_enabled
> > > >        vhost: Make vhost_virtqueue_{start,stop} public
> > > >        vhost: Add x-vhost-enable-shadow-vq qmp
> > > >        vhost: Add VhostShadowVirtqueue
> > > >        vdpa: Register vdpa devices in a list
> > > >        vhost: Route guest->host notification through shadow virtqueue
> > > >        Add vhost_svq_get_svq_call_notifier
> > > >        Add vhost_svq_set_guest_call_notifier
> > > >        vdpa: Save call_fd in vhost-vdpa
> > > >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> > > >        vhost: Route host->guest notification through shadow virtqueue
> > > >        virtio: Add vhost_shadow_vq_get_vring_addr
> > > >        vdpa: Save host and guest features
> > > >        vhost: Add vhost_svq_valid_device_features to shadow vq
> > > >        vhost: Shadow virtqueue buffers forwarding
> > > >        vhost: Add VhostIOVATree
> > > >        vhost: Use a tree to store memory mappings
> > > >        vdpa: Add custom IOTLB translations to SVQ
> > > >
> > > > Eugenio Pérez (15):
> > > >    vhost: Add VhostShadowVirtqueue
> > > >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> > > >    vhost: Add Shadow VirtQueue call forwarding capabilities
> > > >    vhost: Add vhost_svq_valid_features to shadow vq
> > > >    virtio: Add vhost_svq_get_vring_addr
> > > >    vdpa: adapt vhost_ops callbacks to svq
> > > >    vhost: Shadow virtqueue buffers forwarding
> > > >    util: Add iova_tree_alloc_map
> > > >    util: add iova_tree_find_iova
> > > >    vhost: Add VhostIOVATree
> > > >    vdpa: Add custom IOTLB translations to SVQ
> > > >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> > > >    vdpa: Never set log_base addr if SVQ is enabled
> > > >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> > > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > >
> > > >   qapi/net.json                      |   8 +-
> > > >   hw/virtio/vhost-iova-tree.h        |  27 ++
> > > >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> > > >   include/hw/virtio/vhost-vdpa.h     |   8 +
> > > >   include/qemu/iova-tree.h           |  38 +-
> > > >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> > > >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> > > >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> > > >   net/vhost-vdpa.c                   |  48 ++-
> > > >   util/iova-tree.c                   | 169 ++++++++
> > > >   hw/virtio/meson.build              |   2 +-
> > > >   11 files changed, 1633 insertions(+), 26 deletions(-)
> > > >   create mode 100644 hw/virtio/vhost-iova-tree.h
> > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> > > >   create mode 100644 hw/virtio/vhost-iova-tree.c
> > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > > >
> >

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
@ 2022-03-08  7:27         ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08  7:27 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Peter Xu, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Stefano Garzarella, Laurent Vivier, Parav Pandit,
	Richard Henderson, Gautam Dawar, Xiao W Wang, Stefan Hajnoczi,
	Juan Quintela, Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > >
> > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > is intended as a new method of tracking the memory the devices touch
> > > > during a migration process: Instead of relay on vhost device's dirty
> > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > descriptors between VM and device. This way qemu is the effective
> > > > writer of guests memory, like in qemu's virtio device operation.
> > > >
> > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > device to read and write into, and it maps new vrings and the guest
> > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > and the guest. Used buffers relay would cause dirty memory being
> > > > tracked.
> > > >
> > > > This effectively means that vDPA device passthrough is intercepted by
> > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > from regular mode to SVQ mode is left for a future series.
> > > >
> > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > not map the shadow vq in guest's VA, but in qemu's.
> > > >
> > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > features like event_idx.
> > > >
> > > > SVQ needs to be enabled with cmdline:
> > > >
> > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> >
> > A stable API for an incomplete feature is a problem imho.
> 
> It should be "x-svq".


Well look at patch 15.

> >
> >
> > > >
> > > > The first three patches enables notifications forwarding with
> > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > cmdline part of the last patch is applied on top of these.
> > > >
> > > > Next four patches implement the actual buffer forwarding. However,
> > > > address are not translated from HVA so they will need a host device with
> > > > an iommu allowing them to access all of the HVA range.
> > > >
> > > > The last part of the series uses properly the host iommu, so qemu
> > > > creates a new iova address space in the device's range and translates
> > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > >
> > > > Some simple performance tests with netperf were done. They used a nested
> > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > baseline average of ~9009.96Mbps:
> > > > Recv   Send    Send
> > > > Socket Socket  Message  Elapsed
> > > > Size   Size    Size     Time     Throughput
> > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > 131072  16384  16384    30.01    9061.03
> > > > 131072  16384  16384    30.01    8962.94
> > > > 131072  16384  16384    30.01    9005.92
> > > >
> > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > Recv   Send    Send
> > > > Socket Socket  Message  Elapsed
> > > > Size   Size    Size     Time     Throughput
> > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > 131072  16384  16384    30.01    7689.72
> > > > 131072  16384  16384    30.00    7752.07
> > > > 131072  16384  16384    30.01    7750.30
> > > >
> > > > However, many performance improvements were left out of this series for
> > > > simplicity, so difference should shrink in the future.
> > > >
> > > > Comments are welcome.
> > >
> > >
> > > Hi Michael:
> > >
> > > What do you think of this series? It looks good to me as a start. The
> > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > try to make it for 7.0.
> > >
> > > Thanks
> >
> > Well that's cutting it awfully close, and it's not really useful
> > at the current stage, is it?
> 
> This allows vDPA to be migrated when using "x-svq=on".
> But anyhow it's
> experimental.

it's less experimental than incomplete. It seems pretty clearly not
the way it will work down the road, we don't want svq involved
at all times.

> >
> > The IOVA trick does not feel complete either.
> 
> I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> for shadow vq) did. So we won't suffer from the issues of DPDK.
> 
> Thanks

Maybe I misundrstand how this all works.
I refer to all the iova_tree_alloc_map things.

> >
> > >
> > > >
> > > > TODO on future series:
> > > > * Event, indirect, packed, and others features of virtio.
> > > > * To support different set of features between the device<->SVQ and the
> > > >    SVQ<->guest communication.
> > > > * Support of device host notifier memory regions.
> > > > * To sepparate buffers forwarding in its own AIO context, so we can
> > > >    throw more threads to that task and we don't need to stop the main
> > > >    event loop.
> > > > * Support multiqueue virtio-net vdpa.
> > > > * Proper documentation.
> > > >
> > > > Changes from v4:
> > > > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> > > >    overlaps.
> > > > * Fix: Errno completion at failure.
> > > > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > > >
> > > > Changes from v3:
> > > > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > > > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > > > v3 link:
> > > > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > > >
> > > > Changes from v2:
> > > > * Less assertions and more error handling in iova tree code.
> > > > * Better documentation, both fixing errors and making @param: format
> > > > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> > > >    prefix at both times.
> > > > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > > > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> > > >    like the kernel driver code.
> > > > * Small improvements.
> > > > v2 link:
> > > > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > > >
> > > > Changes from v1:
> > > > * Feature set at device->SVQ is now the same as SVQ->guest.
> > > > * Size of SVQ is not max available device size anymore, but guest's
> > > >    negotiated.
> > > > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > > > * Make SVQ a public struct
> > > > * Come back to previous approach to iova-tree
> > > > * Some assertions are now fail paths. Some errors are now log_guest.
> > > > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > > > * Refactor some errors and messages. Add missing error unwindings.
> > > > * Add memory barrier at _F_NO_NOTIFY set.
> > > > * Stop checking for features flags out of transport range.
> > > > v1 link:
> > > > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > > >
> > > > Changes from v4 RFC:
> > > > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> > > >    already present iova-tree for that.
> > > > * Proper validation of guest features. Now SVQ can negotiate a
> > > >    different set of features with the device when enabled.
> > > > * Support of host notifiers memory regions
> > > > * Handling of SVQ full queue in case guest's descriptors span to
> > > >    different memory regions (qemu's VA chunks).
> > > > * Flush pending used buffers at end of SVQ operation.
> > > > * QMP command now looks by NetClientState name. Other devices will need
> > > >    to implement it's way to enable vdpa.
> > > > * Rename QMP command to set, so it looks more like a way of working
> > > > * Better use of qemu error system
> > > > * Make a few assertions proper error-handling paths.
> > > > * Add more documentation
> > > > * Less coupling of virtio / vhost, that could cause friction on changes
> > > > * Addressed many other small comments and small fixes.
> > > >
> > > > Changes from v3 RFC:
> > > >    * Move everything to vhost-vdpa backend. A big change, this allowed
> > > >      some cleanup but more code has been added in other places.
> > > >    * More use of glib utilities, especially to manage memory.
> > > > v3 link:
> > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > > >
> > > > Changes from v2 RFC:
> > > >    * Adding vhost-vdpa devices support
> > > >    * Fixed some memory leaks pointed by different comments
> > > > v2 link:
> > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > > >
> > > > Changes from v1 RFC:
> > > >    * Use QMP instead of migration to start SVQ mode.
> > > >    * Only accepting IOMMU devices, closer behavior with target devices
> > > >      (vDPA)
> > > >    * Fix invalid masking/unmasking of vhost call fd.
> > > >    * Use of proper methods for synchronization.
> > > >    * No need to modify VirtIO device code, all of the changes are
> > > >      contained in vhost code.
> > > >    * Delete superfluous code.
> > > >    * An intermediate RFC was sent with only the notifications forwarding
> > > >      changes. It can be seen in
> > > >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > > > v1 link:
> > > > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > > >
> > > > Eugenio Pérez (20):
> > > >        virtio: Add VIRTIO_F_QUEUE_STATE
> > > >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> > > >        virtio: Add virtio_queue_is_host_notifier_enabled
> > > >        vhost: Make vhost_virtqueue_{start,stop} public
> > > >        vhost: Add x-vhost-enable-shadow-vq qmp
> > > >        vhost: Add VhostShadowVirtqueue
> > > >        vdpa: Register vdpa devices in a list
> > > >        vhost: Route guest->host notification through shadow virtqueue
> > > >        Add vhost_svq_get_svq_call_notifier
> > > >        Add vhost_svq_set_guest_call_notifier
> > > >        vdpa: Save call_fd in vhost-vdpa
> > > >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> > > >        vhost: Route host->guest notification through shadow virtqueue
> > > >        virtio: Add vhost_shadow_vq_get_vring_addr
> > > >        vdpa: Save host and guest features
> > > >        vhost: Add vhost_svq_valid_device_features to shadow vq
> > > >        vhost: Shadow virtqueue buffers forwarding
> > > >        vhost: Add VhostIOVATree
> > > >        vhost: Use a tree to store memory mappings
> > > >        vdpa: Add custom IOTLB translations to SVQ
> > > >
> > > > Eugenio Pérez (15):
> > > >    vhost: Add VhostShadowVirtqueue
> > > >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> > > >    vhost: Add Shadow VirtQueue call forwarding capabilities
> > > >    vhost: Add vhost_svq_valid_features to shadow vq
> > > >    virtio: Add vhost_svq_get_vring_addr
> > > >    vdpa: adapt vhost_ops callbacks to svq
> > > >    vhost: Shadow virtqueue buffers forwarding
> > > >    util: Add iova_tree_alloc_map
> > > >    util: add iova_tree_find_iova
> > > >    vhost: Add VhostIOVATree
> > > >    vdpa: Add custom IOTLB translations to SVQ
> > > >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> > > >    vdpa: Never set log_base addr if SVQ is enabled
> > > >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> > > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > >
> > > >   qapi/net.json                      |   8 +-
> > > >   hw/virtio/vhost-iova-tree.h        |  27 ++
> > > >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> > > >   include/hw/virtio/vhost-vdpa.h     |   8 +
> > > >   include/qemu/iova-tree.h           |  38 +-
> > > >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> > > >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> > > >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> > > >   net/vhost-vdpa.c                   |  48 ++-
> > > >   util/iova-tree.c                   | 169 ++++++++
> > > >   hw/virtio/meson.build              |   2 +-
> > > >   11 files changed, 1633 insertions(+), 26 deletions(-)
> > > >   create mode 100644 hw/virtio/vhost-iova-tree.h
> > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> > > >   create mode 100644 hw/virtio/vhost-iova-tree.c
> > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > > >
> >



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 15/15] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-03-08  7:11     ` Michael S. Tsirkin
  (?)
@ 2022-03-08  7:32     ` Eugenio Perez Martin
  2022-03-08  7:33       ` Eugenio Perez Martin
  2022-03-08  8:02         ` Michael S. Tsirkin
  -1 siblings, 2 replies; 60+ messages in thread
From: Eugenio Perez Martin @ 2022-03-08  7:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, qemu-level, Peter Xu, virtualization, Eli Cohen,
	Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

On Tue, Mar 8, 2022 at 8:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Mar 07, 2022 at 04:33:34PM +0100, Eugenio Pérez wrote:
> > Finally offering the possibility to enable SVQ from the command line.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  qapi/net.json    |  8 +++++++-
> >  net/vhost-vdpa.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
> >  2 files changed, 47 insertions(+), 9 deletions(-)
> >
> > diff --git a/qapi/net.json b/qapi/net.json
> > index 7fab2e7cd8..d626fa441c 100644
> > --- a/qapi/net.json
> > +++ b/qapi/net.json
> > @@ -445,12 +445,18 @@
> >  # @queues: number of queues to be created for multiqueue vhost-vdpa
> >  #          (default: 1)
> >  #
> > +# @svq: Start device with (experimental) shadow virtqueue. (Since 7.0)
> > +#
> > +# Features:
> > +# @unstable: Member @svq is experimental.
> > +#
> >  # Since: 5.1
> >  ##
> >  { 'struct': 'NetdevVhostVDPAOptions',
> >    'data': {
> >      '*vhostdev':     'str',
> > -    '*queues':       'int' } }
> > +    '*queues':       'int',
> > +    '*svq':          {'type': 'bool', 'features' : [ 'unstable'] } } }
> >
> >  ##
> >  # @NetClientDriver:
>
> I think this should be x-svq same as other unstable features.
>

I'm fine with both, but I was pointed to the other direction at [1] and [2].

Thanks!

[1] https://patchwork.kernel.org/project/qemu-devel/patch/20220302203012.3476835-15-eperezma@redhat.com/
[2] https://lore.kernel.org/qemu-devel/20220303185147.3605350-15-eperezma@redhat.com/

> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index 1e9fe47c03..c827921654 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -127,7 +127,11 @@ err_init:
> >  static void vhost_vdpa_cleanup(NetClientState *nc)
> >  {
> >      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +    struct vhost_dev *dev = s->vhost_vdpa.dev;
> >
> > +    if (dev && dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > +    }
> >      if (s->vhost_net) {
> >          vhost_net_cleanup(s->vhost_net);
> >          g_free(s->vhost_net);
> > @@ -187,13 +191,23 @@ static NetClientInfo net_vhost_vdpa_info = {
> >          .check_peer_type = vhost_vdpa_check_peer_type,
> >  };
> >
> > +static int vhost_vdpa_get_iova_range(int fd,
> > +                                     struct vhost_vdpa_iova_range *iova_range)
> > +{
> > +    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
> > +
> > +    return ret < 0 ? -errno : 0;
> > +}
> > +
> >  static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > -                                           const char *device,
> > -                                           const char *name,
> > -                                           int vdpa_device_fd,
> > -                                           int queue_pair_index,
> > -                                           int nvqs,
> > -                                           bool is_datapath)
> > +                                       const char *device,
> > +                                       const char *name,
> > +                                       int vdpa_device_fd,
> > +                                       int queue_pair_index,
> > +                                       int nvqs,
> > +                                       bool is_datapath,
> > +                                       bool svq,
> > +                                       VhostIOVATree *iova_tree)
> >  {
> >      NetClientState *nc = NULL;
> >      VhostVDPAState *s;
> > @@ -211,6 +225,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >
> >      s->vhost_vdpa.device_fd = vdpa_device_fd;
> >      s->vhost_vdpa.index = queue_pair_index;
> > +    s->vhost_vdpa.shadow_vqs_enabled = svq;
> > +    s->vhost_vdpa.iova_tree = iova_tree;
> >      ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> >      if (ret) {
> >          qemu_del_net_client(nc);
> > @@ -266,6 +282,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >      g_autofree NetClientState **ncs = NULL;
> >      NetClientState *nc;
> >      int queue_pairs, i, has_cvq = 0;
> > +    g_autoptr(VhostIOVATree) iova_tree = NULL;
> >
> >      assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >      opts = &netdev->u.vhost_vdpa;
> > @@ -285,29 +302,44 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >          qemu_close(vdpa_device_fd);
> >          return queue_pairs;
> >      }
> > +    if (opts->svq) {
> > +        struct vhost_vdpa_iova_range iova_range;
> > +
> > +        if (has_cvq) {
> > +            error_setg(errp, "vdpa svq does not work with cvq");
> > +            goto err_svq;
> > +        }
> > +        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> > +        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> > +    }
> >
> >      ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> >
> >      for (i = 0; i < queue_pairs; i++) {
> >          ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > -                                     vdpa_device_fd, i, 2, true);
> > +                                     vdpa_device_fd, i, 2, true, opts->svq,
> > +                                     iova_tree);
> >          if (!ncs[i])
> >              goto err;
> >      }
> >
> >      if (has_cvq) {
> >          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > -                                 vdpa_device_fd, i, 1, false);
> > +                                 vdpa_device_fd, i, 1, false, opts->svq,
> > +                                 iova_tree);
> >          if (!nc)
> >              goto err;
> >      }
> >
> > +    iova_tree = NULL;
> >      return 0;
> >
> >  err:
> >      if (i) {
> >          qemu_del_net_client(ncs[0]);
> >      }
> > +
> > +err_svq:
> >      qemu_close(vdpa_device_fd);
> >
> >      return -1;
> > --
> > 2.27.0
>



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 15/15] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-03-08  7:32     ` Eugenio Perez Martin
@ 2022-03-08  7:33       ` Eugenio Perez Martin
  2022-03-08  8:02         ` Michael S. Tsirkin
  1 sibling, 0 replies; 60+ messages in thread
From: Eugenio Perez Martin @ 2022-03-08  7:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, qemu-level, Peter Xu, virtualization, Eli Cohen,
	Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

On Tue, Mar 8, 2022 at 8:32 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Tue, Mar 8, 2022 at 8:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Mar 07, 2022 at 04:33:34PM +0100, Eugenio Pérez wrote:
> > > Finally offering the possibility to enable SVQ from the command line.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >  qapi/net.json    |  8 +++++++-
> > >  net/vhost-vdpa.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
> > >  2 files changed, 47 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/qapi/net.json b/qapi/net.json
> > > index 7fab2e7cd8..d626fa441c 100644
> > > --- a/qapi/net.json
> > > +++ b/qapi/net.json
> > > @@ -445,12 +445,18 @@
> > >  # @queues: number of queues to be created for multiqueue vhost-vdpa
> > >  #          (default: 1)
> > >  #
> > > +# @svq: Start device with (experimental) shadow virtqueue. (Since 7.0)
> > > +#
> > > +# Features:
> > > +# @unstable: Member @svq is experimental.
> > > +#
> > >  # Since: 5.1
> > >  ##
> > >  { 'struct': 'NetdevVhostVDPAOptions',
> > >    'data': {
> > >      '*vhostdev':     'str',
> > > -    '*queues':       'int' } }
> > > +    '*queues':       'int',
> > > +    '*svq':          {'type': 'bool', 'features' : [ 'unstable'] } } }
> > >
> > >  ##
> > >  # @NetClientDriver:
> >
> > I think this should be x-svq same as other unstable features.
> >
>
> I'm fine with both, but I was pointed to the other direction at [1] and [2].
>

(Sorry, I hit "send" too quick)

What I totally missed was to change the subject of this patch, I could
send a new series with that if you want.

> Thanks!
>
> [1] https://patchwork.kernel.org/project/qemu-devel/patch/20220302203012.3476835-15-eperezma@redhat.com/
> [2] https://lore.kernel.org/qemu-devel/20220303185147.3605350-15-eperezma@redhat.com/
>
> > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > index 1e9fe47c03..c827921654 100644
> > > --- a/net/vhost-vdpa.c
> > > +++ b/net/vhost-vdpa.c
> > > @@ -127,7 +127,11 @@ err_init:
> > >  static void vhost_vdpa_cleanup(NetClientState *nc)
> > >  {
> > >      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > > +    struct vhost_dev *dev = s->vhost_vdpa.dev;
> > >
> > > +    if (dev && dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > > +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > > +    }
> > >      if (s->vhost_net) {
> > >          vhost_net_cleanup(s->vhost_net);
> > >          g_free(s->vhost_net);
> > > @@ -187,13 +191,23 @@ static NetClientInfo net_vhost_vdpa_info = {
> > >          .check_peer_type = vhost_vdpa_check_peer_type,
> > >  };
> > >
> > > +static int vhost_vdpa_get_iova_range(int fd,
> > > +                                     struct vhost_vdpa_iova_range *iova_range)
> > > +{
> > > +    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
> > > +
> > > +    return ret < 0 ? -errno : 0;
> > > +}
> > > +
> > >  static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > > -                                           const char *device,
> > > -                                           const char *name,
> > > -                                           int vdpa_device_fd,
> > > -                                           int queue_pair_index,
> > > -                                           int nvqs,
> > > -                                           bool is_datapath)
> > > +                                       const char *device,
> > > +                                       const char *name,
> > > +                                       int vdpa_device_fd,
> > > +                                       int queue_pair_index,
> > > +                                       int nvqs,
> > > +                                       bool is_datapath,
> > > +                                       bool svq,
> > > +                                       VhostIOVATree *iova_tree)
> > >  {
> > >      NetClientState *nc = NULL;
> > >      VhostVDPAState *s;
> > > @@ -211,6 +225,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > >
> > >      s->vhost_vdpa.device_fd = vdpa_device_fd;
> > >      s->vhost_vdpa.index = queue_pair_index;
> > > +    s->vhost_vdpa.shadow_vqs_enabled = svq;
> > > +    s->vhost_vdpa.iova_tree = iova_tree;
> > >      ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> > >      if (ret) {
> > >          qemu_del_net_client(nc);
> > > @@ -266,6 +282,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >      g_autofree NetClientState **ncs = NULL;
> > >      NetClientState *nc;
> > >      int queue_pairs, i, has_cvq = 0;
> > > +    g_autoptr(VhostIOVATree) iova_tree = NULL;
> > >
> > >      assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > >      opts = &netdev->u.vhost_vdpa;
> > > @@ -285,29 +302,44 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >          qemu_close(vdpa_device_fd);
> > >          return queue_pairs;
> > >      }
> > > +    if (opts->svq) {
> > > +        struct vhost_vdpa_iova_range iova_range;
> > > +
> > > +        if (has_cvq) {
> > > +            error_setg(errp, "vdpa svq does not work with cvq");
> > > +            goto err_svq;
> > > +        }
> > > +        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> > > +        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> > > +    }
> > >
> > >      ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> > >
> > >      for (i = 0; i < queue_pairs; i++) {
> > >          ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > > -                                     vdpa_device_fd, i, 2, true);
> > > +                                     vdpa_device_fd, i, 2, true, opts->svq,
> > > +                                     iova_tree);
> > >          if (!ncs[i])
> > >              goto err;
> > >      }
> > >
> > >      if (has_cvq) {
> > >          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > > -                                 vdpa_device_fd, i, 1, false);
> > > +                                 vdpa_device_fd, i, 1, false, opts->svq,
> > > +                                 iova_tree);
> > >          if (!nc)
> > >              goto err;
> > >      }
> > >
> > > +    iova_tree = NULL;
> > >      return 0;
> > >
> > >  err:
> > >      if (i) {
> > >          qemu_del_net_client(ncs[0]);
> > >      }
> > > +
> > > +err_svq:
> > >      qemu_close(vdpa_device_fd);
> > >
> > >      return -1;
> > > --
> > > 2.27.0
> >



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-08  7:27         ` Michael S. Tsirkin
@ 2022-03-08  7:34           ` Jason Wang
  -1 siblings, 0 replies; 60+ messages in thread
From: Jason Wang @ 2022-03-08  7:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Harpreet Singh Anand, Lingshan

On Tue, Mar 8, 2022 at 3:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > >
> > > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > > is intended as a new method of tracking the memory the devices touch
> > > > > during a migration process: Instead of relay on vhost device's dirty
> > > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > > descriptors between VM and device. This way qemu is the effective
> > > > > writer of guests memory, like in qemu's virtio device operation.
> > > > >
> > > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > > device to read and write into, and it maps new vrings and the guest
> > > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > > and the guest. Used buffers relay would cause dirty memory being
> > > > > tracked.
> > > > >
> > > > > This effectively means that vDPA device passthrough is intercepted by
> > > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > > from regular mode to SVQ mode is left for a future series.
> > > > >
> > > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > > not map the shadow vq in guest's VA, but in qemu's.
> > > > >
> > > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > > features like event_idx.
> > > > >
> > > > > SVQ needs to be enabled with cmdline:
> > > > >
> > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> > >
> > > A stable API for an incomplete feature is a problem imho.
> >
> > It should be "x-svq".
>
>
> Well look at patch 15.

It's a bug that needs to be fixed.

>
> > >
> > >
> > > > >
> > > > > The first three patches enables notifications forwarding with
> > > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > > cmdline part of the last patch is applied on top of these.
> > > > >
> > > > > Next four patches implement the actual buffer forwarding. However,
> > > > > address are not translated from HVA so they will need a host device with
> > > > > an iommu allowing them to access all of the HVA range.
> > > > >
> > > > > The last part of the series uses properly the host iommu, so qemu
> > > > > creates a new iova address space in the device's range and translates
> > > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > > >
> > > > > Some simple performance tests with netperf were done. They used a nested
> > > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > > baseline average of ~9009.96Mbps:
> > > > > Recv   Send    Send
> > > > > Socket Socket  Message  Elapsed
> > > > > Size   Size    Size     Time     Throughput
> > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > 131072  16384  16384    30.01    9061.03
> > > > > 131072  16384  16384    30.01    8962.94
> > > > > 131072  16384  16384    30.01    9005.92
> > > > >
> > > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > > Recv   Send    Send
> > > > > Socket Socket  Message  Elapsed
> > > > > Size   Size    Size     Time     Throughput
> > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > 131072  16384  16384    30.01    7689.72
> > > > > 131072  16384  16384    30.00    7752.07
> > > > > 131072  16384  16384    30.01    7750.30
> > > > >
> > > > > However, many performance improvements were left out of this series for
> > > > > simplicity, so difference should shrink in the future.
> > > > >
> > > > > Comments are welcome.
> > > >
> > > >
> > > > Hi Michael:
> > > >
> > > > What do you think of this series? It looks good to me as a start. The
> > > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > > try to make it for 7.0.
> > > >
> > > > Thanks
> > >
> > > Well that's cutting it awfully close, and it's not really useful
> > > at the current stage, is it?
> >
> > This allows vDPA to be migrated when using "x-svq=on".
> > But anyhow it's
> > experimental.
>
> it's less experimental than incomplete. It seems pretty clearly not
> the way it will work down the road, we don't want svq involved
> at all times.

Right, but SVQ could be used for other places e.g providing migration
compatibility when the destination lacks some features.

>
> > >
> > > The IOVA trick does not feel complete either.
> >
> > I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> > for shadow vq) did. So we won't suffer from the issues of DPDK.
> >
> > Thanks
>
> Maybe I misundrstand how this all works.
> I refer to all the iova_tree_alloc_map things.

It's a simple IOVA allocater actually. Anything wrong with that?

I'm fine with making it for the future release.

Thanks

>
> > >
> > > >
> > > > >
> > > > > TODO on future series:
> > > > > * Event, indirect, packed, and others features of virtio.
> > > > > * To support different set of features between the device<->SVQ and the
> > > > >    SVQ<->guest communication.
> > > > > * Support of device host notifier memory regions.
> > > > > * To sepparate buffers forwarding in its own AIO context, so we can
> > > > >    throw more threads to that task and we don't need to stop the main
> > > > >    event loop.
> > > > > * Support multiqueue virtio-net vdpa.
> > > > > * Proper documentation.
> > > > >
> > > > > Changes from v4:
> > > > > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> > > > >    overlaps.
> > > > > * Fix: Errno completion at failure.
> > > > > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > > > >
> > > > > Changes from v3:
> > > > > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > > > > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > > > > v3 link:
> > > > > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > > > >
> > > > > Changes from v2:
> > > > > * Less assertions and more error handling in iova tree code.
> > > > > * Better documentation, both fixing errors and making @param: format
> > > > > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> > > > >    prefix at both times.
> > > > > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > > > > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> > > > >    like the kernel driver code.
> > > > > * Small improvements.
> > > > > v2 link:
> > > > > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > > > >
> > > > > Changes from v1:
> > > > > * Feature set at device->SVQ is now the same as SVQ->guest.
> > > > > * Size of SVQ is not max available device size anymore, but guest's
> > > > >    negotiated.
> > > > > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > > > > * Make SVQ a public struct
> > > > > * Come back to previous approach to iova-tree
> > > > > * Some assertions are now fail paths. Some errors are now log_guest.
> > > > > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > > > > * Refactor some errors and messages. Add missing error unwindings.
> > > > > * Add memory barrier at _F_NO_NOTIFY set.
> > > > > * Stop checking for features flags out of transport range.
> > > > > v1 link:
> > > > > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > > > >
> > > > > Changes from v4 RFC:
> > > > > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> > > > >    already present iova-tree for that.
> > > > > * Proper validation of guest features. Now SVQ can negotiate a
> > > > >    different set of features with the device when enabled.
> > > > > * Support of host notifiers memory regions
> > > > > * Handling of SVQ full queue in case guest's descriptors span to
> > > > >    different memory regions (qemu's VA chunks).
> > > > > * Flush pending used buffers at end of SVQ operation.
> > > > > * QMP command now looks by NetClientState name. Other devices will need
> > > > >    to implement it's way to enable vdpa.
> > > > > * Rename QMP command to set, so it looks more like a way of working
> > > > > * Better use of qemu error system
> > > > > * Make a few assertions proper error-handling paths.
> > > > > * Add more documentation
> > > > > * Less coupling of virtio / vhost, that could cause friction on changes
> > > > > * Addressed many other small comments and small fixes.
> > > > >
> > > > > Changes from v3 RFC:
> > > > >    * Move everything to vhost-vdpa backend. A big change, this allowed
> > > > >      some cleanup but more code has been added in other places.
> > > > >    * More use of glib utilities, especially to manage memory.
> > > > > v3 link:
> > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > > > >
> > > > > Changes from v2 RFC:
> > > > >    * Adding vhost-vdpa devices support
> > > > >    * Fixed some memory leaks pointed by different comments
> > > > > v2 link:
> > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > > > >
> > > > > Changes from v1 RFC:
> > > > >    * Use QMP instead of migration to start SVQ mode.
> > > > >    * Only accepting IOMMU devices, closer behavior with target devices
> > > > >      (vDPA)
> > > > >    * Fix invalid masking/unmasking of vhost call fd.
> > > > >    * Use of proper methods for synchronization.
> > > > >    * No need to modify VirtIO device code, all of the changes are
> > > > >      contained in vhost code.
> > > > >    * Delete superfluous code.
> > > > >    * An intermediate RFC was sent with only the notifications forwarding
> > > > >      changes. It can be seen in
> > > > >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > > > > v1 link:
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > > > >
> > > > > Eugenio Pérez (20):
> > > > >        virtio: Add VIRTIO_F_QUEUE_STATE
> > > > >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> > > > >        virtio: Add virtio_queue_is_host_notifier_enabled
> > > > >        vhost: Make vhost_virtqueue_{start,stop} public
> > > > >        vhost: Add x-vhost-enable-shadow-vq qmp
> > > > >        vhost: Add VhostShadowVirtqueue
> > > > >        vdpa: Register vdpa devices in a list
> > > > >        vhost: Route guest->host notification through shadow virtqueue
> > > > >        Add vhost_svq_get_svq_call_notifier
> > > > >        Add vhost_svq_set_guest_call_notifier
> > > > >        vdpa: Save call_fd in vhost-vdpa
> > > > >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> > > > >        vhost: Route host->guest notification through shadow virtqueue
> > > > >        virtio: Add vhost_shadow_vq_get_vring_addr
> > > > >        vdpa: Save host and guest features
> > > > >        vhost: Add vhost_svq_valid_device_features to shadow vq
> > > > >        vhost: Shadow virtqueue buffers forwarding
> > > > >        vhost: Add VhostIOVATree
> > > > >        vhost: Use a tree to store memory mappings
> > > > >        vdpa: Add custom IOTLB translations to SVQ
> > > > >
> > > > > Eugenio Pérez (15):
> > > > >    vhost: Add VhostShadowVirtqueue
> > > > >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> > > > >    vhost: Add Shadow VirtQueue call forwarding capabilities
> > > > >    vhost: Add vhost_svq_valid_features to shadow vq
> > > > >    virtio: Add vhost_svq_get_vring_addr
> > > > >    vdpa: adapt vhost_ops callbacks to svq
> > > > >    vhost: Shadow virtqueue buffers forwarding
> > > > >    util: Add iova_tree_alloc_map
> > > > >    util: add iova_tree_find_iova
> > > > >    vhost: Add VhostIOVATree
> > > > >    vdpa: Add custom IOTLB translations to SVQ
> > > > >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> > > > >    vdpa: Never set log_base addr if SVQ is enabled
> > > > >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> > > > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > > >
> > > > >   qapi/net.json                      |   8 +-
> > > > >   hw/virtio/vhost-iova-tree.h        |  27 ++
> > > > >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> > > > >   include/hw/virtio/vhost-vdpa.h     |   8 +
> > > > >   include/qemu/iova-tree.h           |  38 +-
> > > > >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> > > > >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> > > > >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> > > > >   net/vhost-vdpa.c                   |  48 ++-
> > > > >   util/iova-tree.c                   | 169 ++++++++
> > > > >   hw/virtio/meson.build              |   2 +-
> > > > >   11 files changed, 1633 insertions(+), 26 deletions(-)
> > > > >   create mode 100644 hw/virtio/vhost-iova-tree.h
> > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> > > > >   create mode 100644 hw/virtio/vhost-iova-tree.c
> > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > > > >
> > >
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
@ 2022-03-08  7:34           ` Jason Wang
  0 siblings, 0 replies; 60+ messages in thread
From: Jason Wang @ 2022-03-08  7:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Peter Xu, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Stefano Garzarella, Laurent Vivier, Parav Pandit,
	Richard Henderson, Gautam Dawar, Xiao W Wang, Stefan Hajnoczi,
	Juan Quintela, Harpreet Singh Anand, Lingshan

On Tue, Mar 8, 2022 at 3:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > >
> > > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > > is intended as a new method of tracking the memory the devices touch
> > > > > during a migration process: Instead of relay on vhost device's dirty
> > > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > > descriptors between VM and device. This way qemu is the effective
> > > > > writer of guests memory, like in qemu's virtio device operation.
> > > > >
> > > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > > device to read and write into, and it maps new vrings and the guest
> > > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > > and the guest. Used buffers relay would cause dirty memory being
> > > > > tracked.
> > > > >
> > > > > This effectively means that vDPA device passthrough is intercepted by
> > > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > > from regular mode to SVQ mode is left for a future series.
> > > > >
> > > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > > not map the shadow vq in guest's VA, but in qemu's.
> > > > >
> > > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > > features like event_idx.
> > > > >
> > > > > SVQ needs to be enabled with cmdline:
> > > > >
> > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> > >
> > > A stable API for an incomplete feature is a problem imho.
> >
> > It should be "x-svq".
>
>
> Well look at patch 15.

It's a bug that needs to be fixed.

>
> > >
> > >
> > > > >
> > > > > The first three patches enables notifications forwarding with
> > > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > > cmdline part of the last patch is applied on top of these.
> > > > >
> > > > > Next four patches implement the actual buffer forwarding. However,
> > > > > address are not translated from HVA so they will need a host device with
> > > > > an iommu allowing them to access all of the HVA range.
> > > > >
> > > > > The last part of the series uses properly the host iommu, so qemu
> > > > > creates a new iova address space in the device's range and translates
> > > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > > >
> > > > > Some simple performance tests with netperf were done. They used a nested
> > > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > > baseline average of ~9009.96Mbps:
> > > > > Recv   Send    Send
> > > > > Socket Socket  Message  Elapsed
> > > > > Size   Size    Size     Time     Throughput
> > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > 131072  16384  16384    30.01    9061.03
> > > > > 131072  16384  16384    30.01    8962.94
> > > > > 131072  16384  16384    30.01    9005.92
> > > > >
> > > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > > Recv   Send    Send
> > > > > Socket Socket  Message  Elapsed
> > > > > Size   Size    Size     Time     Throughput
> > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > 131072  16384  16384    30.01    7689.72
> > > > > 131072  16384  16384    30.00    7752.07
> > > > > 131072  16384  16384    30.01    7750.30
> > > > >
> > > > > However, many performance improvements were left out of this series for
> > > > > simplicity, so difference should shrink in the future.
> > > > >
> > > > > Comments are welcome.
> > > >
> > > >
> > > > Hi Michael:
> > > >
> > > > What do you think of this series? It looks good to me as a start. The
> > > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > > try to make it for 7.0.
> > > >
> > > > Thanks
> > >
> > > Well that's cutting it awfully close, and it's not really useful
> > > at the current stage, is it?
> >
> > This allows vDPA to be migrated when using "x-svq=on".
> > But anyhow it's
> > experimental.
>
> it's less experimental than incomplete. It seems pretty clearly not
> the way it will work down the road, we don't want svq involved
> at all times.

Right, but SVQ could be used for other places e.g providing migration
compatibility when the destination lacks some features.

>
> > >
> > > The IOVA trick does not feel complete either.
> >
> > I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> > for shadow vq) did. So we won't suffer from the issues of DPDK.
> >
> > Thanks
>
> Maybe I misundrstand how this all works.
> I refer to all the iova_tree_alloc_map things.

It's a simple IOVA allocater actually. Anything wrong with that?

I'm fine with making it for the future release.

Thanks

>
> > >
> > > >
> > > > >
> > > > > TODO on future series:
> > > > > * Event, indirect, packed, and others features of virtio.
> > > > > * To support different set of features between the device<->SVQ and the
> > > > >    SVQ<->guest communication.
> > > > > * Support of device host notifier memory regions.
> > > > > * To sepparate buffers forwarding in its own AIO context, so we can
> > > > >    throw more threads to that task and we don't need to stop the main
> > > > >    event loop.
> > > > > * Support multiqueue virtio-net vdpa.
> > > > > * Proper documentation.
> > > > >
> > > > > Changes from v4:
> > > > > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> > > > >    overlaps.
> > > > > * Fix: Errno completion at failure.
> > > > > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > > > >
> > > > > Changes from v3:
> > > > > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > > > > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > > > > v3 link:
> > > > > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > > > >
> > > > > Changes from v2:
> > > > > * Less assertions and more error handling in iova tree code.
> > > > > * Better documentation, both fixing errors and making @param: format
> > > > > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> > > > >    prefix at both times.
> > > > > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > > > > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> > > > >    like the kernel driver code.
> > > > > * Small improvements.
> > > > > v2 link:
> > > > > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > > > >
> > > > > Changes from v1:
> > > > > * Feature set at device->SVQ is now the same as SVQ->guest.
> > > > > * Size of SVQ is not max available device size anymore, but guest's
> > > > >    negotiated.
> > > > > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > > > > * Make SVQ a public struct
> > > > > * Come back to previous approach to iova-tree
> > > > > * Some assertions are now fail paths. Some errors are now log_guest.
> > > > > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > > > > * Refactor some errors and messages. Add missing error unwindings.
> > > > > * Add memory barrier at _F_NO_NOTIFY set.
> > > > > * Stop checking for features flags out of transport range.
> > > > > v1 link:
> > > > > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > > > >
> > > > > Changes from v4 RFC:
> > > > > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> > > > >    already present iova-tree for that.
> > > > > * Proper validation of guest features. Now SVQ can negotiate a
> > > > >    different set of features with the device when enabled.
> > > > > * Support of host notifiers memory regions
> > > > > * Handling of SVQ full queue in case guest's descriptors span to
> > > > >    different memory regions (qemu's VA chunks).
> > > > > * Flush pending used buffers at end of SVQ operation.
> > > > > * QMP command now looks by NetClientState name. Other devices will need
> > > > >    to implement it's way to enable vdpa.
> > > > > * Rename QMP command to set, so it looks more like a way of working
> > > > > * Better use of qemu error system
> > > > > * Make a few assertions proper error-handling paths.
> > > > > * Add more documentation
> > > > > * Less coupling of virtio / vhost, that could cause friction on changes
> > > > > * Addressed many other small comments and small fixes.
> > > > >
> > > > > Changes from v3 RFC:
> > > > >    * Move everything to vhost-vdpa backend. A big change, this allowed
> > > > >      some cleanup but more code has been added in other places.
> > > > >    * More use of glib utilities, especially to manage memory.
> > > > > v3 link:
> > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > > > >
> > > > > Changes from v2 RFC:
> > > > >    * Adding vhost-vdpa devices support
> > > > >    * Fixed some memory leaks pointed by different comments
> > > > > v2 link:
> > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > > > >
> > > > > Changes from v1 RFC:
> > > > >    * Use QMP instead of migration to start SVQ mode.
> > > > >    * Only accepting IOMMU devices, closer behavior with target devices
> > > > >      (vDPA)
> > > > >    * Fix invalid masking/unmasking of vhost call fd.
> > > > >    * Use of proper methods for synchronization.
> > > > >    * No need to modify VirtIO device code, all of the changes are
> > > > >      contained in vhost code.
> > > > >    * Delete superfluous code.
> > > > >    * An intermediate RFC was sent with only the notifications forwarding
> > > > >      changes. It can be seen in
> > > > >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > > > > v1 link:
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > > > >
> > > > > Eugenio Pérez (20):
> > > > >        virtio: Add VIRTIO_F_QUEUE_STATE
> > > > >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> > > > >        virtio: Add virtio_queue_is_host_notifier_enabled
> > > > >        vhost: Make vhost_virtqueue_{start,stop} public
> > > > >        vhost: Add x-vhost-enable-shadow-vq qmp
> > > > >        vhost: Add VhostShadowVirtqueue
> > > > >        vdpa: Register vdpa devices in a list
> > > > >        vhost: Route guest->host notification through shadow virtqueue
> > > > >        Add vhost_svq_get_svq_call_notifier
> > > > >        Add vhost_svq_set_guest_call_notifier
> > > > >        vdpa: Save call_fd in vhost-vdpa
> > > > >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> > > > >        vhost: Route host->guest notification through shadow virtqueue
> > > > >        virtio: Add vhost_shadow_vq_get_vring_addr
> > > > >        vdpa: Save host and guest features
> > > > >        vhost: Add vhost_svq_valid_device_features to shadow vq
> > > > >        vhost: Shadow virtqueue buffers forwarding
> > > > >        vhost: Add VhostIOVATree
> > > > >        vhost: Use a tree to store memory mappings
> > > > >        vdpa: Add custom IOTLB translations to SVQ
> > > > >
> > > > > Eugenio Pérez (15):
> > > > >    vhost: Add VhostShadowVirtqueue
> > > > >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> > > > >    vhost: Add Shadow VirtQueue call forwarding capabilities
> > > > >    vhost: Add vhost_svq_valid_features to shadow vq
> > > > >    virtio: Add vhost_svq_get_vring_addr
> > > > >    vdpa: adapt vhost_ops callbacks to svq
> > > > >    vhost: Shadow virtqueue buffers forwarding
> > > > >    util: Add iova_tree_alloc_map
> > > > >    util: add iova_tree_find_iova
> > > > >    vhost: Add VhostIOVATree
> > > > >    vdpa: Add custom IOTLB translations to SVQ
> > > > >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> > > > >    vdpa: Never set log_base addr if SVQ is enabled
> > > > >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> > > > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > > >
> > > > >   qapi/net.json                      |   8 +-
> > > > >   hw/virtio/vhost-iova-tree.h        |  27 ++
> > > > >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> > > > >   include/hw/virtio/vhost-vdpa.h     |   8 +
> > > > >   include/qemu/iova-tree.h           |  38 +-
> > > > >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> > > > >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> > > > >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> > > > >   net/vhost-vdpa.c                   |  48 ++-
> > > > >   util/iova-tree.c                   | 169 ++++++++
> > > > >   hw/virtio/meson.build              |   2 +-
> > > > >   11 files changed, 1633 insertions(+), 26 deletions(-)
> > > > >   create mode 100644 hw/virtio/vhost-iova-tree.h
> > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> > > > >   create mode 100644 hw/virtio/vhost-iova-tree.c
> > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > > > >
> > >
>



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-08  7:27         ` Michael S. Tsirkin
  (?)
  (?)
@ 2022-03-08  7:49         ` Eugenio Perez Martin
  -1 siblings, 0 replies; 60+ messages in thread
From: Eugenio Perez Martin @ 2022-03-08  7:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, qemu-devel, Peter Xu, virtualization, Eli Cohen,
	Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

On Tue, Mar 8, 2022 at 8:28 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > >
> > > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > > is intended as a new method of tracking the memory the devices touch
> > > > > during a migration process: Instead of relay on vhost device's dirty
> > > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > > descriptors between VM and device. This way qemu is the effective
> > > > > writer of guests memory, like in qemu's virtio device operation.
> > > > >
> > > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > > device to read and write into, and it maps new vrings and the guest
> > > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > > and the guest. Used buffers relay would cause dirty memory being
> > > > > tracked.
> > > > >
> > > > > This effectively means that vDPA device passthrough is intercepted by
> > > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > > from regular mode to SVQ mode is left for a future series.
> > > > >
> > > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > > not map the shadow vq in guest's VA, but in qemu's.
> > > > >
> > > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > > features like event_idx.
> > > > >
> > > > > SVQ needs to be enabled with cmdline:
> > > > >
> > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> > >
> > > A stable API for an incomplete feature is a problem imho.
> >
> > It should be "x-svq".
>
>
> Well look at patch 15.
>

(Adding here for completion)

I was pointed to name it svq and simply mark it as experimental using
the tag @unstable, since x- prefix was not homogeneous between stable
and unstable features.

> > >
> > >
> > > > >
> > > > > The first three patches enables notifications forwarding with
> > > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > > cmdline part of the last patch is applied on top of these.
> > > > >
> > > > > Next four patches implement the actual buffer forwarding. However,
> > > > > address are not translated from HVA so they will need a host device with
> > > > > an iommu allowing them to access all of the HVA range.
> > > > >
> > > > > The last part of the series uses properly the host iommu, so qemu
> > > > > creates a new iova address space in the device's range and translates
> > > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > > >
> > > > > Some simple performance tests with netperf were done. They used a nested
> > > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > > baseline average of ~9009.96Mbps:
> > > > > Recv   Send    Send
> > > > > Socket Socket  Message  Elapsed
> > > > > Size   Size    Size     Time     Throughput
> > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > 131072  16384  16384    30.01    9061.03
> > > > > 131072  16384  16384    30.01    8962.94
> > > > > 131072  16384  16384    30.01    9005.92
> > > > >
> > > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > > Recv   Send    Send
> > > > > Socket Socket  Message  Elapsed
> > > > > Size   Size    Size     Time     Throughput
> > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > 131072  16384  16384    30.01    7689.72
> > > > > 131072  16384  16384    30.00    7752.07
> > > > > 131072  16384  16384    30.01    7750.30
> > > > >
> > > > > However, many performance improvements were left out of this series for
> > > > > simplicity, so difference should shrink in the future.
> > > > >
> > > > > Comments are welcome.
> > > >
> > > >
> > > > Hi Michael:
> > > >
> > > > What do you think of this series? It looks good to me as a start. The
> > > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > > try to make it for 7.0.
> > > >
> > > > Thanks
> > >
> > > Well that's cutting it awfully close, and it's not really useful
> > > at the current stage, is it?
> >
> > This allows vDPA to be migrated when using "x-svq=on".
> > But anyhow it's
> > experimental.
>
> it's less experimental than incomplete. It seems pretty clearly not
> the way it will work down the road, we don't want svq involved
> at all times.
>

That's right, but it is the intended way it works at migration time.
Just the switch from passthrough to migration time is missing, because
that part is discussed upstream in virtio. But it's pretty close to
the final form IMO: Both virtual and real devices have been already
tested with the switching, it's just it will make this series way
bigger and hard to review.

It already enables the migration of the simpler devices in its current
form, and the migration with all the machinery to enable svq
dynamically has been POC based on this.

> > >
> > > The IOVA trick does not feel complete either.
> >
> > I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> > for shadow vq) did. So we won't suffer from the issues of DPDK.
> >
> > Thanks
>
> Maybe I misundrstand how this all works.
> I refer to all the iova_tree_alloc_map things.
>

It allocates iova ranges any time the device needs to access memory,
either from guest or for qemu memory like shadow vrings, in the range
the device supports.

That part is pretty opaque and self-contained from the caller point of
view, so in my opinion it is not worth changing at the moment; If it
needs changes like more performance we can use a tree like previous
versions of this series. But I can give more details of it if needed.

Thanks!

> > >
> > > >
> > > > >
> > > > > TODO on future series:
> > > > > * Event, indirect, packed, and others features of virtio.
> > > > > * To support different set of features between the device<->SVQ and the
> > > > >    SVQ<->guest communication.
> > > > > * Support of device host notifier memory regions.
> > > > > * To sepparate buffers forwarding in its own AIO context, so we can
> > > > >    throw more threads to that task and we don't need to stop the main
> > > > >    event loop.
> > > > > * Support multiqueue virtio-net vdpa.
> > > > > * Proper documentation.
> > > > >
> > > > > Changes from v4:
> > > > > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> > > > >    overlaps.
> > > > > * Fix: Errno completion at failure.
> > > > > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > > > >
> > > > > Changes from v3:
> > > > > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > > > > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > > > > v3 link:
> > > > > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > > > >
> > > > > Changes from v2:
> > > > > * Less assertions and more error handling in iova tree code.
> > > > > * Better documentation, both fixing errors and making @param: format
> > > > > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> > > > >    prefix at both times.
> > > > > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > > > > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> > > > >    like the kernel driver code.
> > > > > * Small improvements.
> > > > > v2 link:
> > > > > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > > > >
> > > > > Changes from v1:
> > > > > * Feature set at device->SVQ is now the same as SVQ->guest.
> > > > > * Size of SVQ is not max available device size anymore, but guest's
> > > > >    negotiated.
> > > > > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > > > > * Make SVQ a public struct
> > > > > * Come back to previous approach to iova-tree
> > > > > * Some assertions are now fail paths. Some errors are now log_guest.
> > > > > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > > > > * Refactor some errors and messages. Add missing error unwindings.
> > > > > * Add memory barrier at _F_NO_NOTIFY set.
> > > > > * Stop checking for features flags out of transport range.
> > > > > v1 link:
> > > > > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > > > >
> > > > > Changes from v4 RFC:
> > > > > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> > > > >    already present iova-tree for that.
> > > > > * Proper validation of guest features. Now SVQ can negotiate a
> > > > >    different set of features with the device when enabled.
> > > > > * Support of host notifiers memory regions
> > > > > * Handling of SVQ full queue in case guest's descriptors span to
> > > > >    different memory regions (qemu's VA chunks).
> > > > > * Flush pending used buffers at end of SVQ operation.
> > > > > * QMP command now looks by NetClientState name. Other devices will need
> > > > >    to implement it's way to enable vdpa.
> > > > > * Rename QMP command to set, so it looks more like a way of working
> > > > > * Better use of qemu error system
> > > > > * Make a few assertions proper error-handling paths.
> > > > > * Add more documentation
> > > > > * Less coupling of virtio / vhost, that could cause friction on changes
> > > > > * Addressed many other small comments and small fixes.
> > > > >
> > > > > Changes from v3 RFC:
> > > > >    * Move everything to vhost-vdpa backend. A big change, this allowed
> > > > >      some cleanup but more code has been added in other places.
> > > > >    * More use of glib utilities, especially to manage memory.
> > > > > v3 link:
> > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > > > >
> > > > > Changes from v2 RFC:
> > > > >    * Adding vhost-vdpa devices support
> > > > >    * Fixed some memory leaks pointed by different comments
> > > > > v2 link:
> > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > > > >
> > > > > Changes from v1 RFC:
> > > > >    * Use QMP instead of migration to start SVQ mode.
> > > > >    * Only accepting IOMMU devices, closer behavior with target devices
> > > > >      (vDPA)
> > > > >    * Fix invalid masking/unmasking of vhost call fd.
> > > > >    * Use of proper methods for synchronization.
> > > > >    * No need to modify VirtIO device code, all of the changes are
> > > > >      contained in vhost code.
> > > > >    * Delete superfluous code.
> > > > >    * An intermediate RFC was sent with only the notifications forwarding
> > > > >      changes. It can be seen in
> > > > >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > > > > v1 link:
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > > > >
> > > > > Eugenio Pérez (20):
> > > > >        virtio: Add VIRTIO_F_QUEUE_STATE
> > > > >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> > > > >        virtio: Add virtio_queue_is_host_notifier_enabled
> > > > >        vhost: Make vhost_virtqueue_{start,stop} public
> > > > >        vhost: Add x-vhost-enable-shadow-vq qmp
> > > > >        vhost: Add VhostShadowVirtqueue
> > > > >        vdpa: Register vdpa devices in a list
> > > > >        vhost: Route guest->host notification through shadow virtqueue
> > > > >        Add vhost_svq_get_svq_call_notifier
> > > > >        Add vhost_svq_set_guest_call_notifier
> > > > >        vdpa: Save call_fd in vhost-vdpa
> > > > >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> > > > >        vhost: Route host->guest notification through shadow virtqueue
> > > > >        virtio: Add vhost_shadow_vq_get_vring_addr
> > > > >        vdpa: Save host and guest features
> > > > >        vhost: Add vhost_svq_valid_device_features to shadow vq
> > > > >        vhost: Shadow virtqueue buffers forwarding
> > > > >        vhost: Add VhostIOVATree
> > > > >        vhost: Use a tree to store memory mappings
> > > > >        vdpa: Add custom IOTLB translations to SVQ
> > > > >
> > > > > Eugenio Pérez (15):
> > > > >    vhost: Add VhostShadowVirtqueue
> > > > >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> > > > >    vhost: Add Shadow VirtQueue call forwarding capabilities
> > > > >    vhost: Add vhost_svq_valid_features to shadow vq
> > > > >    virtio: Add vhost_svq_get_vring_addr
> > > > >    vdpa: adapt vhost_ops callbacks to svq
> > > > >    vhost: Shadow virtqueue buffers forwarding
> > > > >    util: Add iova_tree_alloc_map
> > > > >    util: add iova_tree_find_iova
> > > > >    vhost: Add VhostIOVATree
> > > > >    vdpa: Add custom IOTLB translations to SVQ
> > > > >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> > > > >    vdpa: Never set log_base addr if SVQ is enabled
> > > > >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> > > > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > > >
> > > > >   qapi/net.json                      |   8 +-
> > > > >   hw/virtio/vhost-iova-tree.h        |  27 ++
> > > > >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> > > > >   include/hw/virtio/vhost-vdpa.h     |   8 +
> > > > >   include/qemu/iova-tree.h           |  38 +-
> > > > >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> > > > >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> > > > >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> > > > >   net/vhost-vdpa.c                   |  48 ++-
> > > > >   util/iova-tree.c                   | 169 ++++++++
> > > > >   hw/virtio/meson.build              |   2 +-
> > > > >   11 files changed, 1633 insertions(+), 26 deletions(-)
> > > > >   create mode 100644 hw/virtio/vhost-iova-tree.h
> > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> > > > >   create mode 100644 hw/virtio/vhost-iova-tree.c
> > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > > > >
> > >
>



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-08  7:34           ` Jason Wang
@ 2022-03-08  7:55             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08  7:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 03:34:17PM +0800, Jason Wang wrote:
> On Tue, Mar 8, 2022 at 3:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > > >
> > > > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > > > is intended as a new method of tracking the memory the devices touch
> > > > > > during a migration process: Instead of relay on vhost device's dirty
> > > > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > > > descriptors between VM and device. This way qemu is the effective
> > > > > > writer of guests memory, like in qemu's virtio device operation.
> > > > > >
> > > > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > > > device to read and write into, and it maps new vrings and the guest
> > > > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > > > and the guest. Used buffers relay would cause dirty memory being
> > > > > > tracked.
> > > > > >
> > > > > > This effectively means that vDPA device passthrough is intercepted by
> > > > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > > > from regular mode to SVQ mode is left for a future series.
> > > > > >
> > > > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > > > not map the shadow vq in guest's VA, but in qemu's.
> > > > > >
> > > > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > > > features like event_idx.
> > > > > >
> > > > > > SVQ needs to be enabled with cmdline:
> > > > > >
> > > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> > > >
> > > > A stable API for an incomplete feature is a problem imho.
> > >
> > > It should be "x-svq".
> >
> >
> > Well look at patch 15.
> 
> It's a bug that needs to be fixed.
> 
> >
> > > >
> > > >
> > > > > >
> > > > > > The first three patches enables notifications forwarding with
> > > > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > > > cmdline part of the last patch is applied on top of these.
> > > > > >
> > > > > > Next four patches implement the actual buffer forwarding. However,
> > > > > > address are not translated from HVA so they will need a host device with
> > > > > > an iommu allowing them to access all of the HVA range.
> > > > > >
> > > > > > The last part of the series uses properly the host iommu, so qemu
> > > > > > creates a new iova address space in the device's range and translates
> > > > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > > > >
> > > > > > Some simple performance tests with netperf were done. They used a nested
> > > > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > > > baseline average of ~9009.96Mbps:
> > > > > > Recv   Send    Send
> > > > > > Socket Socket  Message  Elapsed
> > > > > > Size   Size    Size     Time     Throughput
> > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > 131072  16384  16384    30.01    9061.03
> > > > > > 131072  16384  16384    30.01    8962.94
> > > > > > 131072  16384  16384    30.01    9005.92
> > > > > >
> > > > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > > > Recv   Send    Send
> > > > > > Socket Socket  Message  Elapsed
> > > > > > Size   Size    Size     Time     Throughput
> > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > 131072  16384  16384    30.01    7689.72
> > > > > > 131072  16384  16384    30.00    7752.07
> > > > > > 131072  16384  16384    30.01    7750.30
> > > > > >
> > > > > > However, many performance improvements were left out of this series for
> > > > > > simplicity, so difference should shrink in the future.
> > > > > >
> > > > > > Comments are welcome.
> > > > >
> > > > >
> > > > > Hi Michael:
> > > > >
> > > > > What do you think of this series? It looks good to me as a start. The
> > > > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > > > try to make it for 7.0.
> > > > >
> > > > > Thanks
> > > >
> > > > Well that's cutting it awfully close, and it's not really useful
> > > > at the current stage, is it?
> > >
> > > This allows vDPA to be migrated when using "x-svq=on".
> > > But anyhow it's
> > > experimental.
> >
> > it's less experimental than incomplete. It seems pretty clearly not
> > the way it will work down the road, we don't want svq involved
> > at all times.
> 
> Right, but SVQ could be used for other places e.g providing migration
> compatibility when the destination lacks some features.

In its current form? I don't see how.  Generally? Maybe but I suspect
we'll have to rework it completely for that.

> >
> > > >
> > > > The IOVA trick does not feel complete either.
> > >
> > > I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> > > for shadow vq) did. So we won't suffer from the issues of DPDK.
> > >
> > > Thanks
> >
> > Maybe I misundrstand how this all works.
> > I refer to all the iova_tree_alloc_map things.
> 
> It's a simple IOVA allocater actually. Anything wrong with that?

Not by itself but I'm not sure we can guarantee guest will not
attempt to use the IOVA addresses we are reserving down
the road.

> I'm fine with making it for the future release.
> 
> Thanks
> 
> >
> > > >
> > > > >
> > > > > >
> > > > > > TODO on future series:
> > > > > > * Event, indirect, packed, and others features of virtio.
> > > > > > * To support different set of features between the device<->SVQ and the
> > > > > >    SVQ<->guest communication.
> > > > > > * Support of device host notifier memory regions.
> > > > > > * To sepparate buffers forwarding in its own AIO context, so we can
> > > > > >    throw more threads to that task and we don't need to stop the main
> > > > > >    event loop.
> > > > > > * Support multiqueue virtio-net vdpa.
> > > > > > * Proper documentation.
> > > > > >
> > > > > > Changes from v4:
> > > > > > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> > > > > >    overlaps.
> > > > > > * Fix: Errno completion at failure.
> > > > > > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > > > > >
> > > > > > Changes from v3:
> > > > > > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > > > > > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > > > > > v3 link:
> > > > > > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > > > > >
> > > > > > Changes from v2:
> > > > > > * Less assertions and more error handling in iova tree code.
> > > > > > * Better documentation, both fixing errors and making @param: format
> > > > > > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> > > > > >    prefix at both times.
> > > > > > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > > > > > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> > > > > >    like the kernel driver code.
> > > > > > * Small improvements.
> > > > > > v2 link:
> > > > > > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > > > > >
> > > > > > Changes from v1:
> > > > > > * Feature set at device->SVQ is now the same as SVQ->guest.
> > > > > > * Size of SVQ is not max available device size anymore, but guest's
> > > > > >    negotiated.
> > > > > > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > > > > > * Make SVQ a public struct
> > > > > > * Come back to previous approach to iova-tree
> > > > > > * Some assertions are now fail paths. Some errors are now log_guest.
> > > > > > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > > > > > * Refactor some errors and messages. Add missing error unwindings.
> > > > > > * Add memory barrier at _F_NO_NOTIFY set.
> > > > > > * Stop checking for features flags out of transport range.
> > > > > > v1 link:
> > > > > > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > > > > >
> > > > > > Changes from v4 RFC:
> > > > > > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> > > > > >    already present iova-tree for that.
> > > > > > * Proper validation of guest features. Now SVQ can negotiate a
> > > > > >    different set of features with the device when enabled.
> > > > > > * Support of host notifiers memory regions
> > > > > > * Handling of SVQ full queue in case guest's descriptors span to
> > > > > >    different memory regions (qemu's VA chunks).
> > > > > > * Flush pending used buffers at end of SVQ operation.
> > > > > > * QMP command now looks by NetClientState name. Other devices will need
> > > > > >    to implement it's way to enable vdpa.
> > > > > > * Rename QMP command to set, so it looks more like a way of working
> > > > > > * Better use of qemu error system
> > > > > > * Make a few assertions proper error-handling paths.
> > > > > > * Add more documentation
> > > > > > * Less coupling of virtio / vhost, that could cause friction on changes
> > > > > > * Addressed many other small comments and small fixes.
> > > > > >
> > > > > > Changes from v3 RFC:
> > > > > >    * Move everything to vhost-vdpa backend. A big change, this allowed
> > > > > >      some cleanup but more code has been added in other places.
> > > > > >    * More use of glib utilities, especially to manage memory.
> > > > > > v3 link:
> > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > > > > >
> > > > > > Changes from v2 RFC:
> > > > > >    * Adding vhost-vdpa devices support
> > > > > >    * Fixed some memory leaks pointed by different comments
> > > > > > v2 link:
> > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > > > > >
> > > > > > Changes from v1 RFC:
> > > > > >    * Use QMP instead of migration to start SVQ mode.
> > > > > >    * Only accepting IOMMU devices, closer behavior with target devices
> > > > > >      (vDPA)
> > > > > >    * Fix invalid masking/unmasking of vhost call fd.
> > > > > >    * Use of proper methods for synchronization.
> > > > > >    * No need to modify VirtIO device code, all of the changes are
> > > > > >      contained in vhost code.
> > > > > >    * Delete superfluous code.
> > > > > >    * An intermediate RFC was sent with only the notifications forwarding
> > > > > >      changes. It can be seen in
> > > > > >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > > > > > v1 link:
> > > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > > > > >
> > > > > > Eugenio Pérez (20):
> > > > > >        virtio: Add VIRTIO_F_QUEUE_STATE
> > > > > >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> > > > > >        virtio: Add virtio_queue_is_host_notifier_enabled
> > > > > >        vhost: Make vhost_virtqueue_{start,stop} public
> > > > > >        vhost: Add x-vhost-enable-shadow-vq qmp
> > > > > >        vhost: Add VhostShadowVirtqueue
> > > > > >        vdpa: Register vdpa devices in a list
> > > > > >        vhost: Route guest->host notification through shadow virtqueue
> > > > > >        Add vhost_svq_get_svq_call_notifier
> > > > > >        Add vhost_svq_set_guest_call_notifier
> > > > > >        vdpa: Save call_fd in vhost-vdpa
> > > > > >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> > > > > >        vhost: Route host->guest notification through shadow virtqueue
> > > > > >        virtio: Add vhost_shadow_vq_get_vring_addr
> > > > > >        vdpa: Save host and guest features
> > > > > >        vhost: Add vhost_svq_valid_device_features to shadow vq
> > > > > >        vhost: Shadow virtqueue buffers forwarding
> > > > > >        vhost: Add VhostIOVATree
> > > > > >        vhost: Use a tree to store memory mappings
> > > > > >        vdpa: Add custom IOTLB translations to SVQ
> > > > > >
> > > > > > Eugenio Pérez (15):
> > > > > >    vhost: Add VhostShadowVirtqueue
> > > > > >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> > > > > >    vhost: Add Shadow VirtQueue call forwarding capabilities
> > > > > >    vhost: Add vhost_svq_valid_features to shadow vq
> > > > > >    virtio: Add vhost_svq_get_vring_addr
> > > > > >    vdpa: adapt vhost_ops callbacks to svq
> > > > > >    vhost: Shadow virtqueue buffers forwarding
> > > > > >    util: Add iova_tree_alloc_map
> > > > > >    util: add iova_tree_find_iova
> > > > > >    vhost: Add VhostIOVATree
> > > > > >    vdpa: Add custom IOTLB translations to SVQ
> > > > > >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> > > > > >    vdpa: Never set log_base addr if SVQ is enabled
> > > > > >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> > > > > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > > > >
> > > > > >   qapi/net.json                      |   8 +-
> > > > > >   hw/virtio/vhost-iova-tree.h        |  27 ++
> > > > > >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> > > > > >   include/hw/virtio/vhost-vdpa.h     |   8 +
> > > > > >   include/qemu/iova-tree.h           |  38 +-
> > > > > >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> > > > > >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> > > > > >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> > > > > >   net/vhost-vdpa.c                   |  48 ++-
> > > > > >   util/iova-tree.c                   | 169 ++++++++
> > > > > >   hw/virtio/meson.build              |   2 +-
> > > > > >   11 files changed, 1633 insertions(+), 26 deletions(-)
> > > > > >   create mode 100644 hw/virtio/vhost-iova-tree.h
> > > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> > > > > >   create mode 100644 hw/virtio/vhost-iova-tree.c
> > > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > > > > >
> > > >
> >

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
@ 2022-03-08  7:55             ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08  7:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Peter Xu, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Stefano Garzarella, Laurent Vivier, Parav Pandit,
	Richard Henderson, Gautam Dawar, Xiao W Wang, Stefan Hajnoczi,
	Juan Quintela, Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 03:34:17PM +0800, Jason Wang wrote:
> On Tue, Mar 8, 2022 at 3:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > > >
> > > > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > > > is intended as a new method of tracking the memory the devices touch
> > > > > > during a migration process: Instead of relay on vhost device's dirty
> > > > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > > > descriptors between VM and device. This way qemu is the effective
> > > > > > writer of guests memory, like in qemu's virtio device operation.
> > > > > >
> > > > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > > > device to read and write into, and it maps new vrings and the guest
> > > > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > > > and the guest. Used buffers relay would cause dirty memory being
> > > > > > tracked.
> > > > > >
> > > > > > This effectively means that vDPA device passthrough is intercepted by
> > > > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > > > from regular mode to SVQ mode is left for a future series.
> > > > > >
> > > > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > > > not map the shadow vq in guest's VA, but in qemu's.
> > > > > >
> > > > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > > > features like event_idx.
> > > > > >
> > > > > > SVQ needs to be enabled with cmdline:
> > > > > >
> > > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> > > >
> > > > A stable API for an incomplete feature is a problem imho.
> > >
> > > It should be "x-svq".
> >
> >
> > Well look at patch 15.
> 
> It's a bug that needs to be fixed.
> 
> >
> > > >
> > > >
> > > > > >
> > > > > > The first three patches enables notifications forwarding with
> > > > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > > > cmdline part of the last patch is applied on top of these.
> > > > > >
> > > > > > Next four patches implement the actual buffer forwarding. However,
> > > > > > address are not translated from HVA so they will need a host device with
> > > > > > an iommu allowing them to access all of the HVA range.
> > > > > >
> > > > > > The last part of the series uses properly the host iommu, so qemu
> > > > > > creates a new iova address space in the device's range and translates
> > > > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > > > >
> > > > > > Some simple performance tests with netperf were done. They used a nested
> > > > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > > > baseline average of ~9009.96Mbps:
> > > > > > Recv   Send    Send
> > > > > > Socket Socket  Message  Elapsed
> > > > > > Size   Size    Size     Time     Throughput
> > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > 131072  16384  16384    30.01    9061.03
> > > > > > 131072  16384  16384    30.01    8962.94
> > > > > > 131072  16384  16384    30.01    9005.92
> > > > > >
> > > > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > > > Recv   Send    Send
> > > > > > Socket Socket  Message  Elapsed
> > > > > > Size   Size    Size     Time     Throughput
> > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > 131072  16384  16384    30.01    7689.72
> > > > > > 131072  16384  16384    30.00    7752.07
> > > > > > 131072  16384  16384    30.01    7750.30
> > > > > >
> > > > > > However, many performance improvements were left out of this series for
> > > > > > simplicity, so difference should shrink in the future.
> > > > > >
> > > > > > Comments are welcome.
> > > > >
> > > > >
> > > > > Hi Michael:
> > > > >
> > > > > What do you think of this series? It looks good to me as a start. The
> > > > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > > > try to make it for 7.0.
> > > > >
> > > > > Thanks
> > > >
> > > > Well that's cutting it awfully close, and it's not really useful
> > > > at the current stage, is it?
> > >
> > > This allows vDPA to be migrated when using "x-svq=on".
> > > But anyhow it's
> > > experimental.
> >
> > it's less experimental than incomplete. It seems pretty clearly not
> > the way it will work down the road, we don't want svq involved
> > at all times.
> 
> Right, but SVQ could be used for other places e.g providing migration
> compatibility when the destination lacks some features.

In its current form? I don't see how.  Generally? Maybe but I suspect
we'll have to rework it completely for that.

> >
> > > >
> > > > The IOVA trick does not feel complete either.
> > >
> > > I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> > > for shadow vq) did. So we won't suffer from the issues of DPDK.
> > >
> > > Thanks
> >
> > Maybe I misundrstand how this all works.
> > I refer to all the iova_tree_alloc_map things.
> 
> It's a simple IOVA allocater actually. Anything wrong with that?

Not by itself but I'm not sure we can guarantee guest will not
attempt to use the IOVA addresses we are reserving down
the road.

> I'm fine with making it for the future release.
> 
> Thanks
> 
> >
> > > >
> > > > >
> > > > > >
> > > > > > TODO on future series:
> > > > > > * Event, indirect, packed, and others features of virtio.
> > > > > > * To support different set of features between the device<->SVQ and the
> > > > > >    SVQ<->guest communication.
> > > > > > * Support of device host notifier memory regions.
> > > > > > * To sepparate buffers forwarding in its own AIO context, so we can
> > > > > >    throw more threads to that task and we don't need to stop the main
> > > > > >    event loop.
> > > > > > * Support multiqueue virtio-net vdpa.
> > > > > > * Proper documentation.
> > > > > >
> > > > > > Changes from v4:
> > > > > > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> > > > > >    overlaps.
> > > > > > * Fix: Errno completion at failure.
> > > > > > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > > > > >
> > > > > > Changes from v3:
> > > > > > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > > > > > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > > > > > v3 link:
> > > > > > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > > > > >
> > > > > > Changes from v2:
> > > > > > * Less assertions and more error handling in iova tree code.
> > > > > > * Better documentation, both fixing errors and making @param: format
> > > > > > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> > > > > >    prefix at both times.
> > > > > > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > > > > > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> > > > > >    like the kernel driver code.
> > > > > > * Small improvements.
> > > > > > v2 link:
> > > > > > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > > > > >
> > > > > > Changes from v1:
> > > > > > * Feature set at device->SVQ is now the same as SVQ->guest.
> > > > > > * Size of SVQ is not max available device size anymore, but guest's
> > > > > >    negotiated.
> > > > > > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > > > > > * Make SVQ a public struct
> > > > > > * Come back to previous approach to iova-tree
> > > > > > * Some assertions are now fail paths. Some errors are now log_guest.
> > > > > > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > > > > > * Refactor some errors and messages. Add missing error unwindings.
> > > > > > * Add memory barrier at _F_NO_NOTIFY set.
> > > > > > * Stop checking for features flags out of transport range.
> > > > > > v1 link:
> > > > > > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > > > > >
> > > > > > Changes from v4 RFC:
> > > > > > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> > > > > >    already present iova-tree for that.
> > > > > > * Proper validation of guest features. Now SVQ can negotiate a
> > > > > >    different set of features with the device when enabled.
> > > > > > * Support of host notifiers memory regions
> > > > > > * Handling of SVQ full queue in case guest's descriptors span to
> > > > > >    different memory regions (qemu's VA chunks).
> > > > > > * Flush pending used buffers at end of SVQ operation.
> > > > > > * QMP command now looks by NetClientState name. Other devices will need
> > > > > >    to implement it's way to enable vdpa.
> > > > > > * Rename QMP command to set, so it looks more like a way of working
> > > > > > * Better use of qemu error system
> > > > > > * Make a few assertions proper error-handling paths.
> > > > > > * Add more documentation
> > > > > > * Less coupling of virtio / vhost, that could cause friction on changes
> > > > > > * Addressed many other small comments and small fixes.
> > > > > >
> > > > > > Changes from v3 RFC:
> > > > > >    * Move everything to vhost-vdpa backend. A big change, this allowed
> > > > > >      some cleanup but more code has been added in other places.
> > > > > >    * More use of glib utilities, especially to manage memory.
> > > > > > v3 link:
> > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > > > > >
> > > > > > Changes from v2 RFC:
> > > > > >    * Adding vhost-vdpa devices support
> > > > > >    * Fixed some memory leaks pointed by different comments
> > > > > > v2 link:
> > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > > > > >
> > > > > > Changes from v1 RFC:
> > > > > >    * Use QMP instead of migration to start SVQ mode.
> > > > > >    * Only accepting IOMMU devices, closer behavior with target devices
> > > > > >      (vDPA)
> > > > > >    * Fix invalid masking/unmasking of vhost call fd.
> > > > > >    * Use of proper methods for synchronization.
> > > > > >    * No need to modify VirtIO device code, all of the changes are
> > > > > >      contained in vhost code.
> > > > > >    * Delete superfluous code.
> > > > > >    * An intermediate RFC was sent with only the notifications forwarding
> > > > > >      changes. It can be seen in
> > > > > >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > > > > > v1 link:
> > > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > > > > >
> > > > > > Eugenio Pérez (20):
> > > > > >        virtio: Add VIRTIO_F_QUEUE_STATE
> > > > > >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> > > > > >        virtio: Add virtio_queue_is_host_notifier_enabled
> > > > > >        vhost: Make vhost_virtqueue_{start,stop} public
> > > > > >        vhost: Add x-vhost-enable-shadow-vq qmp
> > > > > >        vhost: Add VhostShadowVirtqueue
> > > > > >        vdpa: Register vdpa devices in a list
> > > > > >        vhost: Route guest->host notification through shadow virtqueue
> > > > > >        Add vhost_svq_get_svq_call_notifier
> > > > > >        Add vhost_svq_set_guest_call_notifier
> > > > > >        vdpa: Save call_fd in vhost-vdpa
> > > > > >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> > > > > >        vhost: Route host->guest notification through shadow virtqueue
> > > > > >        virtio: Add vhost_shadow_vq_get_vring_addr
> > > > > >        vdpa: Save host and guest features
> > > > > >        vhost: Add vhost_svq_valid_device_features to shadow vq
> > > > > >        vhost: Shadow virtqueue buffers forwarding
> > > > > >        vhost: Add VhostIOVATree
> > > > > >        vhost: Use a tree to store memory mappings
> > > > > >        vdpa: Add custom IOTLB translations to SVQ
> > > > > >
> > > > > > Eugenio Pérez (15):
> > > > > >    vhost: Add VhostShadowVirtqueue
> > > > > >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> > > > > >    vhost: Add Shadow VirtQueue call forwarding capabilities
> > > > > >    vhost: Add vhost_svq_valid_features to shadow vq
> > > > > >    virtio: Add vhost_svq_get_vring_addr
> > > > > >    vdpa: adapt vhost_ops callbacks to svq
> > > > > >    vhost: Shadow virtqueue buffers forwarding
> > > > > >    util: Add iova_tree_alloc_map
> > > > > >    util: add iova_tree_find_iova
> > > > > >    vhost: Add VhostIOVATree
> > > > > >    vdpa: Add custom IOTLB translations to SVQ
> > > > > >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> > > > > >    vdpa: Never set log_base addr if SVQ is enabled
> > > > > >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> > > > > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > > > >
> > > > > >   qapi/net.json                      |   8 +-
> > > > > >   hw/virtio/vhost-iova-tree.h        |  27 ++
> > > > > >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> > > > > >   include/hw/virtio/vhost-vdpa.h     |   8 +
> > > > > >   include/qemu/iova-tree.h           |  38 +-
> > > > > >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> > > > > >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> > > > > >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> > > > > >   net/vhost-vdpa.c                   |  48 ++-
> > > > > >   util/iova-tree.c                   | 169 ++++++++
> > > > > >   hw/virtio/meson.build              |   2 +-
> > > > > >   11 files changed, 1633 insertions(+), 26 deletions(-)
> > > > > >   create mode 100644 hw/virtio/vhost-iova-tree.h
> > > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> > > > > >   create mode 100644 hw/virtio/vhost-iova-tree.c
> > > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > > > > >
> > > >
> >



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 15/15] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-03-08  7:32     ` Eugenio Perez Martin
@ 2022-03-08  8:02         ` Michael S. Tsirkin
  2022-03-08  8:02         ` Michael S. Tsirkin
  1 sibling, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08  8:02 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Laurent Vivier,
	Parav Pandit, Richard Henderson, Gautam Dawar, Xiao W Wang,
	Stefan Hajnoczi, Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 08:32:07AM +0100, Eugenio Perez Martin wrote:
> On Tue, Mar 8, 2022 at 8:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Mar 07, 2022 at 04:33:34PM +0100, Eugenio Pérez wrote:
> > > Finally offering the possibility to enable SVQ from the command line.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >  qapi/net.json    |  8 +++++++-
> > >  net/vhost-vdpa.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
> > >  2 files changed, 47 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/qapi/net.json b/qapi/net.json
> > > index 7fab2e7cd8..d626fa441c 100644
> > > --- a/qapi/net.json
> > > +++ b/qapi/net.json
> > > @@ -445,12 +445,18 @@
> > >  # @queues: number of queues to be created for multiqueue vhost-vdpa
> > >  #          (default: 1)
> > >  #
> > > +# @svq: Start device with (experimental) shadow virtqueue. (Since 7.0)
> > > +#
> > > +# Features:
> > > +# @unstable: Member @svq is experimental.
> > > +#
> > >  # Since: 5.1
> > >  ##
> > >  { 'struct': 'NetdevVhostVDPAOptions',
> > >    'data': {
> > >      '*vhostdev':     'str',
> > > -    '*queues':       'int' } }
> > > +    '*queues':       'int',
> > > +    '*svq':          {'type': 'bool', 'features' : [ 'unstable'] } } }
> > >
> > >  ##
> > >  # @NetClientDriver:
> >
> > I think this should be x-svq same as other unstable features.
> >
> 
> I'm fine with both, but I was pointed to the other direction at [1] and [2].
> 
> Thanks!
> 
> [1] https://patchwork.kernel.org/project/qemu-devel/patch/20220302203012.3476835-15-eperezma@redhat.com/
> [2] https://lore.kernel.org/qemu-devel/20220303185147.3605350-15-eperezma@redhat.com/


I think what Markus didn't know is that a bunch of changes in
behaviour will occur before we rename it to "svq".
The rename is thus less of a bother more of a bonus.

> > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > index 1e9fe47c03..c827921654 100644
> > > --- a/net/vhost-vdpa.c
> > > +++ b/net/vhost-vdpa.c
> > > @@ -127,7 +127,11 @@ err_init:
> > >  static void vhost_vdpa_cleanup(NetClientState *nc)
> > >  {
> > >      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > > +    struct vhost_dev *dev = s->vhost_vdpa.dev;
> > >
> > > +    if (dev && dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > > +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > > +    }
> > >      if (s->vhost_net) {
> > >          vhost_net_cleanup(s->vhost_net);
> > >          g_free(s->vhost_net);
> > > @@ -187,13 +191,23 @@ static NetClientInfo net_vhost_vdpa_info = {
> > >          .check_peer_type = vhost_vdpa_check_peer_type,
> > >  };
> > >
> > > +static int vhost_vdpa_get_iova_range(int fd,
> > > +                                     struct vhost_vdpa_iova_range *iova_range)
> > > +{
> > > +    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
> > > +
> > > +    return ret < 0 ? -errno : 0;
> > > +}
> > > +
> > >  static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > > -                                           const char *device,
> > > -                                           const char *name,
> > > -                                           int vdpa_device_fd,
> > > -                                           int queue_pair_index,
> > > -                                           int nvqs,
> > > -                                           bool is_datapath)
> > > +                                       const char *device,
> > > +                                       const char *name,
> > > +                                       int vdpa_device_fd,
> > > +                                       int queue_pair_index,
> > > +                                       int nvqs,
> > > +                                       bool is_datapath,
> > > +                                       bool svq,
> > > +                                       VhostIOVATree *iova_tree)
> > >  {
> > >      NetClientState *nc = NULL;
> > >      VhostVDPAState *s;
> > > @@ -211,6 +225,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > >
> > >      s->vhost_vdpa.device_fd = vdpa_device_fd;
> > >      s->vhost_vdpa.index = queue_pair_index;
> > > +    s->vhost_vdpa.shadow_vqs_enabled = svq;
> > > +    s->vhost_vdpa.iova_tree = iova_tree;
> > >      ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> > >      if (ret) {
> > >          qemu_del_net_client(nc);
> > > @@ -266,6 +282,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >      g_autofree NetClientState **ncs = NULL;
> > >      NetClientState *nc;
> > >      int queue_pairs, i, has_cvq = 0;
> > > +    g_autoptr(VhostIOVATree) iova_tree = NULL;
> > >
> > >      assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > >      opts = &netdev->u.vhost_vdpa;
> > > @@ -285,29 +302,44 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >          qemu_close(vdpa_device_fd);
> > >          return queue_pairs;
> > >      }
> > > +    if (opts->svq) {
> > > +        struct vhost_vdpa_iova_range iova_range;
> > > +
> > > +        if (has_cvq) {
> > > +            error_setg(errp, "vdpa svq does not work with cvq");
> > > +            goto err_svq;
> > > +        }
> > > +        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> > > +        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> > > +    }
> > >
> > >      ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> > >
> > >      for (i = 0; i < queue_pairs; i++) {
> > >          ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > > -                                     vdpa_device_fd, i, 2, true);
> > > +                                     vdpa_device_fd, i, 2, true, opts->svq,
> > > +                                     iova_tree);
> > >          if (!ncs[i])
> > >              goto err;
> > >      }
> > >
> > >      if (has_cvq) {
> > >          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > > -                                 vdpa_device_fd, i, 1, false);
> > > +                                 vdpa_device_fd, i, 1, false, opts->svq,
> > > +                                 iova_tree);
> > >          if (!nc)
> > >              goto err;
> > >      }
> > >
> > > +    iova_tree = NULL;
> > >      return 0;
> > >
> > >  err:
> > >      if (i) {
> > >          qemu_del_net_client(ncs[0]);
> > >      }
> > > +
> > > +err_svq:
> > >      qemu_close(vdpa_device_fd);
> > >
> > >      return -1;
> > > --
> > > 2.27.0
> >

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 15/15] vdpa: Add x-svq to NetdevVhostVDPAOptions
@ 2022-03-08  8:02         ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08  8:02 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Jason Wang, qemu-level, Peter Xu, virtualization, Eli Cohen,
	Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 08:32:07AM +0100, Eugenio Perez Martin wrote:
> On Tue, Mar 8, 2022 at 8:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Mar 07, 2022 at 04:33:34PM +0100, Eugenio Pérez wrote:
> > > Finally offering the possibility to enable SVQ from the command line.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >  qapi/net.json    |  8 +++++++-
> > >  net/vhost-vdpa.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
> > >  2 files changed, 47 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/qapi/net.json b/qapi/net.json
> > > index 7fab2e7cd8..d626fa441c 100644
> > > --- a/qapi/net.json
> > > +++ b/qapi/net.json
> > > @@ -445,12 +445,18 @@
> > >  # @queues: number of queues to be created for multiqueue vhost-vdpa
> > >  #          (default: 1)
> > >  #
> > > +# @svq: Start device with (experimental) shadow virtqueue. (Since 7.0)
> > > +#
> > > +# Features:
> > > +# @unstable: Member @svq is experimental.
> > > +#
> > >  # Since: 5.1
> > >  ##
> > >  { 'struct': 'NetdevVhostVDPAOptions',
> > >    'data': {
> > >      '*vhostdev':     'str',
> > > -    '*queues':       'int' } }
> > > +    '*queues':       'int',
> > > +    '*svq':          {'type': 'bool', 'features' : [ 'unstable'] } } }
> > >
> > >  ##
> > >  # @NetClientDriver:
> >
> > I think this should be x-svq same as other unstable features.
> >
> 
> I'm fine with both, but I was pointed to the other direction at [1] and [2].
> 
> Thanks!
> 
> [1] https://patchwork.kernel.org/project/qemu-devel/patch/20220302203012.3476835-15-eperezma@redhat.com/
> [2] https://lore.kernel.org/qemu-devel/20220303185147.3605350-15-eperezma@redhat.com/


I think what Markus didn't know is that a bunch of changes in
behaviour will occur before we rename it to "svq".
The rename is thus less of a bother more of a bonus.

> > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > index 1e9fe47c03..c827921654 100644
> > > --- a/net/vhost-vdpa.c
> > > +++ b/net/vhost-vdpa.c
> > > @@ -127,7 +127,11 @@ err_init:
> > >  static void vhost_vdpa_cleanup(NetClientState *nc)
> > >  {
> > >      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > > +    struct vhost_dev *dev = s->vhost_vdpa.dev;
> > >
> > > +    if (dev && dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > > +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > > +    }
> > >      if (s->vhost_net) {
> > >          vhost_net_cleanup(s->vhost_net);
> > >          g_free(s->vhost_net);
> > > @@ -187,13 +191,23 @@ static NetClientInfo net_vhost_vdpa_info = {
> > >          .check_peer_type = vhost_vdpa_check_peer_type,
> > >  };
> > >
> > > +static int vhost_vdpa_get_iova_range(int fd,
> > > +                                     struct vhost_vdpa_iova_range *iova_range)
> > > +{
> > > +    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
> > > +
> > > +    return ret < 0 ? -errno : 0;
> > > +}
> > > +
> > >  static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > > -                                           const char *device,
> > > -                                           const char *name,
> > > -                                           int vdpa_device_fd,
> > > -                                           int queue_pair_index,
> > > -                                           int nvqs,
> > > -                                           bool is_datapath)
> > > +                                       const char *device,
> > > +                                       const char *name,
> > > +                                       int vdpa_device_fd,
> > > +                                       int queue_pair_index,
> > > +                                       int nvqs,
> > > +                                       bool is_datapath,
> > > +                                       bool svq,
> > > +                                       VhostIOVATree *iova_tree)
> > >  {
> > >      NetClientState *nc = NULL;
> > >      VhostVDPAState *s;
> > > @@ -211,6 +225,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > >
> > >      s->vhost_vdpa.device_fd = vdpa_device_fd;
> > >      s->vhost_vdpa.index = queue_pair_index;
> > > +    s->vhost_vdpa.shadow_vqs_enabled = svq;
> > > +    s->vhost_vdpa.iova_tree = iova_tree;
> > >      ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> > >      if (ret) {
> > >          qemu_del_net_client(nc);
> > > @@ -266,6 +282,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >      g_autofree NetClientState **ncs = NULL;
> > >      NetClientState *nc;
> > >      int queue_pairs, i, has_cvq = 0;
> > > +    g_autoptr(VhostIOVATree) iova_tree = NULL;
> > >
> > >      assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > >      opts = &netdev->u.vhost_vdpa;
> > > @@ -285,29 +302,44 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > >          qemu_close(vdpa_device_fd);
> > >          return queue_pairs;
> > >      }
> > > +    if (opts->svq) {
> > > +        struct vhost_vdpa_iova_range iova_range;
> > > +
> > > +        if (has_cvq) {
> > > +            error_setg(errp, "vdpa svq does not work with cvq");
> > > +            goto err_svq;
> > > +        }
> > > +        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> > > +        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> > > +    }
> > >
> > >      ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> > >
> > >      for (i = 0; i < queue_pairs; i++) {
> > >          ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > > -                                     vdpa_device_fd, i, 2, true);
> > > +                                     vdpa_device_fd, i, 2, true, opts->svq,
> > > +                                     iova_tree);
> > >          if (!ncs[i])
> > >              goto err;
> > >      }
> > >
> > >      if (has_cvq) {
> > >          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > > -                                 vdpa_device_fd, i, 1, false);
> > > +                                 vdpa_device_fd, i, 1, false, opts->svq,
> > > +                                 iova_tree);
> > >          if (!nc)
> > >              goto err;
> > >      }
> > >
> > > +    iova_tree = NULL;
> > >      return 0;
> > >
> > >  err:
> > >      if (i) {
> > >          qemu_del_net_client(ncs[0]);
> > >      }
> > > +
> > > +err_svq:
> > >      qemu_close(vdpa_device_fd);
> > >
> > >      return -1;
> > > --
> > > 2.27.0
> >



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-08  7:55             ` Michael S. Tsirkin
  (?)
@ 2022-03-08  8:15             ` Eugenio Perez Martin
  2022-03-08  8:19                 ` Michael S. Tsirkin
  -1 siblings, 1 reply; 60+ messages in thread
From: Eugenio Perez Martin @ 2022-03-08  8:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, qemu-devel, Peter Xu, virtualization, Eli Cohen,
	Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

On Tue, Mar 8, 2022 at 8:55 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 03:34:17PM +0800, Jason Wang wrote:
> > On Tue, Mar 8, 2022 at 3:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > > > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > > > >
> > > > > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > > > > is intended as a new method of tracking the memory the devices touch
> > > > > > > during a migration process: Instead of relay on vhost device's dirty
> > > > > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > > > > descriptors between VM and device. This way qemu is the effective
> > > > > > > writer of guests memory, like in qemu's virtio device operation.
> > > > > > >
> > > > > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > > > > device to read and write into, and it maps new vrings and the guest
> > > > > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > > > > and the guest. Used buffers relay would cause dirty memory being
> > > > > > > tracked.
> > > > > > >
> > > > > > > This effectively means that vDPA device passthrough is intercepted by
> > > > > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > > > > from regular mode to SVQ mode is left for a future series.
> > > > > > >
> > > > > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > > > > not map the shadow vq in guest's VA, but in qemu's.
> > > > > > >
> > > > > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > > > > features like event_idx.
> > > > > > >
> > > > > > > SVQ needs to be enabled with cmdline:
> > > > > > >
> > > > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> > > > >
> > > > > A stable API for an incomplete feature is a problem imho.
> > > >
> > > > It should be "x-svq".
> > >
> > >
> > > Well look at patch 15.
> >
> > It's a bug that needs to be fixed.
> >
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > The first three patches enables notifications forwarding with
> > > > > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > > > > cmdline part of the last patch is applied on top of these.
> > > > > > >
> > > > > > > Next four patches implement the actual buffer forwarding. However,
> > > > > > > address are not translated from HVA so they will need a host device with
> > > > > > > an iommu allowing them to access all of the HVA range.
> > > > > > >
> > > > > > > The last part of the series uses properly the host iommu, so qemu
> > > > > > > creates a new iova address space in the device's range and translates
> > > > > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > > > > >
> > > > > > > Some simple performance tests with netperf were done. They used a nested
> > > > > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > > > > baseline average of ~9009.96Mbps:
> > > > > > > Recv   Send    Send
> > > > > > > Socket Socket  Message  Elapsed
> > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > 131072  16384  16384    30.01    9061.03
> > > > > > > 131072  16384  16384    30.01    8962.94
> > > > > > > 131072  16384  16384    30.01    9005.92
> > > > > > >
> > > > > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > > > > Recv   Send    Send
> > > > > > > Socket Socket  Message  Elapsed
> > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > 131072  16384  16384    30.01    7689.72
> > > > > > > 131072  16384  16384    30.00    7752.07
> > > > > > > 131072  16384  16384    30.01    7750.30
> > > > > > >
> > > > > > > However, many performance improvements were left out of this series for
> > > > > > > simplicity, so difference should shrink in the future.
> > > > > > >
> > > > > > > Comments are welcome.
> > > > > >
> > > > > >
> > > > > > Hi Michael:
> > > > > >
> > > > > > What do you think of this series? It looks good to me as a start. The
> > > > > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > > > > try to make it for 7.0.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Well that's cutting it awfully close, and it's not really useful
> > > > > at the current stage, is it?
> > > >
> > > > This allows vDPA to be migrated when using "x-svq=on".
> > > > But anyhow it's
> > > > experimental.
> > >
> > > it's less experimental than incomplete. It seems pretty clearly not
> > > the way it will work down the road, we don't want svq involved
> > > at all times.
> >
> > Right, but SVQ could be used for other places e.g providing migration
> > compatibility when the destination lacks some features.
>
> In its current form? I don't see how.  Generally? Maybe but I suspect
> we'll have to rework it completely for that.
>

RFCs of the series already do that: guest to SVQ communication
supported indirect descriptors and packed virtqueue but SVQ to device
did not. Even SVQ vring size could be different from guest's vring
size. It's not a big diff actually, I can send it as a RFC on top of
this to show it.

But that part was left out for simplicity and added as a TODO, so it
just negotiates one set of features instead of two.

> > >
> > > > >
> > > > > The IOVA trick does not feel complete either.
> > > >
> > > > I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> > > > for shadow vq) did. So we won't suffer from the issues of DPDK.
> > > >
> > > > Thanks
> > >
> > > Maybe I misundrstand how this all works.
> > > I refer to all the iova_tree_alloc_map things.
> >
> > It's a simple IOVA allocater actually. Anything wrong with that?
>
> Not by itself but I'm not sure we can guarantee guest will not
> attempt to use the IOVA addresses we are reserving down
> the road.
>

The SVQ vring (the one that device's see) does not use GPA addresses
anymore, but this new iova space. If the guest tries to expose a
writable buffer with the address of SVQ vring, VirtQueue would refuse
to translate it, generating an error. The same way it would generate
an error with emulated devices.

If we hot-plug more physical memory to the guest, this new range is
added as a totally new iova entry, which cannot collide with previous
entries, both GPA and SVQ. Same thing happens if the vhost device is
added or removed, or if the guest reset or start (set DRIVER_OK) the
device.

Thanks!

> > I'm fine with making it for the future release.
> >
> > Thanks
> >
> > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > TODO on future series:
> > > > > > > * Event, indirect, packed, and others features of virtio.
> > > > > > > * To support different set of features between the device<->SVQ and the
> > > > > > >    SVQ<->guest communication.
> > > > > > > * Support of device host notifier memory regions.
> > > > > > > * To sepparate buffers forwarding in its own AIO context, so we can
> > > > > > >    throw more threads to that task and we don't need to stop the main
> > > > > > >    event loop.
> > > > > > > * Support multiqueue virtio-net vdpa.
> > > > > > > * Proper documentation.
> > > > > > >
> > > > > > > Changes from v4:
> > > > > > > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> > > > > > >    overlaps.
> > > > > > > * Fix: Errno completion at failure.
> > > > > > > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > > > > > >
> > > > > > > Changes from v3:
> > > > > > > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > > > > > > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > > > > > > v3 link:
> > > > > > > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > > > > > >
> > > > > > > Changes from v2:
> > > > > > > * Less assertions and more error handling in iova tree code.
> > > > > > > * Better documentation, both fixing errors and making @param: format
> > > > > > > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> > > > > > >    prefix at both times.
> > > > > > > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > > > > > > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> > > > > > >    like the kernel driver code.
> > > > > > > * Small improvements.
> > > > > > > v2 link:
> > > > > > > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > > > > > >
> > > > > > > Changes from v1:
> > > > > > > * Feature set at device->SVQ is now the same as SVQ->guest.
> > > > > > > * Size of SVQ is not max available device size anymore, but guest's
> > > > > > >    negotiated.
> > > > > > > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > > > > > > * Make SVQ a public struct
> > > > > > > * Come back to previous approach to iova-tree
> > > > > > > * Some assertions are now fail paths. Some errors are now log_guest.
> > > > > > > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > > > > > > * Refactor some errors and messages. Add missing error unwindings.
> > > > > > > * Add memory barrier at _F_NO_NOTIFY set.
> > > > > > > * Stop checking for features flags out of transport range.
> > > > > > > v1 link:
> > > > > > > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > > > > > >
> > > > > > > Changes from v4 RFC:
> > > > > > > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> > > > > > >    already present iova-tree for that.
> > > > > > > * Proper validation of guest features. Now SVQ can negotiate a
> > > > > > >    different set of features with the device when enabled.
> > > > > > > * Support of host notifiers memory regions
> > > > > > > * Handling of SVQ full queue in case guest's descriptors span to
> > > > > > >    different memory regions (qemu's VA chunks).
> > > > > > > * Flush pending used buffers at end of SVQ operation.
> > > > > > > * QMP command now looks by NetClientState name. Other devices will need
> > > > > > >    to implement it's way to enable vdpa.
> > > > > > > * Rename QMP command to set, so it looks more like a way of working
> > > > > > > * Better use of qemu error system
> > > > > > > * Make a few assertions proper error-handling paths.
> > > > > > > * Add more documentation
> > > > > > > * Less coupling of virtio / vhost, that could cause friction on changes
> > > > > > > * Addressed many other small comments and small fixes.
> > > > > > >
> > > > > > > Changes from v3 RFC:
> > > > > > >    * Move everything to vhost-vdpa backend. A big change, this allowed
> > > > > > >      some cleanup but more code has been added in other places.
> > > > > > >    * More use of glib utilities, especially to manage memory.
> > > > > > > v3 link:
> > > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > > > > > >
> > > > > > > Changes from v2 RFC:
> > > > > > >    * Adding vhost-vdpa devices support
> > > > > > >    * Fixed some memory leaks pointed by different comments
> > > > > > > v2 link:
> > > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > > > > > >
> > > > > > > Changes from v1 RFC:
> > > > > > >    * Use QMP instead of migration to start SVQ mode.
> > > > > > >    * Only accepting IOMMU devices, closer behavior with target devices
> > > > > > >      (vDPA)
> > > > > > >    * Fix invalid masking/unmasking of vhost call fd.
> > > > > > >    * Use of proper methods for synchronization.
> > > > > > >    * No need to modify VirtIO device code, all of the changes are
> > > > > > >      contained in vhost code.
> > > > > > >    * Delete superfluous code.
> > > > > > >    * An intermediate RFC was sent with only the notifications forwarding
> > > > > > >      changes. It can be seen in
> > > > > > >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > > > > > > v1 link:
> > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > > > > > >
> > > > > > > Eugenio Pérez (20):
> > > > > > >        virtio: Add VIRTIO_F_QUEUE_STATE
> > > > > > >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> > > > > > >        virtio: Add virtio_queue_is_host_notifier_enabled
> > > > > > >        vhost: Make vhost_virtqueue_{start,stop} public
> > > > > > >        vhost: Add x-vhost-enable-shadow-vq qmp
> > > > > > >        vhost: Add VhostShadowVirtqueue
> > > > > > >        vdpa: Register vdpa devices in a list
> > > > > > >        vhost: Route guest->host notification through shadow virtqueue
> > > > > > >        Add vhost_svq_get_svq_call_notifier
> > > > > > >        Add vhost_svq_set_guest_call_notifier
> > > > > > >        vdpa: Save call_fd in vhost-vdpa
> > > > > > >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> > > > > > >        vhost: Route host->guest notification through shadow virtqueue
> > > > > > >        virtio: Add vhost_shadow_vq_get_vring_addr
> > > > > > >        vdpa: Save host and guest features
> > > > > > >        vhost: Add vhost_svq_valid_device_features to shadow vq
> > > > > > >        vhost: Shadow virtqueue buffers forwarding
> > > > > > >        vhost: Add VhostIOVATree
> > > > > > >        vhost: Use a tree to store memory mappings
> > > > > > >        vdpa: Add custom IOTLB translations to SVQ
> > > > > > >
> > > > > > > Eugenio Pérez (15):
> > > > > > >    vhost: Add VhostShadowVirtqueue
> > > > > > >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> > > > > > >    vhost: Add Shadow VirtQueue call forwarding capabilities
> > > > > > >    vhost: Add vhost_svq_valid_features to shadow vq
> > > > > > >    virtio: Add vhost_svq_get_vring_addr
> > > > > > >    vdpa: adapt vhost_ops callbacks to svq
> > > > > > >    vhost: Shadow virtqueue buffers forwarding
> > > > > > >    util: Add iova_tree_alloc_map
> > > > > > >    util: add iova_tree_find_iova
> > > > > > >    vhost: Add VhostIOVATree
> > > > > > >    vdpa: Add custom IOTLB translations to SVQ
> > > > > > >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> > > > > > >    vdpa: Never set log_base addr if SVQ is enabled
> > > > > > >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> > > > > > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > > > > >
> > > > > > >   qapi/net.json                      |   8 +-
> > > > > > >   hw/virtio/vhost-iova-tree.h        |  27 ++
> > > > > > >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> > > > > > >   include/hw/virtio/vhost-vdpa.h     |   8 +
> > > > > > >   include/qemu/iova-tree.h           |  38 +-
> > > > > > >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> > > > > > >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> > > > > > >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> > > > > > >   net/vhost-vdpa.c                   |  48 ++-
> > > > > > >   util/iova-tree.c                   | 169 ++++++++
> > > > > > >   hw/virtio/meson.build              |   2 +-
> > > > > > >   11 files changed, 1633 insertions(+), 26 deletions(-)
> > > > > > >   create mode 100644 hw/virtio/vhost-iova-tree.h
> > > > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> > > > > > >   create mode 100644 hw/virtio/vhost-iova-tree.c
> > > > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > > > > > >
> > > > >
> > >
>



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-08  8:15             ` Eugenio Perez Martin
@ 2022-03-08  8:19                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08  8:19 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Laurent Vivier,
	Parav Pandit, Richard Henderson, Gautam Dawar, Xiao W Wang,
	Stefan Hajnoczi, Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 09:15:37AM +0100, Eugenio Perez Martin wrote:
> On Tue, Mar 8, 2022 at 8:55 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Mar 08, 2022 at 03:34:17PM +0800, Jason Wang wrote:
> > > On Tue, Mar 8, 2022 at 3:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > > > > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > > > > >
> > > > > > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > > > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > > > > > is intended as a new method of tracking the memory the devices touch
> > > > > > > > during a migration process: Instead of relay on vhost device's dirty
> > > > > > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > > > > > descriptors between VM and device. This way qemu is the effective
> > > > > > > > writer of guests memory, like in qemu's virtio device operation.
> > > > > > > >
> > > > > > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > > > > > device to read and write into, and it maps new vrings and the guest
> > > > > > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > > > > > and the guest. Used buffers relay would cause dirty memory being
> > > > > > > > tracked.
> > > > > > > >
> > > > > > > > This effectively means that vDPA device passthrough is intercepted by
> > > > > > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > > > > > from regular mode to SVQ mode is left for a future series.
> > > > > > > >
> > > > > > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > > > > > not map the shadow vq in guest's VA, but in qemu's.
> > > > > > > >
> > > > > > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > > > > > features like event_idx.
> > > > > > > >
> > > > > > > > SVQ needs to be enabled with cmdline:
> > > > > > > >
> > > > > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> > > > > >
> > > > > > A stable API for an incomplete feature is a problem imho.
> > > > >
> > > > > It should be "x-svq".
> > > >
> > > >
> > > > Well look at patch 15.
> > >
> > > It's a bug that needs to be fixed.
> > >
> > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > The first three patches enables notifications forwarding with
> > > > > > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > > > > > cmdline part of the last patch is applied on top of these.
> > > > > > > >
> > > > > > > > Next four patches implement the actual buffer forwarding. However,
> > > > > > > > address are not translated from HVA so they will need a host device with
> > > > > > > > an iommu allowing them to access all of the HVA range.
> > > > > > > >
> > > > > > > > The last part of the series uses properly the host iommu, so qemu
> > > > > > > > creates a new iova address space in the device's range and translates
> > > > > > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > > > > > >
> > > > > > > > Some simple performance tests with netperf were done. They used a nested
> > > > > > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > > > > > baseline average of ~9009.96Mbps:
> > > > > > > > Recv   Send    Send
> > > > > > > > Socket Socket  Message  Elapsed
> > > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > > 131072  16384  16384    30.01    9061.03
> > > > > > > > 131072  16384  16384    30.01    8962.94
> > > > > > > > 131072  16384  16384    30.01    9005.92
> > > > > > > >
> > > > > > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > > > > > Recv   Send    Send
> > > > > > > > Socket Socket  Message  Elapsed
> > > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > > 131072  16384  16384    30.01    7689.72
> > > > > > > > 131072  16384  16384    30.00    7752.07
> > > > > > > > 131072  16384  16384    30.01    7750.30
> > > > > > > >
> > > > > > > > However, many performance improvements were left out of this series for
> > > > > > > > simplicity, so difference should shrink in the future.
> > > > > > > >
> > > > > > > > Comments are welcome.
> > > > > > >
> > > > > > >
> > > > > > > Hi Michael:
> > > > > > >
> > > > > > > What do you think of this series? It looks good to me as a start. The
> > > > > > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > > > > > try to make it for 7.0.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Well that's cutting it awfully close, and it's not really useful
> > > > > > at the current stage, is it?
> > > > >
> > > > > This allows vDPA to be migrated when using "x-svq=on".
> > > > > But anyhow it's
> > > > > experimental.
> > > >
> > > > it's less experimental than incomplete. It seems pretty clearly not
> > > > the way it will work down the road, we don't want svq involved
> > > > at all times.
> > >
> > > Right, but SVQ could be used for other places e.g providing migration
> > > compatibility when the destination lacks some features.
> >
> > In its current form? I don't see how.  Generally? Maybe but I suspect
> > we'll have to rework it completely for that.
> >
> 
> RFCs of the series already do that: guest to SVQ communication
> supported indirect descriptors and packed virtqueue but SVQ to device
> did not. Even SVQ vring size could be different from guest's vring
> size. It's not a big diff actually, I can send it as a RFC on top of
> this to show it.
> 
> But that part was left out for simplicity and added as a TODO, so it
> just negotiates one set of features instead of two.
> 
> > > >
> > > > > >
> > > > > > The IOVA trick does not feel complete either.
> > > > >
> > > > > I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> > > > > for shadow vq) did. So we won't suffer from the issues of DPDK.
> > > > >
> > > > > Thanks
> > > >
> > > > Maybe I misundrstand how this all works.
> > > > I refer to all the iova_tree_alloc_map things.
> > >
> > > It's a simple IOVA allocater actually. Anything wrong with that?
> >
> > Not by itself but I'm not sure we can guarantee guest will not
> > attempt to use the IOVA addresses we are reserving down
> > the road.
> >
> 
> The SVQ vring (the one that device's see) does not use GPA addresses
> anymore, but this new iova space. If the guest tries to expose a
> writable buffer with the address of SVQ vring, VirtQueue would refuse
> to translate it, generating an error. The same way it would generate
> an error with emulated devices.

Right. But guests are not really set up to handle such
errors except by failing the transaction, are they?



> If we hot-plug more physical memory to the guest, this new range is
> added as a totally new iova entry, which cannot collide with previous
> entries, both GPA and SVQ. Same thing happens if the vhost device is
> added or removed, or if the guest reset or start (set DRIVER_OK) the
> device.
> 
> Thanks!
> 
> > > I'm fine with making it for the future release.
> > >
> > > Thanks
> > >
> > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > TODO on future series:
> > > > > > > > * Event, indirect, packed, and others features of virtio.
> > > > > > > > * To support different set of features between the device<->SVQ and the
> > > > > > > >    SVQ<->guest communication.
> > > > > > > > * Support of device host notifier memory regions.
> > > > > > > > * To sepparate buffers forwarding in its own AIO context, so we can
> > > > > > > >    throw more threads to that task and we don't need to stop the main
> > > > > > > >    event loop.
> > > > > > > > * Support multiqueue virtio-net vdpa.
> > > > > > > > * Proper documentation.
> > > > > > > >
> > > > > > > > Changes from v4:
> > > > > > > > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> > > > > > > >    overlaps.
> > > > > > > > * Fix: Errno completion at failure.
> > > > > > > > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > > > > > > >
> > > > > > > > Changes from v3:
> > > > > > > > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > > > > > > > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > > > > > > > v3 link:
> > > > > > > > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > > > > > > >
> > > > > > > > Changes from v2:
> > > > > > > > * Less assertions and more error handling in iova tree code.
> > > > > > > > * Better documentation, both fixing errors and making @param: format
> > > > > > > > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> > > > > > > >    prefix at both times.
> > > > > > > > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > > > > > > > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> > > > > > > >    like the kernel driver code.
> > > > > > > > * Small improvements.
> > > > > > > > v2 link:
> > > > > > > > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > > > > > > >
> > > > > > > > Changes from v1:
> > > > > > > > * Feature set at device->SVQ is now the same as SVQ->guest.
> > > > > > > > * Size of SVQ is not max available device size anymore, but guest's
> > > > > > > >    negotiated.
> > > > > > > > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > > > > > > > * Make SVQ a public struct
> > > > > > > > * Come back to previous approach to iova-tree
> > > > > > > > * Some assertions are now fail paths. Some errors are now log_guest.
> > > > > > > > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > > > > > > > * Refactor some errors and messages. Add missing error unwindings.
> > > > > > > > * Add memory barrier at _F_NO_NOTIFY set.
> > > > > > > > * Stop checking for features flags out of transport range.
> > > > > > > > v1 link:
> > > > > > > > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > > > > > > >
> > > > > > > > Changes from v4 RFC:
> > > > > > > > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> > > > > > > >    already present iova-tree for that.
> > > > > > > > * Proper validation of guest features. Now SVQ can negotiate a
> > > > > > > >    different set of features with the device when enabled.
> > > > > > > > * Support of host notifiers memory regions
> > > > > > > > * Handling of SVQ full queue in case guest's descriptors span to
> > > > > > > >    different memory regions (qemu's VA chunks).
> > > > > > > > * Flush pending used buffers at end of SVQ operation.
> > > > > > > > * QMP command now looks by NetClientState name. Other devices will need
> > > > > > > >    to implement it's way to enable vdpa.
> > > > > > > > * Rename QMP command to set, so it looks more like a way of working
> > > > > > > > * Better use of qemu error system
> > > > > > > > * Make a few assertions proper error-handling paths.
> > > > > > > > * Add more documentation
> > > > > > > > * Less coupling of virtio / vhost, that could cause friction on changes
> > > > > > > > * Addressed many other small comments and small fixes.
> > > > > > > >
> > > > > > > > Changes from v3 RFC:
> > > > > > > >    * Move everything to vhost-vdpa backend. A big change, this allowed
> > > > > > > >      some cleanup but more code has been added in other places.
> > > > > > > >    * More use of glib utilities, especially to manage memory.
> > > > > > > > v3 link:
> > > > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > > > > > > >
> > > > > > > > Changes from v2 RFC:
> > > > > > > >    * Adding vhost-vdpa devices support
> > > > > > > >    * Fixed some memory leaks pointed by different comments
> > > > > > > > v2 link:
> > > > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > > > > > > >
> > > > > > > > Changes from v1 RFC:
> > > > > > > >    * Use QMP instead of migration to start SVQ mode.
> > > > > > > >    * Only accepting IOMMU devices, closer behavior with target devices
> > > > > > > >      (vDPA)
> > > > > > > >    * Fix invalid masking/unmasking of vhost call fd.
> > > > > > > >    * Use of proper methods for synchronization.
> > > > > > > >    * No need to modify VirtIO device code, all of the changes are
> > > > > > > >      contained in vhost code.
> > > > > > > >    * Delete superfluous code.
> > > > > > > >    * An intermediate RFC was sent with only the notifications forwarding
> > > > > > > >      changes. It can be seen in
> > > > > > > >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > > > > > > > v1 link:
> > > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > > > > > > >
> > > > > > > > Eugenio Pérez (20):
> > > > > > > >        virtio: Add VIRTIO_F_QUEUE_STATE
> > > > > > > >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> > > > > > > >        virtio: Add virtio_queue_is_host_notifier_enabled
> > > > > > > >        vhost: Make vhost_virtqueue_{start,stop} public
> > > > > > > >        vhost: Add x-vhost-enable-shadow-vq qmp
> > > > > > > >        vhost: Add VhostShadowVirtqueue
> > > > > > > >        vdpa: Register vdpa devices in a list
> > > > > > > >        vhost: Route guest->host notification through shadow virtqueue
> > > > > > > >        Add vhost_svq_get_svq_call_notifier
> > > > > > > >        Add vhost_svq_set_guest_call_notifier
> > > > > > > >        vdpa: Save call_fd in vhost-vdpa
> > > > > > > >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> > > > > > > >        vhost: Route host->guest notification through shadow virtqueue
> > > > > > > >        virtio: Add vhost_shadow_vq_get_vring_addr
> > > > > > > >        vdpa: Save host and guest features
> > > > > > > >        vhost: Add vhost_svq_valid_device_features to shadow vq
> > > > > > > >        vhost: Shadow virtqueue buffers forwarding
> > > > > > > >        vhost: Add VhostIOVATree
> > > > > > > >        vhost: Use a tree to store memory mappings
> > > > > > > >        vdpa: Add custom IOTLB translations to SVQ
> > > > > > > >
> > > > > > > > Eugenio Pérez (15):
> > > > > > > >    vhost: Add VhostShadowVirtqueue
> > > > > > > >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> > > > > > > >    vhost: Add Shadow VirtQueue call forwarding capabilities
> > > > > > > >    vhost: Add vhost_svq_valid_features to shadow vq
> > > > > > > >    virtio: Add vhost_svq_get_vring_addr
> > > > > > > >    vdpa: adapt vhost_ops callbacks to svq
> > > > > > > >    vhost: Shadow virtqueue buffers forwarding
> > > > > > > >    util: Add iova_tree_alloc_map
> > > > > > > >    util: add iova_tree_find_iova
> > > > > > > >    vhost: Add VhostIOVATree
> > > > > > > >    vdpa: Add custom IOTLB translations to SVQ
> > > > > > > >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> > > > > > > >    vdpa: Never set log_base addr if SVQ is enabled
> > > > > > > >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> > > > > > > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > > > > > >
> > > > > > > >   qapi/net.json                      |   8 +-
> > > > > > > >   hw/virtio/vhost-iova-tree.h        |  27 ++
> > > > > > > >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> > > > > > > >   include/hw/virtio/vhost-vdpa.h     |   8 +
> > > > > > > >   include/qemu/iova-tree.h           |  38 +-
> > > > > > > >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> > > > > > > >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> > > > > > > >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> > > > > > > >   net/vhost-vdpa.c                   |  48 ++-
> > > > > > > >   util/iova-tree.c                   | 169 ++++++++
> > > > > > > >   hw/virtio/meson.build              |   2 +-
> > > > > > > >   11 files changed, 1633 insertions(+), 26 deletions(-)
> > > > > > > >   create mode 100644 hw/virtio/vhost-iova-tree.h
> > > > > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> > > > > > > >   create mode 100644 hw/virtio/vhost-iova-tree.c
> > > > > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > > > > > > >
> > > > > >
> > > >
> >

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
@ 2022-03-08  8:19                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08  8:19 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Jason Wang, qemu-devel, Peter Xu, virtualization, Eli Cohen,
	Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 09:15:37AM +0100, Eugenio Perez Martin wrote:
> On Tue, Mar 8, 2022 at 8:55 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Mar 08, 2022 at 03:34:17PM +0800, Jason Wang wrote:
> > > On Tue, Mar 8, 2022 at 3:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > > > > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > > > > >
> > > > > > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > > > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > > > > > is intended as a new method of tracking the memory the devices touch
> > > > > > > > during a migration process: Instead of relay on vhost device's dirty
> > > > > > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > > > > > descriptors between VM and device. This way qemu is the effective
> > > > > > > > writer of guests memory, like in qemu's virtio device operation.
> > > > > > > >
> > > > > > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > > > > > device to read and write into, and it maps new vrings and the guest
> > > > > > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > > > > > and the guest. Used buffers relay would cause dirty memory being
> > > > > > > > tracked.
> > > > > > > >
> > > > > > > > This effectively means that vDPA device passthrough is intercepted by
> > > > > > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > > > > > from regular mode to SVQ mode is left for a future series.
> > > > > > > >
> > > > > > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > > > > > not map the shadow vq in guest's VA, but in qemu's.
> > > > > > > >
> > > > > > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > > > > > features like event_idx.
> > > > > > > >
> > > > > > > > SVQ needs to be enabled with cmdline:
> > > > > > > >
> > > > > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> > > > > >
> > > > > > A stable API for an incomplete feature is a problem imho.
> > > > >
> > > > > It should be "x-svq".
> > > >
> > > >
> > > > Well look at patch 15.
> > >
> > > It's a bug that needs to be fixed.
> > >
> > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > The first three patches enables notifications forwarding with
> > > > > > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > > > > > cmdline part of the last patch is applied on top of these.
> > > > > > > >
> > > > > > > > Next four patches implement the actual buffer forwarding. However,
> > > > > > > > address are not translated from HVA so they will need a host device with
> > > > > > > > an iommu allowing them to access all of the HVA range.
> > > > > > > >
> > > > > > > > The last part of the series uses properly the host iommu, so qemu
> > > > > > > > creates a new iova address space in the device's range and translates
> > > > > > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > > > > > >
> > > > > > > > Some simple performance tests with netperf were done. They used a nested
> > > > > > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > > > > > baseline average of ~9009.96Mbps:
> > > > > > > > Recv   Send    Send
> > > > > > > > Socket Socket  Message  Elapsed
> > > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > > 131072  16384  16384    30.01    9061.03
> > > > > > > > 131072  16384  16384    30.01    8962.94
> > > > > > > > 131072  16384  16384    30.01    9005.92
> > > > > > > >
> > > > > > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > > > > > Recv   Send    Send
> > > > > > > > Socket Socket  Message  Elapsed
> > > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > > 131072  16384  16384    30.01    7689.72
> > > > > > > > 131072  16384  16384    30.00    7752.07
> > > > > > > > 131072  16384  16384    30.01    7750.30
> > > > > > > >
> > > > > > > > However, many performance improvements were left out of this series for
> > > > > > > > simplicity, so difference should shrink in the future.
> > > > > > > >
> > > > > > > > Comments are welcome.
> > > > > > >
> > > > > > >
> > > > > > > Hi Michael:
> > > > > > >
> > > > > > > What do you think of this series? It looks good to me as a start. The
> > > > > > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > > > > > try to make it for 7.0.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Well that's cutting it awfully close, and it's not really useful
> > > > > > at the current stage, is it?
> > > > >
> > > > > This allows vDPA to be migrated when using "x-svq=on".
> > > > > But anyhow it's
> > > > > experimental.
> > > >
> > > > it's less experimental than incomplete. It seems pretty clearly not
> > > > the way it will work down the road, we don't want svq involved
> > > > at all times.
> > >
> > > Right, but SVQ could be used for other places e.g providing migration
> > > compatibility when the destination lacks some features.
> >
> > In its current form? I don't see how.  Generally? Maybe but I suspect
> > we'll have to rework it completely for that.
> >
> 
> RFCs of the series already do that: guest to SVQ communication
> supported indirect descriptors and packed virtqueue but SVQ to device
> did not. Even SVQ vring size could be different from guest's vring
> size. It's not a big diff actually, I can send it as a RFC on top of
> this to show it.
> 
> But that part was left out for simplicity and added as a TODO, so it
> just negotiates one set of features instead of two.
> 
> > > >
> > > > > >
> > > > > > The IOVA trick does not feel complete either.
> > > > >
> > > > > I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> > > > > for shadow vq) did. So we won't suffer from the issues of DPDK.
> > > > >
> > > > > Thanks
> > > >
> > > > Maybe I misundrstand how this all works.
> > > > I refer to all the iova_tree_alloc_map things.
> > >
> > > It's a simple IOVA allocater actually. Anything wrong with that?
> >
> > Not by itself but I'm not sure we can guarantee guest will not
> > attempt to use the IOVA addresses we are reserving down
> > the road.
> >
> 
> The SVQ vring (the one that device's see) does not use GPA addresses
> anymore, but this new iova space. If the guest tries to expose a
> writable buffer with the address of SVQ vring, VirtQueue would refuse
> to translate it, generating an error. The same way it would generate
> an error with emulated devices.

Right. But guests are not really set up to handle such
errors except by failing the transaction, are they?



> If we hot-plug more physical memory to the guest, this new range is
> added as a totally new iova entry, which cannot collide with previous
> entries, both GPA and SVQ. Same thing happens if the vhost device is
> added or removed, or if the guest reset or start (set DRIVER_OK) the
> device.
> 
> Thanks!
> 
> > > I'm fine with making it for the future release.
> > >
> > > Thanks
> > >
> > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > TODO on future series:
> > > > > > > > * Event, indirect, packed, and others features of virtio.
> > > > > > > > * To support different set of features between the device<->SVQ and the
> > > > > > > >    SVQ<->guest communication.
> > > > > > > > * Support of device host notifier memory regions.
> > > > > > > > * To sepparate buffers forwarding in its own AIO context, so we can
> > > > > > > >    throw more threads to that task and we don't need to stop the main
> > > > > > > >    event loop.
> > > > > > > > * Support multiqueue virtio-net vdpa.
> > > > > > > > * Proper documentation.
> > > > > > > >
> > > > > > > > Changes from v4:
> > > > > > > > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> > > > > > > >    overlaps.
> > > > > > > > * Fix: Errno completion at failure.
> > > > > > > > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > > > > > > >
> > > > > > > > Changes from v3:
> > > > > > > > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > > > > > > > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > > > > > > > v3 link:
> > > > > > > > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > > > > > > >
> > > > > > > > Changes from v2:
> > > > > > > > * Less assertions and more error handling in iova tree code.
> > > > > > > > * Better documentation, both fixing errors and making @param: format
> > > > > > > > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> > > > > > > >    prefix at both times.
> > > > > > > > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > > > > > > > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> > > > > > > >    like the kernel driver code.
> > > > > > > > * Small improvements.
> > > > > > > > v2 link:
> > > > > > > > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > > > > > > >
> > > > > > > > Changes from v1:
> > > > > > > > * Feature set at device->SVQ is now the same as SVQ->guest.
> > > > > > > > * Size of SVQ is not max available device size anymore, but guest's
> > > > > > > >    negotiated.
> > > > > > > > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > > > > > > > * Make SVQ a public struct
> > > > > > > > * Come back to previous approach to iova-tree
> > > > > > > > * Some assertions are now fail paths. Some errors are now log_guest.
> > > > > > > > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > > > > > > > * Refactor some errors and messages. Add missing error unwindings.
> > > > > > > > * Add memory barrier at _F_NO_NOTIFY set.
> > > > > > > > * Stop checking for features flags out of transport range.
> > > > > > > > v1 link:
> > > > > > > > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > > > > > > >
> > > > > > > > Changes from v4 RFC:
> > > > > > > > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> > > > > > > >    already present iova-tree for that.
> > > > > > > > * Proper validation of guest features. Now SVQ can negotiate a
> > > > > > > >    different set of features with the device when enabled.
> > > > > > > > * Support of host notifiers memory regions
> > > > > > > > * Handling of SVQ full queue in case guest's descriptors span to
> > > > > > > >    different memory regions (qemu's VA chunks).
> > > > > > > > * Flush pending used buffers at end of SVQ operation.
> > > > > > > > * QMP command now looks by NetClientState name. Other devices will need
> > > > > > > >    to implement it's way to enable vdpa.
> > > > > > > > * Rename QMP command to set, so it looks more like a way of working
> > > > > > > > * Better use of qemu error system
> > > > > > > > * Make a few assertions proper error-handling paths.
> > > > > > > > * Add more documentation
> > > > > > > > * Less coupling of virtio / vhost, that could cause friction on changes
> > > > > > > > * Addressed many other small comments and small fixes.
> > > > > > > >
> > > > > > > > Changes from v3 RFC:
> > > > > > > >    * Move everything to vhost-vdpa backend. A big change, this allowed
> > > > > > > >      some cleanup but more code has been added in other places.
> > > > > > > >    * More use of glib utilities, especially to manage memory.
> > > > > > > > v3 link:
> > > > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > > > > > > >
> > > > > > > > Changes from v2 RFC:
> > > > > > > >    * Adding vhost-vdpa devices support
> > > > > > > >    * Fixed some memory leaks pointed by different comments
> > > > > > > > v2 link:
> > > > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > > > > > > >
> > > > > > > > Changes from v1 RFC:
> > > > > > > >    * Use QMP instead of migration to start SVQ mode.
> > > > > > > >    * Only accepting IOMMU devices, closer behavior with target devices
> > > > > > > >      (vDPA)
> > > > > > > >    * Fix invalid masking/unmasking of vhost call fd.
> > > > > > > >    * Use of proper methods for synchronization.
> > > > > > > >    * No need to modify VirtIO device code, all of the changes are
> > > > > > > >      contained in vhost code.
> > > > > > > >    * Delete superfluous code.
> > > > > > > >    * An intermediate RFC was sent with only the notifications forwarding
> > > > > > > >      changes. It can be seen in
> > > > > > > >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > > > > > > > v1 link:
> > > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > > > > > > >
> > > > > > > > Eugenio Pérez (20):
> > > > > > > >        virtio: Add VIRTIO_F_QUEUE_STATE
> > > > > > > >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> > > > > > > >        virtio: Add virtio_queue_is_host_notifier_enabled
> > > > > > > >        vhost: Make vhost_virtqueue_{start,stop} public
> > > > > > > >        vhost: Add x-vhost-enable-shadow-vq qmp
> > > > > > > >        vhost: Add VhostShadowVirtqueue
> > > > > > > >        vdpa: Register vdpa devices in a list
> > > > > > > >        vhost: Route guest->host notification through shadow virtqueue
> > > > > > > >        Add vhost_svq_get_svq_call_notifier
> > > > > > > >        Add vhost_svq_set_guest_call_notifier
> > > > > > > >        vdpa: Save call_fd in vhost-vdpa
> > > > > > > >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> > > > > > > >        vhost: Route host->guest notification through shadow virtqueue
> > > > > > > >        virtio: Add vhost_shadow_vq_get_vring_addr
> > > > > > > >        vdpa: Save host and guest features
> > > > > > > >        vhost: Add vhost_svq_valid_device_features to shadow vq
> > > > > > > >        vhost: Shadow virtqueue buffers forwarding
> > > > > > > >        vhost: Add VhostIOVATree
> > > > > > > >        vhost: Use a tree to store memory mappings
> > > > > > > >        vdpa: Add custom IOTLB translations to SVQ
> > > > > > > >
> > > > > > > > Eugenio Pérez (15):
> > > > > > > >    vhost: Add VhostShadowVirtqueue
> > > > > > > >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> > > > > > > >    vhost: Add Shadow VirtQueue call forwarding capabilities
> > > > > > > >    vhost: Add vhost_svq_valid_features to shadow vq
> > > > > > > >    virtio: Add vhost_svq_get_vring_addr
> > > > > > > >    vdpa: adapt vhost_ops callbacks to svq
> > > > > > > >    vhost: Shadow virtqueue buffers forwarding
> > > > > > > >    util: Add iova_tree_alloc_map
> > > > > > > >    util: add iova_tree_find_iova
> > > > > > > >    vhost: Add VhostIOVATree
> > > > > > > >    vdpa: Add custom IOTLB translations to SVQ
> > > > > > > >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> > > > > > > >    vdpa: Never set log_base addr if SVQ is enabled
> > > > > > > >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> > > > > > > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > > > > > >
> > > > > > > >   qapi/net.json                      |   8 +-
> > > > > > > >   hw/virtio/vhost-iova-tree.h        |  27 ++
> > > > > > > >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> > > > > > > >   include/hw/virtio/vhost-vdpa.h     |   8 +
> > > > > > > >   include/qemu/iova-tree.h           |  38 +-
> > > > > > > >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> > > > > > > >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> > > > > > > >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> > > > > > > >   net/vhost-vdpa.c                   |  48 ++-
> > > > > > > >   util/iova-tree.c                   | 169 ++++++++
> > > > > > > >   hw/virtio/meson.build              |   2 +-
> > > > > > > >   11 files changed, 1633 insertions(+), 26 deletions(-)
> > > > > > > >   create mode 100644 hw/virtio/vhost-iova-tree.h
> > > > > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> > > > > > > >   create mode 100644 hw/virtio/vhost-iova-tree.c
> > > > > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > > > > > > >
> > > > > >
> > > >
> >



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-08  7:55             ` Michael S. Tsirkin
@ 2022-03-08  8:20               ` Jason Wang
  -1 siblings, 0 replies; 60+ messages in thread
From: Jason Wang @ 2022-03-08  8:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Harpreet Singh Anand, Lingshan

On Tue, Mar 8, 2022 at 3:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 03:34:17PM +0800, Jason Wang wrote:
> > On Tue, Mar 8, 2022 at 3:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > > > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > > > >
> > > > > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > > > > is intended as a new method of tracking the memory the devices touch
> > > > > > > during a migration process: Instead of relay on vhost device's dirty
> > > > > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > > > > descriptors between VM and device. This way qemu is the effective
> > > > > > > writer of guests memory, like in qemu's virtio device operation.
> > > > > > >
> > > > > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > > > > device to read and write into, and it maps new vrings and the guest
> > > > > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > > > > and the guest. Used buffers relay would cause dirty memory being
> > > > > > > tracked.
> > > > > > >
> > > > > > > This effectively means that vDPA device passthrough is intercepted by
> > > > > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > > > > from regular mode to SVQ mode is left for a future series.
> > > > > > >
> > > > > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > > > > not map the shadow vq in guest's VA, but in qemu's.
> > > > > > >
> > > > > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > > > > features like event_idx.
> > > > > > >
> > > > > > > SVQ needs to be enabled with cmdline:
> > > > > > >
> > > > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> > > > >
> > > > > A stable API for an incomplete feature is a problem imho.
> > > >
> > > > It should be "x-svq".
> > >
> > >
> > > Well look at patch 15.
> >
> > It's a bug that needs to be fixed.
> >
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > The first three patches enables notifications forwarding with
> > > > > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > > > > cmdline part of the last patch is applied on top of these.
> > > > > > >
> > > > > > > Next four patches implement the actual buffer forwarding. However,
> > > > > > > address are not translated from HVA so they will need a host device with
> > > > > > > an iommu allowing them to access all of the HVA range.
> > > > > > >
> > > > > > > The last part of the series uses properly the host iommu, so qemu
> > > > > > > creates a new iova address space in the device's range and translates
> > > > > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > > > > >
> > > > > > > Some simple performance tests with netperf were done. They used a nested
> > > > > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > > > > baseline average of ~9009.96Mbps:
> > > > > > > Recv   Send    Send
> > > > > > > Socket Socket  Message  Elapsed
> > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > 131072  16384  16384    30.01    9061.03
> > > > > > > 131072  16384  16384    30.01    8962.94
> > > > > > > 131072  16384  16384    30.01    9005.92
> > > > > > >
> > > > > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > > > > Recv   Send    Send
> > > > > > > Socket Socket  Message  Elapsed
> > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > 131072  16384  16384    30.01    7689.72
> > > > > > > 131072  16384  16384    30.00    7752.07
> > > > > > > 131072  16384  16384    30.01    7750.30
> > > > > > >
> > > > > > > However, many performance improvements were left out of this series for
> > > > > > > simplicity, so difference should shrink in the future.
> > > > > > >
> > > > > > > Comments are welcome.
> > > > > >
> > > > > >
> > > > > > Hi Michael:
> > > > > >
> > > > > > What do you think of this series? It looks good to me as a start. The
> > > > > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > > > > try to make it for 7.0.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Well that's cutting it awfully close, and it's not really useful
> > > > > at the current stage, is it?
> > > >
> > > > This allows vDPA to be migrated when using "x-svq=on".
> > > > But anyhow it's
> > > > experimental.
> > >
> > > it's less experimental than incomplete. It seems pretty clearly not
> > > the way it will work down the road, we don't want svq involved
> > > at all times.
> >
> > Right, but SVQ could be used for other places e.g providing migration
> > compatibility when the destination lacks some features.
>
> In its current form? I don't see how.  Generally?

Generally, yes.

> Maybe but I suspect
> we'll have to rework it completely for that.

Probably not, from what I see, it just needs some extension of the current code.

>
> > >
> > > > >
> > > > > The IOVA trick does not feel complete either.
> > > >
> > > > I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> > > > for shadow vq) did. So we won't suffer from the issues of DPDK.
> > > >
> > > > Thanks
> > >
> > > Maybe I misundrstand how this all works.
> > > I refer to all the iova_tree_alloc_map things.
> >
> > It's a simple IOVA allocater actually. Anything wrong with that?
>
> Not by itself but I'm not sure we can guarantee guest will not
> attempt to use the IOVA addresses we are reserving down
> the road.

The IOVA is allocated via the listeners and stored in the iova tree
per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
virtio core see GPA to HVA mapping. And we do a reverse lookup to find
the HVA->IOVA we allocated previously.  So we have double check here:

1) Qemu memory core to make sure the GPA that guest uses is valid
2) the IOVA tree that guarantees there will be no HVA beyond what
guest can see is used

So technically, there's no way for the guest to use the IOVA address
allocated for the shadow virtqueue.

Thanks

>
> > I'm fine with making it for the future release.
> >
> > Thanks
> >
> > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > TODO on future series:
> > > > > > > * Event, indirect, packed, and others features of virtio.
> > > > > > > * To support different set of features between the device<->SVQ and the
> > > > > > >    SVQ<->guest communication.
> > > > > > > * Support of device host notifier memory regions.
> > > > > > > * To sepparate buffers forwarding in its own AIO context, so we can
> > > > > > >    throw more threads to that task and we don't need to stop the main
> > > > > > >    event loop.
> > > > > > > * Support multiqueue virtio-net vdpa.
> > > > > > > * Proper documentation.
> > > > > > >
> > > > > > > Changes from v4:
> > > > > > > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> > > > > > >    overlaps.
> > > > > > > * Fix: Errno completion at failure.
> > > > > > > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > > > > > >
> > > > > > > Changes from v3:
> > > > > > > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > > > > > > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > > > > > > v3 link:
> > > > > > > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > > > > > >
> > > > > > > Changes from v2:
> > > > > > > * Less assertions and more error handling in iova tree code.
> > > > > > > * Better documentation, both fixing errors and making @param: format
> > > > > > > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> > > > > > >    prefix at both times.
> > > > > > > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > > > > > > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> > > > > > >    like the kernel driver code.
> > > > > > > * Small improvements.
> > > > > > > v2 link:
> > > > > > > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > > > > > >
> > > > > > > Changes from v1:
> > > > > > > * Feature set at device->SVQ is now the same as SVQ->guest.
> > > > > > > * Size of SVQ is not max available device size anymore, but guest's
> > > > > > >    negotiated.
> > > > > > > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > > > > > > * Make SVQ a public struct
> > > > > > > * Come back to previous approach to iova-tree
> > > > > > > * Some assertions are now fail paths. Some errors are now log_guest.
> > > > > > > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > > > > > > * Refactor some errors and messages. Add missing error unwindings.
> > > > > > > * Add memory barrier at _F_NO_NOTIFY set.
> > > > > > > * Stop checking for features flags out of transport range.
> > > > > > > v1 link:
> > > > > > > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > > > > > >
> > > > > > > Changes from v4 RFC:
> > > > > > > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> > > > > > >    already present iova-tree for that.
> > > > > > > * Proper validation of guest features. Now SVQ can negotiate a
> > > > > > >    different set of features with the device when enabled.
> > > > > > > * Support of host notifiers memory regions
> > > > > > > * Handling of SVQ full queue in case guest's descriptors span to
> > > > > > >    different memory regions (qemu's VA chunks).
> > > > > > > * Flush pending used buffers at end of SVQ operation.
> > > > > > > * QMP command now looks by NetClientState name. Other devices will need
> > > > > > >    to implement it's way to enable vdpa.
> > > > > > > * Rename QMP command to set, so it looks more like a way of working
> > > > > > > * Better use of qemu error system
> > > > > > > * Make a few assertions proper error-handling paths.
> > > > > > > * Add more documentation
> > > > > > > * Less coupling of virtio / vhost, that could cause friction on changes
> > > > > > > * Addressed many other small comments and small fixes.
> > > > > > >
> > > > > > > Changes from v3 RFC:
> > > > > > >    * Move everything to vhost-vdpa backend. A big change, this allowed
> > > > > > >      some cleanup but more code has been added in other places.
> > > > > > >    * More use of glib utilities, especially to manage memory.
> > > > > > > v3 link:
> > > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > > > > > >
> > > > > > > Changes from v2 RFC:
> > > > > > >    * Adding vhost-vdpa devices support
> > > > > > >    * Fixed some memory leaks pointed by different comments
> > > > > > > v2 link:
> > > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > > > > > >
> > > > > > > Changes from v1 RFC:
> > > > > > >    * Use QMP instead of migration to start SVQ mode.
> > > > > > >    * Only accepting IOMMU devices, closer behavior with target devices
> > > > > > >      (vDPA)
> > > > > > >    * Fix invalid masking/unmasking of vhost call fd.
> > > > > > >    * Use of proper methods for synchronization.
> > > > > > >    * No need to modify VirtIO device code, all of the changes are
> > > > > > >      contained in vhost code.
> > > > > > >    * Delete superfluous code.
> > > > > > >    * An intermediate RFC was sent with only the notifications forwarding
> > > > > > >      changes. It can be seen in
> > > > > > >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > > > > > > v1 link:
> > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > > > > > >
> > > > > > > Eugenio Pérez (20):
> > > > > > >        virtio: Add VIRTIO_F_QUEUE_STATE
> > > > > > >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> > > > > > >        virtio: Add virtio_queue_is_host_notifier_enabled
> > > > > > >        vhost: Make vhost_virtqueue_{start,stop} public
> > > > > > >        vhost: Add x-vhost-enable-shadow-vq qmp
> > > > > > >        vhost: Add VhostShadowVirtqueue
> > > > > > >        vdpa: Register vdpa devices in a list
> > > > > > >        vhost: Route guest->host notification through shadow virtqueue
> > > > > > >        Add vhost_svq_get_svq_call_notifier
> > > > > > >        Add vhost_svq_set_guest_call_notifier
> > > > > > >        vdpa: Save call_fd in vhost-vdpa
> > > > > > >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> > > > > > >        vhost: Route host->guest notification through shadow virtqueue
> > > > > > >        virtio: Add vhost_shadow_vq_get_vring_addr
> > > > > > >        vdpa: Save host and guest features
> > > > > > >        vhost: Add vhost_svq_valid_device_features to shadow vq
> > > > > > >        vhost: Shadow virtqueue buffers forwarding
> > > > > > >        vhost: Add VhostIOVATree
> > > > > > >        vhost: Use a tree to store memory mappings
> > > > > > >        vdpa: Add custom IOTLB translations to SVQ
> > > > > > >
> > > > > > > Eugenio Pérez (15):
> > > > > > >    vhost: Add VhostShadowVirtqueue
> > > > > > >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> > > > > > >    vhost: Add Shadow VirtQueue call forwarding capabilities
> > > > > > >    vhost: Add vhost_svq_valid_features to shadow vq
> > > > > > >    virtio: Add vhost_svq_get_vring_addr
> > > > > > >    vdpa: adapt vhost_ops callbacks to svq
> > > > > > >    vhost: Shadow virtqueue buffers forwarding
> > > > > > >    util: Add iova_tree_alloc_map
> > > > > > >    util: add iova_tree_find_iova
> > > > > > >    vhost: Add VhostIOVATree
> > > > > > >    vdpa: Add custom IOTLB translations to SVQ
> > > > > > >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> > > > > > >    vdpa: Never set log_base addr if SVQ is enabled
> > > > > > >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> > > > > > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > > > > >
> > > > > > >   qapi/net.json                      |   8 +-
> > > > > > >   hw/virtio/vhost-iova-tree.h        |  27 ++
> > > > > > >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> > > > > > >   include/hw/virtio/vhost-vdpa.h     |   8 +
> > > > > > >   include/qemu/iova-tree.h           |  38 +-
> > > > > > >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> > > > > > >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> > > > > > >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> > > > > > >   net/vhost-vdpa.c                   |  48 ++-
> > > > > > >   util/iova-tree.c                   | 169 ++++++++
> > > > > > >   hw/virtio/meson.build              |   2 +-
> > > > > > >   11 files changed, 1633 insertions(+), 26 deletions(-)
> > > > > > >   create mode 100644 hw/virtio/vhost-iova-tree.h
> > > > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> > > > > > >   create mode 100644 hw/virtio/vhost-iova-tree.c
> > > > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > > > > > >
> > > > >
> > >
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
@ 2022-03-08  8:20               ` Jason Wang
  0 siblings, 0 replies; 60+ messages in thread
From: Jason Wang @ 2022-03-08  8:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Peter Xu, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Stefano Garzarella, Laurent Vivier, Parav Pandit,
	Richard Henderson, Gautam Dawar, Xiao W Wang, Stefan Hajnoczi,
	Juan Quintela, Harpreet Singh Anand, Lingshan

On Tue, Mar 8, 2022 at 3:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 03:34:17PM +0800, Jason Wang wrote:
> > On Tue, Mar 8, 2022 at 3:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > > > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > > > >
> > > > > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > > > > is intended as a new method of tracking the memory the devices touch
> > > > > > > during a migration process: Instead of relay on vhost device's dirty
> > > > > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > > > > descriptors between VM and device. This way qemu is the effective
> > > > > > > writer of guests memory, like in qemu's virtio device operation.
> > > > > > >
> > > > > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > > > > device to read and write into, and it maps new vrings and the guest
> > > > > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > > > > and the guest. Used buffers relay would cause dirty memory being
> > > > > > > tracked.
> > > > > > >
> > > > > > > This effectively means that vDPA device passthrough is intercepted by
> > > > > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > > > > from regular mode to SVQ mode is left for a future series.
> > > > > > >
> > > > > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > > > > not map the shadow vq in guest's VA, but in qemu's.
> > > > > > >
> > > > > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > > > > features like event_idx.
> > > > > > >
> > > > > > > SVQ needs to be enabled with cmdline:
> > > > > > >
> > > > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> > > > >
> > > > > A stable API for an incomplete feature is a problem imho.
> > > >
> > > > It should be "x-svq".
> > >
> > >
> > > Well look at patch 15.
> >
> > It's a bug that needs to be fixed.
> >
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > The first three patches enables notifications forwarding with
> > > > > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > > > > cmdline part of the last patch is applied on top of these.
> > > > > > >
> > > > > > > Next four patches implement the actual buffer forwarding. However,
> > > > > > > address are not translated from HVA so they will need a host device with
> > > > > > > an iommu allowing them to access all of the HVA range.
> > > > > > >
> > > > > > > The last part of the series uses properly the host iommu, so qemu
> > > > > > > creates a new iova address space in the device's range and translates
> > > > > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > > > > >
> > > > > > > Some simple performance tests with netperf were done. They used a nested
> > > > > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > > > > baseline average of ~9009.96Mbps:
> > > > > > > Recv   Send    Send
> > > > > > > Socket Socket  Message  Elapsed
> > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > 131072  16384  16384    30.01    9061.03
> > > > > > > 131072  16384  16384    30.01    8962.94
> > > > > > > 131072  16384  16384    30.01    9005.92
> > > > > > >
> > > > > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > > > > Recv   Send    Send
> > > > > > > Socket Socket  Message  Elapsed
> > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > 131072  16384  16384    30.01    7689.72
> > > > > > > 131072  16384  16384    30.00    7752.07
> > > > > > > 131072  16384  16384    30.01    7750.30
> > > > > > >
> > > > > > > However, many performance improvements were left out of this series for
> > > > > > > simplicity, so difference should shrink in the future.
> > > > > > >
> > > > > > > Comments are welcome.
> > > > > >
> > > > > >
> > > > > > Hi Michael:
> > > > > >
> > > > > > What do you think of this series? It looks good to me as a start. The
> > > > > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > > > > try to make it for 7.0.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Well that's cutting it awfully close, and it's not really useful
> > > > > at the current stage, is it?
> > > >
> > > > This allows vDPA to be migrated when using "x-svq=on".
> > > > But anyhow it's
> > > > experimental.
> > >
> > > it's less experimental than incomplete. It seems pretty clearly not
> > > the way it will work down the road, we don't want svq involved
> > > at all times.
> >
> > Right, but SVQ could be used for other places e.g providing migration
> > compatibility when the destination lacks some features.
>
> In its current form? I don't see how.  Generally?

Generally, yes.

> Maybe but I suspect
> we'll have to rework it completely for that.

Probably not, from what I see, it just needs some extension of the current code.

>
> > >
> > > > >
> > > > > The IOVA trick does not feel complete either.
> > > >
> > > > I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> > > > for shadow vq) did. So we won't suffer from the issues of DPDK.
> > > >
> > > > Thanks
> > >
> > > Maybe I misundrstand how this all works.
> > > I refer to all the iova_tree_alloc_map things.
> >
> > It's a simple IOVA allocater actually. Anything wrong with that?
>
> Not by itself but I'm not sure we can guarantee guest will not
> attempt to use the IOVA addresses we are reserving down
> the road.

The IOVA is allocated via the listeners and stored in the iova tree
per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
virtio core see GPA to HVA mapping. And we do a reverse lookup to find
the HVA->IOVA we allocated previously.  So we have double check here:

1) Qemu memory core to make sure the GPA that guest uses is valid
2) the IOVA tree that guarantees there will be no HVA beyond what
guest can see is used

So technically, there's no way for the guest to use the IOVA address
allocated for the shadow virtqueue.

Thanks

>
> > I'm fine with making it for the future release.
> >
> > Thanks
> >
> > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > TODO on future series:
> > > > > > > * Event, indirect, packed, and others features of virtio.
> > > > > > > * To support different set of features between the device<->SVQ and the
> > > > > > >    SVQ<->guest communication.
> > > > > > > * Support of device host notifier memory regions.
> > > > > > > * To sepparate buffers forwarding in its own AIO context, so we can
> > > > > > >    throw more threads to that task and we don't need to stop the main
> > > > > > >    event loop.
> > > > > > > * Support multiqueue virtio-net vdpa.
> > > > > > > * Proper documentation.
> > > > > > >
> > > > > > > Changes from v4:
> > > > > > > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> > > > > > >    overlaps.
> > > > > > > * Fix: Errno completion at failure.
> > > > > > > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > > > > > >
> > > > > > > Changes from v3:
> > > > > > > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > > > > > > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > > > > > > v3 link:
> > > > > > > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > > > > > >
> > > > > > > Changes from v2:
> > > > > > > * Less assertions and more error handling in iova tree code.
> > > > > > > * Better documentation, both fixing errors and making @param: format
> > > > > > > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> > > > > > >    prefix at both times.
> > > > > > > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > > > > > > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> > > > > > >    like the kernel driver code.
> > > > > > > * Small improvements.
> > > > > > > v2 link:
> > > > > > > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > > > > > >
> > > > > > > Changes from v1:
> > > > > > > * Feature set at device->SVQ is now the same as SVQ->guest.
> > > > > > > * Size of SVQ is not max available device size anymore, but guest's
> > > > > > >    negotiated.
> > > > > > > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > > > > > > * Make SVQ a public struct
> > > > > > > * Come back to previous approach to iova-tree
> > > > > > > * Some assertions are now fail paths. Some errors are now log_guest.
> > > > > > > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > > > > > > * Refactor some errors and messages. Add missing error unwindings.
> > > > > > > * Add memory barrier at _F_NO_NOTIFY set.
> > > > > > > * Stop checking for features flags out of transport range.
> > > > > > > v1 link:
> > > > > > > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > > > > > >
> > > > > > > Changes from v4 RFC:
> > > > > > > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> > > > > > >    already present iova-tree for that.
> > > > > > > * Proper validation of guest features. Now SVQ can negotiate a
> > > > > > >    different set of features with the device when enabled.
> > > > > > > * Support of host notifiers memory regions
> > > > > > > * Handling of SVQ full queue in case guest's descriptors span to
> > > > > > >    different memory regions (qemu's VA chunks).
> > > > > > > * Flush pending used buffers at end of SVQ operation.
> > > > > > > * QMP command now looks by NetClientState name. Other devices will need
> > > > > > >    to implement it's way to enable vdpa.
> > > > > > > * Rename QMP command to set, so it looks more like a way of working
> > > > > > > * Better use of qemu error system
> > > > > > > * Make a few assertions proper error-handling paths.
> > > > > > > * Add more documentation
> > > > > > > * Less coupling of virtio / vhost, that could cause friction on changes
> > > > > > > * Addressed many other small comments and small fixes.
> > > > > > >
> > > > > > > Changes from v3 RFC:
> > > > > > >    * Move everything to vhost-vdpa backend. A big change, this allowed
> > > > > > >      some cleanup but more code has been added in other places.
> > > > > > >    * More use of glib utilities, especially to manage memory.
> > > > > > > v3 link:
> > > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > > > > > >
> > > > > > > Changes from v2 RFC:
> > > > > > >    * Adding vhost-vdpa devices support
> > > > > > >    * Fixed some memory leaks pointed by different comments
> > > > > > > v2 link:
> > > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > > > > > >
> > > > > > > Changes from v1 RFC:
> > > > > > >    * Use QMP instead of migration to start SVQ mode.
> > > > > > >    * Only accepting IOMMU devices, closer behavior with target devices
> > > > > > >      (vDPA)
> > > > > > >    * Fix invalid masking/unmasking of vhost call fd.
> > > > > > >    * Use of proper methods for synchronization.
> > > > > > >    * No need to modify VirtIO device code, all of the changes are
> > > > > > >      contained in vhost code.
> > > > > > >    * Delete superfluous code.
> > > > > > >    * An intermediate RFC was sent with only the notifications forwarding
> > > > > > >      changes. It can be seen in
> > > > > > >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > > > > > > v1 link:
> > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > > > > > >
> > > > > > > Eugenio Pérez (20):
> > > > > > >        virtio: Add VIRTIO_F_QUEUE_STATE
> > > > > > >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> > > > > > >        virtio: Add virtio_queue_is_host_notifier_enabled
> > > > > > >        vhost: Make vhost_virtqueue_{start,stop} public
> > > > > > >        vhost: Add x-vhost-enable-shadow-vq qmp
> > > > > > >        vhost: Add VhostShadowVirtqueue
> > > > > > >        vdpa: Register vdpa devices in a list
> > > > > > >        vhost: Route guest->host notification through shadow virtqueue
> > > > > > >        Add vhost_svq_get_svq_call_notifier
> > > > > > >        Add vhost_svq_set_guest_call_notifier
> > > > > > >        vdpa: Save call_fd in vhost-vdpa
> > > > > > >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> > > > > > >        vhost: Route host->guest notification through shadow virtqueue
> > > > > > >        virtio: Add vhost_shadow_vq_get_vring_addr
> > > > > > >        vdpa: Save host and guest features
> > > > > > >        vhost: Add vhost_svq_valid_device_features to shadow vq
> > > > > > >        vhost: Shadow virtqueue buffers forwarding
> > > > > > >        vhost: Add VhostIOVATree
> > > > > > >        vhost: Use a tree to store memory mappings
> > > > > > >        vdpa: Add custom IOTLB translations to SVQ
> > > > > > >
> > > > > > > Eugenio Pérez (15):
> > > > > > >    vhost: Add VhostShadowVirtqueue
> > > > > > >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> > > > > > >    vhost: Add Shadow VirtQueue call forwarding capabilities
> > > > > > >    vhost: Add vhost_svq_valid_features to shadow vq
> > > > > > >    virtio: Add vhost_svq_get_vring_addr
> > > > > > >    vdpa: adapt vhost_ops callbacks to svq
> > > > > > >    vhost: Shadow virtqueue buffers forwarding
> > > > > > >    util: Add iova_tree_alloc_map
> > > > > > >    util: add iova_tree_find_iova
> > > > > > >    vhost: Add VhostIOVATree
> > > > > > >    vdpa: Add custom IOTLB translations to SVQ
> > > > > > >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> > > > > > >    vdpa: Never set log_base addr if SVQ is enabled
> > > > > > >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> > > > > > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > > > > >
> > > > > > >   qapi/net.json                      |   8 +-
> > > > > > >   hw/virtio/vhost-iova-tree.h        |  27 ++
> > > > > > >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> > > > > > >   include/hw/virtio/vhost-vdpa.h     |   8 +
> > > > > > >   include/qemu/iova-tree.h           |  38 +-
> > > > > > >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> > > > > > >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> > > > > > >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> > > > > > >   net/vhost-vdpa.c                   |  48 ++-
> > > > > > >   util/iova-tree.c                   | 169 ++++++++
> > > > > > >   hw/virtio/meson.build              |   2 +-
> > > > > > >   11 files changed, 1633 insertions(+), 26 deletions(-)
> > > > > > >   create mode 100644 hw/virtio/vhost-iova-tree.h
> > > > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> > > > > > >   create mode 100644 hw/virtio/vhost-iova-tree.c
> > > > > > >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > > > > > >
> > > > >
> > >
>



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 15/15] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-03-08  8:02         ` Michael S. Tsirkin
  (?)
@ 2022-03-08  8:24         ` Eugenio Perez Martin
  2022-03-08 12:31             ` Michael S. Tsirkin
  -1 siblings, 1 reply; 60+ messages in thread
From: Eugenio Perez Martin @ 2022-03-08  8:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, qemu-level, Peter Xu, virtualization, Eli Cohen,
	Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

On Tue, Mar 8, 2022 at 9:02 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 08:32:07AM +0100, Eugenio Perez Martin wrote:
> > On Tue, Mar 8, 2022 at 8:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Mar 07, 2022 at 04:33:34PM +0100, Eugenio Pérez wrote:
> > > > Finally offering the possibility to enable SVQ from the command line.
> > > >
> > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > ---
> > > >  qapi/net.json    |  8 +++++++-
> > > >  net/vhost-vdpa.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
> > > >  2 files changed, 47 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/qapi/net.json b/qapi/net.json
> > > > index 7fab2e7cd8..d626fa441c 100644
> > > > --- a/qapi/net.json
> > > > +++ b/qapi/net.json
> > > > @@ -445,12 +445,18 @@
> > > >  # @queues: number of queues to be created for multiqueue vhost-vdpa
> > > >  #          (default: 1)
> > > >  #
> > > > +# @svq: Start device with (experimental) shadow virtqueue. (Since 7.0)
> > > > +#
> > > > +# Features:
> > > > +# @unstable: Member @svq is experimental.
> > > > +#
> > > >  # Since: 5.1
> > > >  ##
> > > >  { 'struct': 'NetdevVhostVDPAOptions',
> > > >    'data': {
> > > >      '*vhostdev':     'str',
> > > > -    '*queues':       'int' } }
> > > > +    '*queues':       'int',
> > > > +    '*svq':          {'type': 'bool', 'features' : [ 'unstable'] } } }
> > > >
> > > >  ##
> > > >  # @NetClientDriver:
> > >
> > > I think this should be x-svq same as other unstable features.
> > >
> >
> > I'm fine with both, but I was pointed to the other direction at [1] and [2].
> >
> > Thanks!
> >
> > [1] https://patchwork.kernel.org/project/qemu-devel/patch/20220302203012.3476835-15-eperezma@redhat.com/
> > [2] https://lore.kernel.org/qemu-devel/20220303185147.3605350-15-eperezma@redhat.com/
>
>
> I think what Markus didn't know is that a bunch of changes in
> behaviour will occur before we rename it to "svq".
> The rename is thus less of a bother more of a bonus.
>

I'm totally fine with going back to x-svq. I'm not sure if it's more
appropriate to do different modes of different parameters (svq=off,
dynamic-svq=on) or different modes of the same parameter (svq=on vs
svq=on_migration). Or something totally different.

My impression is that all of the changes are covered with @unstable
but I can see the advantage of x- prefix since we have not come to an
agreement on it. I think it's the first time it is mentioned in the
mail list.

Do you want me to send a new series with x- prefix?

Thanks!

> > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > > index 1e9fe47c03..c827921654 100644
> > > > --- a/net/vhost-vdpa.c
> > > > +++ b/net/vhost-vdpa.c
> > > > @@ -127,7 +127,11 @@ err_init:
> > > >  static void vhost_vdpa_cleanup(NetClientState *nc)
> > > >  {
> > > >      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > > > +    struct vhost_dev *dev = s->vhost_vdpa.dev;
> > > >
> > > > +    if (dev && dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > > > +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > > > +    }
> > > >      if (s->vhost_net) {
> > > >          vhost_net_cleanup(s->vhost_net);
> > > >          g_free(s->vhost_net);
> > > > @@ -187,13 +191,23 @@ static NetClientInfo net_vhost_vdpa_info = {
> > > >          .check_peer_type = vhost_vdpa_check_peer_type,
> > > >  };
> > > >
> > > > +static int vhost_vdpa_get_iova_range(int fd,
> > > > +                                     struct vhost_vdpa_iova_range *iova_range)
> > > > +{
> > > > +    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
> > > > +
> > > > +    return ret < 0 ? -errno : 0;
> > > > +}
> > > > +
> > > >  static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > > > -                                           const char *device,
> > > > -                                           const char *name,
> > > > -                                           int vdpa_device_fd,
> > > > -                                           int queue_pair_index,
> > > > -                                           int nvqs,
> > > > -                                           bool is_datapath)
> > > > +                                       const char *device,
> > > > +                                       const char *name,
> > > > +                                       int vdpa_device_fd,
> > > > +                                       int queue_pair_index,
> > > > +                                       int nvqs,
> > > > +                                       bool is_datapath,
> > > > +                                       bool svq,
> > > > +                                       VhostIOVATree *iova_tree)
> > > >  {
> > > >      NetClientState *nc = NULL;
> > > >      VhostVDPAState *s;
> > > > @@ -211,6 +225,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > > >
> > > >      s->vhost_vdpa.device_fd = vdpa_device_fd;
> > > >      s->vhost_vdpa.index = queue_pair_index;
> > > > +    s->vhost_vdpa.shadow_vqs_enabled = svq;
> > > > +    s->vhost_vdpa.iova_tree = iova_tree;
> > > >      ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> > > >      if (ret) {
> > > >          qemu_del_net_client(nc);
> > > > @@ -266,6 +282,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > > >      g_autofree NetClientState **ncs = NULL;
> > > >      NetClientState *nc;
> > > >      int queue_pairs, i, has_cvq = 0;
> > > > +    g_autoptr(VhostIOVATree) iova_tree = NULL;
> > > >
> > > >      assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > > >      opts = &netdev->u.vhost_vdpa;
> > > > @@ -285,29 +302,44 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > > >          qemu_close(vdpa_device_fd);
> > > >          return queue_pairs;
> > > >      }
> > > > +    if (opts->svq) {
> > > > +        struct vhost_vdpa_iova_range iova_range;
> > > > +
> > > > +        if (has_cvq) {
> > > > +            error_setg(errp, "vdpa svq does not work with cvq");
> > > > +            goto err_svq;
> > > > +        }
> > > > +        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> > > > +        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> > > > +    }
> > > >
> > > >      ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> > > >
> > > >      for (i = 0; i < queue_pairs; i++) {
> > > >          ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > > > -                                     vdpa_device_fd, i, 2, true);
> > > > +                                     vdpa_device_fd, i, 2, true, opts->svq,
> > > > +                                     iova_tree);
> > > >          if (!ncs[i])
> > > >              goto err;
> > > >      }
> > > >
> > > >      if (has_cvq) {
> > > >          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > > > -                                 vdpa_device_fd, i, 1, false);
> > > > +                                 vdpa_device_fd, i, 1, false, opts->svq,
> > > > +                                 iova_tree);
> > > >          if (!nc)
> > > >              goto err;
> > > >      }
> > > >
> > > > +    iova_tree = NULL;
> > > >      return 0;
> > > >
> > > >  err:
> > > >      if (i) {
> > > >          qemu_del_net_client(ncs[0]);
> > > >      }
> > > > +
> > > > +err_svq:
> > > >      qemu_close(vdpa_device_fd);
> > > >
> > > >      return -1;
> > > > --
> > > > 2.27.0
> > >
>



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 15/15] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-03-07 15:33 ` [PATCH v5 15/15] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
  2022-03-08  7:11     ` Michael S. Tsirkin
@ 2022-03-08  9:29   ` Markus Armbruster
  1 sibling, 0 replies; 60+ messages in thread
From: Markus Armbruster @ 2022-03-08  9:29 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Michael S. Tsirkin, Jason Wang, qemu-devel, Peter Xu,
	virtualization, Eli Cohen, Eric Blake, Eduardo Habkost, Cindy Lu,
	Fangyi (Eric),
	yebiaoxiang, Liuxiangdong, Stefano Garzarella, Laurent Vivier,
	Parav Pandit, Richard Henderson, Gautam Dawar, Xiao W Wang,
	Stefan Hajnoczi, Juan Quintela, Harpreet Singh Anand, Lingshan

Eugenio Pérez <eperezma@redhat.com> writes:

> Finally offering the possibility to enable SVQ from the command line.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  qapi/net.json    |  8 +++++++-
>  net/vhost-vdpa.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
>  2 files changed, 47 insertions(+), 9 deletions(-)
>
> diff --git a/qapi/net.json b/qapi/net.json
> index 7fab2e7cd8..d626fa441c 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -445,12 +445,18 @@
>  # @queues: number of queues to be created for multiqueue vhost-vdpa
>  #          (default: 1)
>  #
> +# @svq: Start device with (experimental) shadow virtqueue. (Since 7.0)
> +#
> +# Features:
> +# @unstable: Member @svq is experimental.
> +#
>  # Since: 5.1
>  ##
>  { 'struct': 'NetdevVhostVDPAOptions',
>    'data': {
>      '*vhostdev':     'str',
> -    '*queues':       'int' } }
> +    '*queues':       'int',
> +    '*svq':          {'type': 'bool', 'features' : [ 'unstable'] } } }
>  
>  ##
>  # @NetClientDriver:

QAPI schema:
Acked-by: Markus Armbruster <armbru@redhat.com>



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-08  8:20               ` Jason Wang
@ 2022-03-08 10:46                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08 10:46 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> Generally, yes.


So generally I support the idea of merging code gradually.  And merging
with an unstable flag to enable it is a reasonable way to do it.
However we are half a day away from soft freeze, so this will just
result in the feature getting to users in it's current not really
useable form. If we just want to simplify upstreaming then
merging patches 1-14 for now would be one way to do it.
If you want to do it through your tree then ok

Acked-by: Michael S. Tsirkin <mst@redhat.com>


-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
@ 2022-03-08 10:46                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08 10:46 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Peter Xu, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Stefano Garzarella, Laurent Vivier, Parav Pandit,
	Richard Henderson, Gautam Dawar, Xiao W Wang, Stefan Hajnoczi,
	Juan Quintela, Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> Generally, yes.


So generally I support the idea of merging code gradually.  And merging
with an unstable flag to enable it is a reasonable way to do it.
However we are half a day away from soft freeze, so this will just
result in the feature getting to users in it's current not really
useable form. If we just want to simplify upstreaming then
merging patches 1-14 for now would be one way to do it.
If you want to do it through your tree then ok

Acked-by: Michael S. Tsirkin <mst@redhat.com>


-- 
MST



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-08  8:20               ` Jason Wang
@ 2022-03-08 10:48                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08 10:48 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > Not by itself but I'm not sure we can guarantee guest will not
> > attempt to use the IOVA addresses we are reserving down
> > the road.
> 
> The IOVA is allocated via the listeners and stored in the iova tree
> per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
> virtio core see GPA to HVA mapping. And we do a reverse lookup to find
> the HVA->IOVA we allocated previously.  So we have double check here:
> 
> 1) Qemu memory core to make sure the GPA that guest uses is valid
> 2) the IOVA tree that guarantees there will be no HVA beyond what
> guest can see is used
> 
> So technically, there's no way for the guest to use the IOVA address
> allocated for the shadow virtqueue.
> 
> Thanks

I mean, IOVA is programmed in the host hardware to translate to HPA, right?

-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
@ 2022-03-08 10:48                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08 10:48 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Peter Xu, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Stefano Garzarella, Laurent Vivier, Parav Pandit,
	Richard Henderson, Gautam Dawar, Xiao W Wang, Stefan Hajnoczi,
	Juan Quintela, Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > Not by itself but I'm not sure we can guarantee guest will not
> > attempt to use the IOVA addresses we are reserving down
> > the road.
> 
> The IOVA is allocated via the listeners and stored in the iova tree
> per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
> virtio core see GPA to HVA mapping. And we do a reverse lookup to find
> the HVA->IOVA we allocated previously.  So we have double check here:
> 
> 1) Qemu memory core to make sure the GPA that guest uses is valid
> 2) the IOVA tree that guarantees there will be no HVA beyond what
> guest can see is used
> 
> So technically, there's no way for the guest to use the IOVA address
> allocated for the shadow virtqueue.
> 
> Thanks

I mean, IOVA is programmed in the host hardware to translate to HPA, right?

-- 
MST



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-08 10:48                 ` Michael S. Tsirkin
  (?)
@ 2022-03-08 11:37                 ` Eugenio Perez Martin
  2022-03-08 12:16                     ` Michael S. Tsirkin
  -1 siblings, 1 reply; 60+ messages in thread
From: Eugenio Perez Martin @ 2022-03-08 11:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, qemu-devel, Peter Xu, virtualization, Eli Cohen,
	Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > > Not by itself but I'm not sure we can guarantee guest will not
> > > attempt to use the IOVA addresses we are reserving down
> > > the road.
> >
> > The IOVA is allocated via the listeners and stored in the iova tree
> > per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
> > virtio core see GPA to HVA mapping. And we do a reverse lookup to find
> > the HVA->IOVA we allocated previously.  So we have double check here:
> >
> > 1) Qemu memory core to make sure the GPA that guest uses is valid
> > 2) the IOVA tree that guarantees there will be no HVA beyond what
> > guest can see is used
> >
> > So technically, there's no way for the guest to use the IOVA address
> > allocated for the shadow virtqueue.
> >
> > Thanks
>
> I mean, IOVA is programmed in the host hardware to translate to HPA, right?
>

Yes, that's right if the device uses physical maps. Also to note, SVQ
vring is allocated in multiples of host huge pages to avoid garbage or
unintended access from the device.

If a vdpa device uses physical addresses, kernel vdpa will pin qemu
memory first and then will send IOVA to HPA translation to hardware.
But this IOVA space is not controlled by the guest, but by SVQ. If a
guest's virtqueue buffer cannot be translated first to GPA, it will
not be forwarded.

Thanks!



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-08 11:37                 ` Eugenio Perez Martin
@ 2022-03-08 12:16                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08 12:16 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Laurent Vivier,
	Parav Pandit, Richard Henderson, Gautam Dawar, Xiao W Wang,
	Stefan Hajnoczi, Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 12:37:33PM +0100, Eugenio Perez Martin wrote:
> On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > > > Not by itself but I'm not sure we can guarantee guest will not
> > > > attempt to use the IOVA addresses we are reserving down
> > > > the road.
> > >
> > > The IOVA is allocated via the listeners and stored in the iova tree
> > > per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
> > > virtio core see GPA to HVA mapping. And we do a reverse lookup to find
> > > the HVA->IOVA we allocated previously.  So we have double check here:
> > >
> > > 1) Qemu memory core to make sure the GPA that guest uses is valid
> > > 2) the IOVA tree that guarantees there will be no HVA beyond what
> > > guest can see is used
> > >
> > > So technically, there's no way for the guest to use the IOVA address
> > > allocated for the shadow virtqueue.
> > >
> > > Thanks
> >
> > I mean, IOVA is programmed in the host hardware to translate to HPA, right?
> >
> 
> Yes, that's right if the device uses physical maps. Also to note, SVQ
> vring is allocated in multiples of host huge pages to avoid garbage or
> unintended access from the device.
> 
> If a vdpa device uses physical addresses, kernel vdpa will pin qemu
> memory first and then will send IOVA to HPA translation to hardware.
> But this IOVA space is not controlled by the guest, but by SVQ. If a
> guest's virtqueue buffer cannot be translated first to GPA, it will
> not be forwarded.
> 
> Thanks!

Right. So if guests send a buffer where buffer address overlaps the
range we used for the SVQ, then I think at the moment guest won't work.

-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
@ 2022-03-08 12:16                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08 12:16 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Jason Wang, qemu-devel, Peter Xu, virtualization, Eli Cohen,
	Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 12:37:33PM +0100, Eugenio Perez Martin wrote:
> On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > > > Not by itself but I'm not sure we can guarantee guest will not
> > > > attempt to use the IOVA addresses we are reserving down
> > > > the road.
> > >
> > > The IOVA is allocated via the listeners and stored in the iova tree
> > > per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
> > > virtio core see GPA to HVA mapping. And we do a reverse lookup to find
> > > the HVA->IOVA we allocated previously.  So we have double check here:
> > >
> > > 1) Qemu memory core to make sure the GPA that guest uses is valid
> > > 2) the IOVA tree that guarantees there will be no HVA beyond what
> > > guest can see is used
> > >
> > > So technically, there's no way for the guest to use the IOVA address
> > > allocated for the shadow virtqueue.
> > >
> > > Thanks
> >
> > I mean, IOVA is programmed in the host hardware to translate to HPA, right?
> >
> 
> Yes, that's right if the device uses physical maps. Also to note, SVQ
> vring is allocated in multiples of host huge pages to avoid garbage or
> unintended access from the device.
> 
> If a vdpa device uses physical addresses, kernel vdpa will pin qemu
> memory first and then will send IOVA to HPA translation to hardware.
> But this IOVA space is not controlled by the guest, but by SVQ. If a
> guest's virtqueue buffer cannot be translated first to GPA, it will
> not be forwarded.
> 
> Thanks!

Right. So if guests send a buffer where buffer address overlaps the
range we used for the SVQ, then I think at the moment guest won't work.

-- 
MST



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 15/15] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-03-08  8:24         ` Eugenio Perez Martin
@ 2022-03-08 12:31             ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08 12:31 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Laurent Vivier,
	Parav Pandit, Richard Henderson, Gautam Dawar, Xiao W Wang,
	Stefan Hajnoczi, Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 09:24:05AM +0100, Eugenio Perez Martin wrote:
> On Tue, Mar 8, 2022 at 9:02 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Mar 08, 2022 at 08:32:07AM +0100, Eugenio Perez Martin wrote:
> > > On Tue, Mar 8, 2022 at 8:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Mar 07, 2022 at 04:33:34PM +0100, Eugenio Pérez wrote:
> > > > > Finally offering the possibility to enable SVQ from the command line.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > ---
> > > > >  qapi/net.json    |  8 +++++++-
> > > > >  net/vhost-vdpa.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
> > > > >  2 files changed, 47 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/qapi/net.json b/qapi/net.json
> > > > > index 7fab2e7cd8..d626fa441c 100644
> > > > > --- a/qapi/net.json
> > > > > +++ b/qapi/net.json
> > > > > @@ -445,12 +445,18 @@
> > > > >  # @queues: number of queues to be created for multiqueue vhost-vdpa
> > > > >  #          (default: 1)
> > > > >  #
> > > > > +# @svq: Start device with (experimental) shadow virtqueue. (Since 7.0)
> > > > > +#
> > > > > +# Features:
> > > > > +# @unstable: Member @svq is experimental.
> > > > > +#
> > > > >  # Since: 5.1
> > > > >  ##
> > > > >  { 'struct': 'NetdevVhostVDPAOptions',
> > > > >    'data': {
> > > > >      '*vhostdev':     'str',
> > > > > -    '*queues':       'int' } }
> > > > > +    '*queues':       'int',
> > > > > +    '*svq':          {'type': 'bool', 'features' : [ 'unstable'] } } }
> > > > >
> > > > >  ##
> > > > >  # @NetClientDriver:
> > > >
> > > > I think this should be x-svq same as other unstable features.
> > > >
> > >
> > > I'm fine with both, but I was pointed to the other direction at [1] and [2].
> > >
> > > Thanks!
> > >
> > > [1] https://patchwork.kernel.org/project/qemu-devel/patch/20220302203012.3476835-15-eperezma@redhat.com/
> > > [2] https://lore.kernel.org/qemu-devel/20220303185147.3605350-15-eperezma@redhat.com/
> >
> >
> > I think what Markus didn't know is that a bunch of changes in
> > behaviour will occur before we rename it to "svq".
> > The rename is thus less of a bother more of a bonus.
> >
> 
> I'm totally fine with going back to x-svq. I'm not sure if it's more
> appropriate to do different modes of different parameters (svq=off,
> dynamic-svq=on) or different modes of the same parameter (svq=on vs
> svq=on_migration). Or something totally different.
> 
> My impression is that all of the changes are covered with @unstable
> but I can see the advantage of x- prefix since we have not come to an
> agreement on it. I think it's the first time it is mentioned in the
> mail list.
> 
> Do you want me to send a new series with x- prefix?
> 
> Thanks!

Sure, I think it's a prudent thing to do simply because as you say the
semantics of the flag are likely to change yet.


> > > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > > > index 1e9fe47c03..c827921654 100644
> > > > > --- a/net/vhost-vdpa.c
> > > > > +++ b/net/vhost-vdpa.c
> > > > > @@ -127,7 +127,11 @@ err_init:
> > > > >  static void vhost_vdpa_cleanup(NetClientState *nc)
> > > > >  {
> > > > >      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > > > > +    struct vhost_dev *dev = s->vhost_vdpa.dev;
> > > > >
> > > > > +    if (dev && dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > > > > +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > > > > +    }
> > > > >      if (s->vhost_net) {
> > > > >          vhost_net_cleanup(s->vhost_net);
> > > > >          g_free(s->vhost_net);
> > > > > @@ -187,13 +191,23 @@ static NetClientInfo net_vhost_vdpa_info = {
> > > > >          .check_peer_type = vhost_vdpa_check_peer_type,
> > > > >  };
> > > > >
> > > > > +static int vhost_vdpa_get_iova_range(int fd,
> > > > > +                                     struct vhost_vdpa_iova_range *iova_range)
> > > > > +{
> > > > > +    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
> > > > > +
> > > > > +    return ret < 0 ? -errno : 0;
> > > > > +}
> > > > > +
> > > > >  static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > > > > -                                           const char *device,
> > > > > -                                           const char *name,
> > > > > -                                           int vdpa_device_fd,
> > > > > -                                           int queue_pair_index,
> > > > > -                                           int nvqs,
> > > > > -                                           bool is_datapath)
> > > > > +                                       const char *device,
> > > > > +                                       const char *name,
> > > > > +                                       int vdpa_device_fd,
> > > > > +                                       int queue_pair_index,
> > > > > +                                       int nvqs,
> > > > > +                                       bool is_datapath,
> > > > > +                                       bool svq,
> > > > > +                                       VhostIOVATree *iova_tree)
> > > > >  {
> > > > >      NetClientState *nc = NULL;
> > > > >      VhostVDPAState *s;
> > > > > @@ -211,6 +225,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > > > >
> > > > >      s->vhost_vdpa.device_fd = vdpa_device_fd;
> > > > >      s->vhost_vdpa.index = queue_pair_index;
> > > > > +    s->vhost_vdpa.shadow_vqs_enabled = svq;
> > > > > +    s->vhost_vdpa.iova_tree = iova_tree;
> > > > >      ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> > > > >      if (ret) {
> > > > >          qemu_del_net_client(nc);
> > > > > @@ -266,6 +282,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > > > >      g_autofree NetClientState **ncs = NULL;
> > > > >      NetClientState *nc;
> > > > >      int queue_pairs, i, has_cvq = 0;
> > > > > +    g_autoptr(VhostIOVATree) iova_tree = NULL;
> > > > >
> > > > >      assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > > > >      opts = &netdev->u.vhost_vdpa;
> > > > > @@ -285,29 +302,44 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > > > >          qemu_close(vdpa_device_fd);
> > > > >          return queue_pairs;
> > > > >      }
> > > > > +    if (opts->svq) {
> > > > > +        struct vhost_vdpa_iova_range iova_range;
> > > > > +
> > > > > +        if (has_cvq) {
> > > > > +            error_setg(errp, "vdpa svq does not work with cvq");
> > > > > +            goto err_svq;
> > > > > +        }
> > > > > +        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> > > > > +        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> > > > > +    }
> > > > >
> > > > >      ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> > > > >
> > > > >      for (i = 0; i < queue_pairs; i++) {
> > > > >          ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > > > > -                                     vdpa_device_fd, i, 2, true);
> > > > > +                                     vdpa_device_fd, i, 2, true, opts->svq,
> > > > > +                                     iova_tree);
> > > > >          if (!ncs[i])
> > > > >              goto err;
> > > > >      }
> > > > >
> > > > >      if (has_cvq) {
> > > > >          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > > > > -                                 vdpa_device_fd, i, 1, false);
> > > > > +                                 vdpa_device_fd, i, 1, false, opts->svq,
> > > > > +                                 iova_tree);
> > > > >          if (!nc)
> > > > >              goto err;
> > > > >      }
> > > > >
> > > > > +    iova_tree = NULL;
> > > > >      return 0;
> > > > >
> > > > >  err:
> > > > >      if (i) {
> > > > >          qemu_del_net_client(ncs[0]);
> > > > >      }
> > > > > +
> > > > > +err_svq:
> > > > >      qemu_close(vdpa_device_fd);
> > > > >
> > > > >      return -1;
> > > > > --
> > > > > 2.27.0
> > > >
> >

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 15/15] vdpa: Add x-svq to NetdevVhostVDPAOptions
@ 2022-03-08 12:31             ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-08 12:31 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Jason Wang, qemu-level, Peter Xu, virtualization, Eli Cohen,
	Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

On Tue, Mar 08, 2022 at 09:24:05AM +0100, Eugenio Perez Martin wrote:
> On Tue, Mar 8, 2022 at 9:02 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Mar 08, 2022 at 08:32:07AM +0100, Eugenio Perez Martin wrote:
> > > On Tue, Mar 8, 2022 at 8:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Mar 07, 2022 at 04:33:34PM +0100, Eugenio Pérez wrote:
> > > > > Finally offering the possibility to enable SVQ from the command line.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > ---
> > > > >  qapi/net.json    |  8 +++++++-
> > > > >  net/vhost-vdpa.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
> > > > >  2 files changed, 47 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/qapi/net.json b/qapi/net.json
> > > > > index 7fab2e7cd8..d626fa441c 100644
> > > > > --- a/qapi/net.json
> > > > > +++ b/qapi/net.json
> > > > > @@ -445,12 +445,18 @@
> > > > >  # @queues: number of queues to be created for multiqueue vhost-vdpa
> > > > >  #          (default: 1)
> > > > >  #
> > > > > +# @svq: Start device with (experimental) shadow virtqueue. (Since 7.0)
> > > > > +#
> > > > > +# Features:
> > > > > +# @unstable: Member @svq is experimental.
> > > > > +#
> > > > >  # Since: 5.1
> > > > >  ##
> > > > >  { 'struct': 'NetdevVhostVDPAOptions',
> > > > >    'data': {
> > > > >      '*vhostdev':     'str',
> > > > > -    '*queues':       'int' } }
> > > > > +    '*queues':       'int',
> > > > > +    '*svq':          {'type': 'bool', 'features' : [ 'unstable'] } } }
> > > > >
> > > > >  ##
> > > > >  # @NetClientDriver:
> > > >
> > > > I think this should be x-svq same as other unstable features.
> > > >
> > >
> > > I'm fine with both, but I was pointed to the other direction at [1] and [2].
> > >
> > > Thanks!
> > >
> > > [1] https://patchwork.kernel.org/project/qemu-devel/patch/20220302203012.3476835-15-eperezma@redhat.com/
> > > [2] https://lore.kernel.org/qemu-devel/20220303185147.3605350-15-eperezma@redhat.com/
> >
> >
> > I think what Markus didn't know is that a bunch of changes in
> > behaviour will occur before we rename it to "svq".
> > The rename is thus less of a bother more of a bonus.
> >
> 
> I'm totally fine with going back to x-svq. I'm not sure if it's more
> appropriate to do different modes of different parameters (svq=off,
> dynamic-svq=on) or different modes of the same parameter (svq=on vs
> svq=on_migration). Or something totally different.
> 
> My impression is that all of the changes are covered with @unstable
> but I can see the advantage of x- prefix since we have not come to an
> agreement on it. I think it's the first time it is mentioned in the
> mail list.
> 
> Do you want me to send a new series with x- prefix?
> 
> Thanks!

Sure, I think it's a prudent thing to do simply because as you say the
semantics of the flag are likely to change yet.


> > > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > > > index 1e9fe47c03..c827921654 100644
> > > > > --- a/net/vhost-vdpa.c
> > > > > +++ b/net/vhost-vdpa.c
> > > > > @@ -127,7 +127,11 @@ err_init:
> > > > >  static void vhost_vdpa_cleanup(NetClientState *nc)
> > > > >  {
> > > > >      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > > > > +    struct vhost_dev *dev = s->vhost_vdpa.dev;
> > > > >
> > > > > +    if (dev && dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > > > > +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > > > > +    }
> > > > >      if (s->vhost_net) {
> > > > >          vhost_net_cleanup(s->vhost_net);
> > > > >          g_free(s->vhost_net);
> > > > > @@ -187,13 +191,23 @@ static NetClientInfo net_vhost_vdpa_info = {
> > > > >          .check_peer_type = vhost_vdpa_check_peer_type,
> > > > >  };
> > > > >
> > > > > +static int vhost_vdpa_get_iova_range(int fd,
> > > > > +                                     struct vhost_vdpa_iova_range *iova_range)
> > > > > +{
> > > > > +    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
> > > > > +
> > > > > +    return ret < 0 ? -errno : 0;
> > > > > +}
> > > > > +
> > > > >  static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > > > > -                                           const char *device,
> > > > > -                                           const char *name,
> > > > > -                                           int vdpa_device_fd,
> > > > > -                                           int queue_pair_index,
> > > > > -                                           int nvqs,
> > > > > -                                           bool is_datapath)
> > > > > +                                       const char *device,
> > > > > +                                       const char *name,
> > > > > +                                       int vdpa_device_fd,
> > > > > +                                       int queue_pair_index,
> > > > > +                                       int nvqs,
> > > > > +                                       bool is_datapath,
> > > > > +                                       bool svq,
> > > > > +                                       VhostIOVATree *iova_tree)
> > > > >  {
> > > > >      NetClientState *nc = NULL;
> > > > >      VhostVDPAState *s;
> > > > > @@ -211,6 +225,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > > > >
> > > > >      s->vhost_vdpa.device_fd = vdpa_device_fd;
> > > > >      s->vhost_vdpa.index = queue_pair_index;
> > > > > +    s->vhost_vdpa.shadow_vqs_enabled = svq;
> > > > > +    s->vhost_vdpa.iova_tree = iova_tree;
> > > > >      ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> > > > >      if (ret) {
> > > > >          qemu_del_net_client(nc);
> > > > > @@ -266,6 +282,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > > > >      g_autofree NetClientState **ncs = NULL;
> > > > >      NetClientState *nc;
> > > > >      int queue_pairs, i, has_cvq = 0;
> > > > > +    g_autoptr(VhostIOVATree) iova_tree = NULL;
> > > > >
> > > > >      assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > > > >      opts = &netdev->u.vhost_vdpa;
> > > > > @@ -285,29 +302,44 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > > > >          qemu_close(vdpa_device_fd);
> > > > >          return queue_pairs;
> > > > >      }
> > > > > +    if (opts->svq) {
> > > > > +        struct vhost_vdpa_iova_range iova_range;
> > > > > +
> > > > > +        if (has_cvq) {
> > > > > +            error_setg(errp, "vdpa svq does not work with cvq");
> > > > > +            goto err_svq;
> > > > > +        }
> > > > > +        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> > > > > +        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> > > > > +    }
> > > > >
> > > > >      ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> > > > >
> > > > >      for (i = 0; i < queue_pairs; i++) {
> > > > >          ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > > > > -                                     vdpa_device_fd, i, 2, true);
> > > > > +                                     vdpa_device_fd, i, 2, true, opts->svq,
> > > > > +                                     iova_tree);
> > > > >          if (!ncs[i])
> > > > >              goto err;
> > > > >      }
> > > > >
> > > > >      if (has_cvq) {
> > > > >          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > > > > -                                 vdpa_device_fd, i, 1, false);
> > > > > +                                 vdpa_device_fd, i, 1, false, opts->svq,
> > > > > +                                 iova_tree);
> > > > >          if (!nc)
> > > > >              goto err;
> > > > >      }
> > > > >
> > > > > +    iova_tree = NULL;
> > > > >      return 0;
> > > > >
> > > > >  err:
> > > > >      if (i) {
> > > > >          qemu_del_net_client(ncs[0]);
> > > > >      }
> > > > > +
> > > > > +err_svq:
> > > > >      qemu_close(vdpa_device_fd);
> > > > >
> > > > >      return -1;
> > > > > --
> > > > > 2.27.0
> > > >
> >



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-08 10:46                 ` Michael S. Tsirkin
@ 2022-03-08 13:23                   ` Jason Wang
  -1 siblings, 0 replies; 60+ messages in thread
From: Jason Wang @ 2022-03-08 13:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Harpreet Singh Anand, Lingshan


[-- Attachment #1.1: Type: text/plain, Size: 730 bytes --]

On Tue, Mar 8, 2022 at 6:46 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > Generally, yes.
>
>
> So generally I support the idea of merging code gradually.  And merging
> with an unstable flag to enable it is a reasonable way to do it.
> However we are half a day away from soft freeze, so this will just
> result in the feature getting to users in it's current not really
> useable form. If we just want to simplify upstreaming then
> merging patches 1-14 for now would be one way to do it.
>

Yes.


> If you want to do it through your tree then ok
>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>

Thanks. Will send a pull request soon.


>
>
> --
> MST
>
>

[-- Attachment #1.2: Type: text/html, Size: 1490 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
@ 2022-03-08 13:23                   ` Jason Wang
  0 siblings, 0 replies; 60+ messages in thread
From: Jason Wang @ 2022-03-08 13:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Peter Xu, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Pérez, Liuxiangdong,
	Stefano Garzarella, Laurent Vivier, Parav Pandit,
	Richard Henderson, Gautam Dawar, Xiao W Wang, Stefan Hajnoczi,
	Juan Quintela, Harpreet Singh Anand, Lingshan

[-- Attachment #1: Type: text/plain, Size: 730 bytes --]

On Tue, Mar 8, 2022 at 6:46 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > Generally, yes.
>
>
> So generally I support the idea of merging code gradually.  And merging
> with an unstable flag to enable it is a reasonable way to do it.
> However we are half a day away from soft freeze, so this will just
> result in the feature getting to users in it's current not really
> useable form. If we just want to simplify upstreaming then
> merging patches 1-14 for now would be one way to do it.
>

Yes.


> If you want to do it through your tree then ok
>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>

Thanks. Will send a pull request soon.


>
>
> --
> MST
>
>

[-- Attachment #2: Type: text/html, Size: 1490 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-08 12:16                     ` Michael S. Tsirkin
  (?)
@ 2022-03-08 13:56                     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 60+ messages in thread
From: Eugenio Perez Martin @ 2022-03-08 13:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, qemu-devel, Peter Xu, virtualization, Eli Cohen,
	Eric Blake, Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan

On Tue, Mar 8, 2022 at 1:17 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 12:37:33PM +0100, Eugenio Perez Martin wrote:
> > On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > > > > Not by itself but I'm not sure we can guarantee guest will not
> > > > > attempt to use the IOVA addresses we are reserving down
> > > > > the road.
> > > >
> > > > The IOVA is allocated via the listeners and stored in the iova tree
> > > > per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
> > > > virtio core see GPA to HVA mapping. And we do a reverse lookup to find
> > > > the HVA->IOVA we allocated previously.  So we have double check here:
> > > >
> > > > 1) Qemu memory core to make sure the GPA that guest uses is valid
> > > > 2) the IOVA tree that guarantees there will be no HVA beyond what
> > > > guest can see is used
> > > >
> > > > So technically, there's no way for the guest to use the IOVA address
> > > > allocated for the shadow virtqueue.
> > > >
> > > > Thanks
> > >
> > > I mean, IOVA is programmed in the host hardware to translate to HPA, right?
> > >
> >
> > Yes, that's right if the device uses physical maps. Also to note, SVQ
> > vring is allocated in multiples of host huge pages to avoid garbage or
> > unintended access from the device.
> >
> > If a vdpa device uses physical addresses, kernel vdpa will pin qemu
> > memory first and then will send IOVA to HPA translation to hardware.
> > But this IOVA space is not controlled by the guest, but by SVQ. If a
> > guest's virtqueue buffer cannot be translated first to GPA, it will
> > not be forwarded.
> >
> > Thanks!
>
> Right. So if guests send a buffer where buffer address overlaps the
> range we used for the SVQ, then I think at the moment guest won't work.
>

I'm going to dissect a few cases so we're able to sync where the POV
differs. Letting out vIOMMU for simplicity.

If qemu uses an emulated device, it reads VirtQueue and translates
addresses from GPA to HVA via virtqueue_pop. If the guest places an
address out of GPA, dma_memory_map returns an error ("virtio: bogus
descriptor or out of resources").

It doesn't make sense to say "the buffer address overlaps with qemu
memory" here since the conversion function is not defined for all GPA.
If the range is not in GPA, it's a bogus descriptor.

Now we use a vdpa device that uses physical mapping and we start qemu
with no svq. When qemu starts it  maps IOVA == GPA to HVA. When the
vdpa kernel receives the mapping, it pins the HVA memory, obtaining
HPA, and sends the memory map IOVA == GPA to HPA mappings to the
hardware. This is supported here.

If we add SVQ, the IOVA is not GPA anymore. GPA chunks are mapped to
IOVA, and SVQ is mapped too to IOVA, they don't overlap so the device
can access them both. When the memory listener tells vdpa that a new
chunk of memory is added, the code of SVQ does not care about GPA: It
allocates a free region of IOVA for the HVA region of the guest's
memory. GPA to HVA is already tracked and translated by virtqueue_pop.

Let's use example numbers:
- SVQ occupies HVA [0xa000, 0xb000). It's the first one to call
iova_tree_alloc_map, so it's mapped that IOVA [0,0x1000) translates to
[0xa000, 0xb000).
- The memory listener now reports GPA from [0, 0x1000), translated to
HVA [0x8000, 0x9000). The new call to iova_tree_alloc_map assigns the
IOVA [0x1000, 0x2000) to HVA [0x8000, 0x9000).

Then that IOVA tree is sent to the device. From the kernel POV is the
same: It gets HVA addresses, pins them, and configures the hardware so
it can translate IOVA (!= GPA) to HPA.

SVQ now reads descriptors from the guest using virtqueue_pop, so SVQ
as it's caller does not use GPA to address them but HVA. If the
guest's vring descriptor is outside of GPA [0, 0x1000), it's an error
as in emulated device. After that, it translates HVA to IOVA with the
iova-tree. The result must be within [0x1000, 0x2000).

So guests should not be able to write qemu's memory outside the
guest's memory unless it hits a bug either in SVQ code or in qemu's
Virtqueue/DMA system.

Let me know if this makes sense to you.

Thanks!



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-08 12:16                     ` Michael S. Tsirkin
@ 2022-03-09  3:38                       ` Jason Wang
  -1 siblings, 0 replies; 60+ messages in thread
From: Jason Wang @ 2022-03-09  3:38 UTC (permalink / raw)
  To: Michael S. Tsirkin, Eugenio Perez Martin
  Cc: qemu-devel, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Laurent Vivier,
	Parav Pandit, Richard Henderson, Gautam Dawar, Xiao W Wang,
	Stefan Hajnoczi, Harpreet Singh Anand, Lingshan


在 2022/3/8 下午8:16, Michael S. Tsirkin 写道:
> On Tue, Mar 08, 2022 at 12:37:33PM +0100, Eugenio Perez Martin wrote:
>> On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>> On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
>>>>> Not by itself but I'm not sure we can guarantee guest will not
>>>>> attempt to use the IOVA addresses we are reserving down
>>>>> the road.
>>>> The IOVA is allocated via the listeners and stored in the iova tree
>>>> per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
>>>> virtio core see GPA to HVA mapping. And we do a reverse lookup to find
>>>> the HVA->IOVA we allocated previously.  So we have double check here:
>>>>
>>>> 1) Qemu memory core to make sure the GPA that guest uses is valid
>>>> 2) the IOVA tree that guarantees there will be no HVA beyond what
>>>> guest can see is used
>>>>
>>>> So technically, there's no way for the guest to use the IOVA address
>>>> allocated for the shadow virtqueue.
>>>>
>>>> Thanks
>>> I mean, IOVA is programmed in the host hardware to translate to HPA, right?
>>>
>> Yes, that's right if the device uses physical maps. Also to note, SVQ
>> vring is allocated in multiples of host huge pages to avoid garbage or
>> unintended access from the device.
>>
>> If a vdpa device uses physical addresses, kernel vdpa will pin qemu
>> memory first and then will send IOVA to HPA translation to hardware.
>> But this IOVA space is not controlled by the guest, but by SVQ. If a
>> guest's virtqueue buffer cannot be translated first to GPA, it will
>> not be forwarded.
>>
>> Thanks!
> Right. So if guests send a buffer where buffer address overlaps the
> range we used for the SVQ, then I think at the moment guest won't work.


There's no way for a guest to do this, it can only use GPA but the Qemu 
won't let vDPA to use GPA as IOVA. Dedicated IOVA ranges were allocated 
for those GPA ranges so SVQ won't use IOVA that is overlapped with what 
Guest use.

Thanks


>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
@ 2022-03-09  3:38                       ` Jason Wang
  0 siblings, 0 replies; 60+ messages in thread
From: Jason Wang @ 2022-03-09  3:38 UTC (permalink / raw)
  To: Michael S. Tsirkin, Eugenio Perez Martin
  Cc: qemu-devel, Peter Xu, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Liuxiangdong, Stefano Garzarella,
	Laurent Vivier, Parav Pandit, Richard Henderson, Gautam Dawar,
	Xiao W Wang, Stefan Hajnoczi, Juan Quintela,
	Harpreet Singh Anand, Lingshan


在 2022/3/8 下午8:16, Michael S. Tsirkin 写道:
> On Tue, Mar 08, 2022 at 12:37:33PM +0100, Eugenio Perez Martin wrote:
>> On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>> On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
>>>>> Not by itself but I'm not sure we can guarantee guest will not
>>>>> attempt to use the IOVA addresses we are reserving down
>>>>> the road.
>>>> The IOVA is allocated via the listeners and stored in the iova tree
>>>> per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
>>>> virtio core see GPA to HVA mapping. And we do a reverse lookup to find
>>>> the HVA->IOVA we allocated previously.  So we have double check here:
>>>>
>>>> 1) Qemu memory core to make sure the GPA that guest uses is valid
>>>> 2) the IOVA tree that guarantees there will be no HVA beyond what
>>>> guest can see is used
>>>>
>>>> So technically, there's no way for the guest to use the IOVA address
>>>> allocated for the shadow virtqueue.
>>>>
>>>> Thanks
>>> I mean, IOVA is programmed in the host hardware to translate to HPA, right?
>>>
>> Yes, that's right if the device uses physical maps. Also to note, SVQ
>> vring is allocated in multiples of host huge pages to avoid garbage or
>> unintended access from the device.
>>
>> If a vdpa device uses physical addresses, kernel vdpa will pin qemu
>> memory first and then will send IOVA to HPA translation to hardware.
>> But this IOVA space is not controlled by the guest, but by SVQ. If a
>> guest's virtqueue buffer cannot be translated first to GPA, it will
>> not be forwarded.
>>
>> Thanks!
> Right. So if guests send a buffer where buffer address overlaps the
> range we used for the SVQ, then I think at the moment guest won't work.


There's no way for a guest to do this, it can only use GPA but the Qemu 
won't let vDPA to use GPA as IOVA. Dedicated IOVA ranges were allocated 
for those GPA ranges so SVQ won't use IOVA that is overlapped with what 
Guest use.

Thanks


>



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-09  3:38                       ` Jason Wang
@ 2022-03-09  7:30                         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-09  7:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Perez Martin,
	Liuxiangdong, Laurent Vivier, Parav Pandit, Richard Henderson,
	Gautam Dawar, Xiao W Wang, Stefan Hajnoczi, Harpreet Singh Anand,
	Lingshan

On Wed, Mar 09, 2022 at 11:38:35AM +0800, Jason Wang wrote:
> 
> 在 2022/3/8 下午8:16, Michael S. Tsirkin 写道:
> > On Tue, Mar 08, 2022 at 12:37:33PM +0100, Eugenio Perez Martin wrote:
> > > On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > > > > > Not by itself but I'm not sure we can guarantee guest will not
> > > > > > attempt to use the IOVA addresses we are reserving down
> > > > > > the road.
> > > > > The IOVA is allocated via the listeners and stored in the iova tree
> > > > > per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
> > > > > virtio core see GPA to HVA mapping. And we do a reverse lookup to find
> > > > > the HVA->IOVA we allocated previously.  So we have double check here:
> > > > > 
> > > > > 1) Qemu memory core to make sure the GPA that guest uses is valid
> > > > > 2) the IOVA tree that guarantees there will be no HVA beyond what
> > > > > guest can see is used
> > > > > 
> > > > > So technically, there's no way for the guest to use the IOVA address
> > > > > allocated for the shadow virtqueue.
> > > > > 
> > > > > Thanks
> > > > I mean, IOVA is programmed in the host hardware to translate to HPA, right?
> > > > 
> > > Yes, that's right if the device uses physical maps. Also to note, SVQ
> > > vring is allocated in multiples of host huge pages to avoid garbage or
> > > unintended access from the device.
> > > 
> > > If a vdpa device uses physical addresses, kernel vdpa will pin qemu
> > > memory first and then will send IOVA to HPA translation to hardware.
> > > But this IOVA space is not controlled by the guest, but by SVQ. If a
> > > guest's virtqueue buffer cannot be translated first to GPA, it will
> > > not be forwarded.
> > > 
> > > Thanks!
> > Right. So if guests send a buffer where buffer address overlaps the
> > range we used for the SVQ, then I think at the moment guest won't work.
> 
> 
> There's no way for a guest to do this, it can only use GPA

With a vIOMMU it can.

> but the Qemu
> won't let vDPA to use GPA as IOVA. Dedicated IOVA ranges were allocated for
> those GPA ranges so SVQ won't use IOVA that is overlapped with what Guest
> use.
> 
> Thanks
> 
> 
> > 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
@ 2022-03-09  7:30                         ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2022-03-09  7:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Peter Xu, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Perez Martin,
	Liuxiangdong, Stefano Garzarella, Laurent Vivier, Parav Pandit,
	Richard Henderson, Gautam Dawar, Xiao W Wang, Stefan Hajnoczi,
	Juan Quintela, Harpreet Singh Anand, Lingshan

On Wed, Mar 09, 2022 at 11:38:35AM +0800, Jason Wang wrote:
> 
> 在 2022/3/8 下午8:16, Michael S. Tsirkin 写道:
> > On Tue, Mar 08, 2022 at 12:37:33PM +0100, Eugenio Perez Martin wrote:
> > > On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > > > > > Not by itself but I'm not sure we can guarantee guest will not
> > > > > > attempt to use the IOVA addresses we are reserving down
> > > > > > the road.
> > > > > The IOVA is allocated via the listeners and stored in the iova tree
> > > > > per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
> > > > > virtio core see GPA to HVA mapping. And we do a reverse lookup to find
> > > > > the HVA->IOVA we allocated previously.  So we have double check here:
> > > > > 
> > > > > 1) Qemu memory core to make sure the GPA that guest uses is valid
> > > > > 2) the IOVA tree that guarantees there will be no HVA beyond what
> > > > > guest can see is used
> > > > > 
> > > > > So technically, there's no way for the guest to use the IOVA address
> > > > > allocated for the shadow virtqueue.
> > > > > 
> > > > > Thanks
> > > > I mean, IOVA is programmed in the host hardware to translate to HPA, right?
> > > > 
> > > Yes, that's right if the device uses physical maps. Also to note, SVQ
> > > vring is allocated in multiples of host huge pages to avoid garbage or
> > > unintended access from the device.
> > > 
> > > If a vdpa device uses physical addresses, kernel vdpa will pin qemu
> > > memory first and then will send IOVA to HPA translation to hardware.
> > > But this IOVA space is not controlled by the guest, but by SVQ. If a
> > > guest's virtqueue buffer cannot be translated first to GPA, it will
> > > not be forwarded.
> > > 
> > > Thanks!
> > Right. So if guests send a buffer where buffer address overlaps the
> > range we used for the SVQ, then I think at the moment guest won't work.
> 
> 
> There's no way for a guest to do this, it can only use GPA

With a vIOMMU it can.

> but the Qemu
> won't let vDPA to use GPA as IOVA. Dedicated IOVA ranges were allocated for
> those GPA ranges so SVQ won't use IOVA that is overlapped with what Guest
> use.
> 
> Thanks
> 
> 
> > 



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
  2022-03-09  7:30                         ` Michael S. Tsirkin
@ 2022-03-09  7:45                           ` Jason Wang
  -1 siblings, 0 replies; 60+ messages in thread
From: Jason Wang @ 2022-03-09  7:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Perez Martin,
	Liuxiangdong, Laurent Vivier, Parav Pandit, Richard Henderson,
	Gautam Dawar, Xiao W Wang, Stefan Hajnoczi, Harpreet Singh Anand,
	Lingshan


[-- Attachment #1.1: Type: text/plain, Size: 2830 bytes --]

On Wed, Mar 9, 2022 at 3:30 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Wed, Mar 09, 2022 at 11:38:35AM +0800, Jason Wang wrote:
> >
> > 在 2022/3/8 下午8:16, Michael S. Tsirkin 写道:
> > > On Tue, Mar 08, 2022 at 12:37:33PM +0100, Eugenio Perez Martin wrote:
> > > > On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com>
> wrote:
> > > > > On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > > > > > > Not by itself but I'm not sure we can guarantee guest will not
> > > > > > > attempt to use the IOVA addresses we are reserving down
> > > > > > > the road.
> > > > > > The IOVA is allocated via the listeners and stored in the iova
> tree
> > > > > > per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
> > > > > > virtio core see GPA to HVA mapping. And we do a reverse lookup
> to find
> > > > > > the HVA->IOVA we allocated previously.  So we have double check
> here:
> > > > > >
> > > > > > 1) Qemu memory core to make sure the GPA that guest uses is valid
> > > > > > 2) the IOVA tree that guarantees there will be no HVA beyond what
> > > > > > guest can see is used
> > > > > >
> > > > > > So technically, there's no way for the guest to use the IOVA
> address
> > > > > > allocated for the shadow virtqueue.
> > > > > >
> > > > > > Thanks
> > > > > I mean, IOVA is programmed in the host hardware to translate to
> HPA, right?
> > > > >
> > > > Yes, that's right if the device uses physical maps. Also to note, SVQ
> > > > vring is allocated in multiples of host huge pages to avoid garbage
> or
> > > > unintended access from the device.
> > > >
> > > > If a vdpa device uses physical addresses, kernel vdpa will pin qemu
> > > > memory first and then will send IOVA to HPA translation to hardware.
> > > > But this IOVA space is not controlled by the guest, but by SVQ. If a
> > > > guest's virtqueue buffer cannot be translated first to GPA, it will
> > > > not be forwarded.
> > > >
> > > > Thanks!
> > > Right. So if guests send a buffer where buffer address overlaps the
> > > range we used for the SVQ, then I think at the moment guest won't work.
> >
> >
> > There's no way for a guest to do this, it can only use GPA
>
> With a vIOMMU it can.
>

It should be the same or I may miss something.

With a vIOMMU, vDPA devices still won't use gIOVA. Instead the device will
use the IOVA that is managed by the Qemu.

Listeners: IOVA->HVA
Qemu virtqueue helper: gIOVA->GPA->HVA
SVQ: HVA->IOVA

So SVQ will use an IOVA that is overlapped with gIOVA/GPA

Thanks


>
> > but the Qemu
> > won't let vDPA to use GPA as IOVA. Dedicated IOVA ranges were allocated
> for
> > those GPA ranges so SVQ won't use IOVA that is overlapped with what Guest
> > use.
> >
> > Thanks
> >
> >
> > >
>
>

[-- Attachment #1.2: Type: text/html, Size: 4067 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 00/15] vDPA shadow virtqueue
@ 2022-03-09  7:45                           ` Jason Wang
  0 siblings, 0 replies; 60+ messages in thread
From: Jason Wang @ 2022-03-09  7:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Peter Xu, virtualization, Eli Cohen, Eric Blake,
	Eduardo Habkost, Cindy Lu, Fangyi (Eric),
	Markus Armbruster, yebiaoxiang, Eugenio Perez Martin,
	Liuxiangdong, Stefano Garzarella, Laurent Vivier, Parav Pandit,
	Richard Henderson, Gautam Dawar, Xiao W Wang, Stefan Hajnoczi,
	Juan Quintela, Harpreet Singh Anand, Lingshan

[-- Attachment #1: Type: text/plain, Size: 2830 bytes --]

On Wed, Mar 9, 2022 at 3:30 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Wed, Mar 09, 2022 at 11:38:35AM +0800, Jason Wang wrote:
> >
> > 在 2022/3/8 下午8:16, Michael S. Tsirkin 写道:
> > > On Tue, Mar 08, 2022 at 12:37:33PM +0100, Eugenio Perez Martin wrote:
> > > > On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com>
> wrote:
> > > > > On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > > > > > > Not by itself but I'm not sure we can guarantee guest will not
> > > > > > > attempt to use the IOVA addresses we are reserving down
> > > > > > > the road.
> > > > > > The IOVA is allocated via the listeners and stored in the iova
> tree
> > > > > > per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
> > > > > > virtio core see GPA to HVA mapping. And we do a reverse lookup
> to find
> > > > > > the HVA->IOVA we allocated previously.  So we have double check
> here:
> > > > > >
> > > > > > 1) Qemu memory core to make sure the GPA that guest uses is valid
> > > > > > 2) the IOVA tree that guarantees there will be no HVA beyond what
> > > > > > guest can see is used
> > > > > >
> > > > > > So technically, there's no way for the guest to use the IOVA
> address
> > > > > > allocated for the shadow virtqueue.
> > > > > >
> > > > > > Thanks
> > > > > I mean, IOVA is programmed in the host hardware to translate to
> HPA, right?
> > > > >
> > > > Yes, that's right if the device uses physical maps. Also to note, SVQ
> > > > vring is allocated in multiples of host huge pages to avoid garbage
> or
> > > > unintended access from the device.
> > > >
> > > > If a vdpa device uses physical addresses, kernel vdpa will pin qemu
> > > > memory first and then will send IOVA to HPA translation to hardware.
> > > > But this IOVA space is not controlled by the guest, but by SVQ. If a
> > > > guest's virtqueue buffer cannot be translated first to GPA, it will
> > > > not be forwarded.
> > > >
> > > > Thanks!
> > > Right. So if guests send a buffer where buffer address overlaps the
> > > range we used for the SVQ, then I think at the moment guest won't work.
> >
> >
> > There's no way for a guest to do this, it can only use GPA
>
> With a vIOMMU it can.
>

It should be the same or I may miss something.

With a vIOMMU, vDPA devices still won't use gIOVA. Instead the device will
use the IOVA that is managed by the Qemu.

Listeners: IOVA->HVA
Qemu virtqueue helper: gIOVA->GPA->HVA
SVQ: HVA->IOVA

So SVQ will use an IOVA that is overlapped with gIOVA/GPA

Thanks


>
> > but the Qemu
> > won't let vDPA to use GPA as IOVA. Dedicated IOVA ranges were allocated
> for
> > those GPA ranges so SVQ won't use IOVA that is overlapped with what Guest
> > use.
> >
> > Thanks
> >
> >
> > >
>
>

[-- Attachment #2: Type: text/html, Size: 4067 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2022-03-09  7:48 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-07 15:33 [PATCH v5 00/15] vDPA shadow virtqueue Eugenio Pérez
2022-03-07 15:33 ` [PATCH v5 01/15] vhost: Add VhostShadowVirtqueue Eugenio Pérez
2022-03-07 15:33 ` [PATCH v5 02/15] vhost: Add Shadow VirtQueue kick forwarding capabilities Eugenio Pérez
2022-03-07 15:33 ` [PATCH v5 03/15] vhost: Add Shadow VirtQueue call " Eugenio Pérez
2022-03-07 15:33 ` [PATCH v5 04/15] vhost: Add vhost_svq_valid_features to shadow vq Eugenio Pérez
2022-03-07 15:33 ` [PATCH v5 05/15] virtio: Add vhost_svq_get_vring_addr Eugenio Pérez
2022-03-07 15:33 ` [PATCH v5 06/15] vdpa: adapt vhost_ops callbacks to svq Eugenio Pérez
2022-03-07 15:33 ` [PATCH v5 07/15] vhost: Shadow virtqueue buffers forwarding Eugenio Pérez
2022-03-07 15:33 ` [PATCH v5 08/15] util: Add iova_tree_alloc_map Eugenio Pérez
2022-03-07 15:33 ` [PATCH v5 09/15] util: add iova_tree_find_iova Eugenio Pérez
2022-03-07 15:33 ` [PATCH v5 10/15] vhost: Add VhostIOVATree Eugenio Pérez
2022-03-07 15:33 ` [PATCH v5 11/15] vdpa: Add custom IOTLB translations to SVQ Eugenio Pérez
2022-03-07 15:33 ` [PATCH v5 12/15] vdpa: Adapt vhost_vdpa_get_vring_base " Eugenio Pérez
2022-03-07 15:33 ` [PATCH v5 13/15] vdpa: Never set log_base addr if SVQ is enabled Eugenio Pérez
2022-03-07 15:33 ` [PATCH v5 14/15] vdpa: Expose VHOST_F_LOG_ALL on SVQ Eugenio Pérez
2022-03-07 15:33 ` [PATCH v5 15/15] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
2022-03-08  7:11   ` Michael S. Tsirkin
2022-03-08  7:11     ` Michael S. Tsirkin
2022-03-08  7:32     ` Eugenio Perez Martin
2022-03-08  7:33       ` Eugenio Perez Martin
2022-03-08  8:02       ` Michael S. Tsirkin
2022-03-08  8:02         ` Michael S. Tsirkin
2022-03-08  8:24         ` Eugenio Perez Martin
2022-03-08 12:31           ` Michael S. Tsirkin
2022-03-08 12:31             ` Michael S. Tsirkin
2022-03-08  9:29   ` Markus Armbruster
2022-03-08  6:03 ` [PATCH v5 00/15] vDPA shadow virtqueue Jason Wang
2022-03-08  6:03   ` Jason Wang
2022-03-08  7:11   ` Michael S. Tsirkin
2022-03-08  7:11     ` Michael S. Tsirkin
2022-03-08  7:14     ` Jason Wang
2022-03-08  7:14       ` Jason Wang
2022-03-08  7:27       ` Michael S. Tsirkin
2022-03-08  7:27         ` Michael S. Tsirkin
2022-03-08  7:34         ` Jason Wang
2022-03-08  7:34           ` Jason Wang
2022-03-08  7:55           ` Michael S. Tsirkin
2022-03-08  7:55             ` Michael S. Tsirkin
2022-03-08  8:15             ` Eugenio Perez Martin
2022-03-08  8:19               ` Michael S. Tsirkin
2022-03-08  8:19                 ` Michael S. Tsirkin
2022-03-08  8:20             ` Jason Wang
2022-03-08  8:20               ` Jason Wang
2022-03-08 10:46               ` Michael S. Tsirkin
2022-03-08 10:46                 ` Michael S. Tsirkin
2022-03-08 13:23                 ` Jason Wang
2022-03-08 13:23                   ` Jason Wang
2022-03-08 10:48               ` Michael S. Tsirkin
2022-03-08 10:48                 ` Michael S. Tsirkin
2022-03-08 11:37                 ` Eugenio Perez Martin
2022-03-08 12:16                   ` Michael S. Tsirkin
2022-03-08 12:16                     ` Michael S. Tsirkin
2022-03-08 13:56                     ` Eugenio Perez Martin
2022-03-09  3:38                     ` Jason Wang
2022-03-09  3:38                       ` Jason Wang
2022-03-09  7:30                       ` Michael S. Tsirkin
2022-03-09  7:30                         ` Michael S. Tsirkin
2022-03-09  7:45                         ` Jason Wang
2022-03-09  7:45                           ` Jason Wang
2022-03-08  7:49         ` Eugenio Perez Martin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.