* [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
@ 2022-05-19 19:12 Eugenio Pérez
  2022-05-19 19:12 ` [RFC PATCH v8 01/21] virtio-net: Expose ctrl virtqueue logic Eugenio Pérez
                   ` (21 more replies)
  0 siblings, 22 replies; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

Control virtqueue is used by the networking device to accept various
commands from the driver. It is a must to support multiqueue and other
configurations.

Shadow VirtQueue (SVQ) already makes it possible to migrate virtqueue
state by intercepting the virtqueues, so qemu can track which regions of
memory are dirtied by device action and need migration. However, this
does not cover the networking device state seen by the driver, which is
set via CVQ messages such as MAC address changes.

To solve that, this series uses the proposed SVQ infrastructure to
intercept the networking control messages used by the device. This way,
qemu is able to update the VirtIONet device model and to migrate it.

However, intercepting all the queues would slow down device data
forwarding. To avoid that, only the CVQ must be intercepted all the time.
This is achieved using the ASID infrastructure, which allows different
translations for different virtqueues. The most up-to-date kernel part of
ASID is proposed at [1].

You can run qemu in two modes after applying this series: intercepting
only the CVQ with x-cvq-svq=on, or intercepting all the virtqueues by
adding x-svq=on to the command line:

-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-cvq-svq=on,x-svq=on
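
To shadow only the control virtqueue, leaving the data virtqueues in
passthrough, the x-svq flag should simply be omitted:

-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-cvq-svq=on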

The first three patches enable the update of the virtio-net device model
for each CVQ message acknowledged by the device.

Patches 5 to 9 enable an individual SVQ to copy the buffers to QEMU's VA.
This simplifies the memory mapping: only the buffers are mapped, instead
of all the guest's memory as in the data virtqueues.
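
In short, each CVQ element is linearized into a bounce buffer that is
mapped on its own. A condensed sketch of patch 9's flow for a
device-readable buffer, with error handling omitted:

    /* Linearize the guest chain into one page-aligned bounce buffer */
    size_t len = iov_size(iov, num);
    void *buf = qemu_memalign(4096, ROUND_UP(len, 4096));
    iov_to_buf(iov, num, 0, buf, len);

    /* Allocate a free IOVA range and map only that buffer, read-only */
    DMAMap map = {
        .translated_addr = (hwaddr)(uintptr_t)buf,
        .size = ROUND_UP(len, 4096) - 1,
        .perm = IOMMU_RO,
    };
    vhost_iova_tree_map_alloc(svq->iova_tree, &map);
    svq->map_ops->map(map.iova, map.size + 1, buf, true, svq->map_ops_opaque);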

Patch 10 allows injecting control messages into the device. This makes it
possible to set device state both at QEMU startup and at the live
migration destination. In the future, this may also be used to emulate
_F_ANNOUNCE.
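
The injection helper itself only lands in patch 10, but assuming a
signature along the lines of vhost_svq_inject(svq, iov, out_num, in_num),
setting the MAC at the destination could look roughly like:

    struct virtio_net_ctrl_hdr hdr = {
        .class = VIRTIO_NET_CTRL_MAC,
        .cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET,
    };
    virtio_net_ctrl_ack ack = VIRTIO_NET_ERR;
    const struct iovec sg[] = {
        { .iov_base = &hdr, .iov_len = sizeof(hdr) },       /* readable */
        { .iov_base = n->mac, .iov_len = sizeof(n->mac) },  /* readable */
        { .iov_base = &ack, .iov_len = sizeof(ack) },       /* writable */
    };
    vhost_svq_inject(svq, sg, 2, 1); /* 2 out descriptors, 1 in */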

Patch 11 updates kernel headers, but it assigns random numbers to the
needed ioctls because they have not been accepted in the kernel yet.

Patches 12-16 enable setting the features of the net device model on the
vdpa device at device start.

The last ones enable the separate ASID and SVQ.

Comments are welcome.

TODO:
* Fall back to regular CVQ if QEMU cannot isolate it in its own ASID for
  any reason, blocking migration. This is tricky, since it can leave the
  VM unable to migrate anymore, so some way of blocking it must be used.
* Review failure paths; some have TODO notes, others don't.

Changes from rfc v7:
* Don't map all guest space in ASID 1 but copy all the buffers. No need for
  more memory listeners.
* Move net backend start callback to SVQ.
* Wait for the CVQ commands sent at SVQ start to be used by the device,
  avoiding races.
* Changed ioctls, but they're provisional anyway.
* Reorder commits so refactor and code adding ones are closer to usage.
* Usual cleaning: better tracing, doc, patches messages, ...

Changes from rfc v6:
* Fix bad iotlb update order when batching was enabled.
* Add reference counting to iova_tree so cleaning is simpler.

Changes from rfc v5:
* Fix bad calculation of cvq end group when MQ is not acked by the guest.

Changes from rfc v4:
* Add missing tracing
* Add multiqueue support
* Use the already sent version for replacing g_memdup.
* More care with memory management.

Changes from rfc v3:
* Fix bad returning of descriptors to SVQ list.

Changes from rfc v2:
* Fix use-after-free.

Changes from rfc v1:
* Rebase to latest master.
* Configure ASID instead of assuming cvq asid != data vqs asid.
* Update device model so (MAC) state can be migrated too.

[1] https://lkml.kernel.org/kvm/20220224212314.1326-1-gdawar@xilinx.com/

Eugenio Pérez (21):
  virtio-net: Expose ctrl virtqueue logic
  vhost: Add custom used buffer callback
  vdpa: control virtqueue support on shadow virtqueue
  virtio: Make virtqueue_alloc_element non-static
  vhost: Add vhost_iova_tree_find
  vdpa: Add map/unmap operation callback to SVQ
  vhost: move descriptor translation to vhost_svq_vring_write_descs
  vhost: Add SVQElement
  vhost: Add svq copy desc mode
  vhost: Add vhost_svq_inject
  vhost: Update kernel headers
  vdpa: delay set_vring_ready after DRIVER_OK
  vhost: Add ShadowVirtQueueStart operation
  vhost: Make possible to check for device exclusive vq group
  vhost: add vhost_svq_poll
  vdpa: Add vhost_vdpa_start_control_svq
  vdpa: Add asid attribute to vdpa device
  vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
  vhost: Add reference counting to vhost_iova_tree
  vdpa: Add x-svq to NetdevVhostVDPAOptions
  vdpa: Add x-cvq-svq

 qapi/net.json                                |  13 +-
 hw/virtio/vhost-iova-tree.h                  |   7 +-
 hw/virtio/vhost-shadow-virtqueue.h           |  61 ++-
 include/hw/virtio/vhost-vdpa.h               |   3 +
 include/hw/virtio/vhost.h                    |   3 +
 include/hw/virtio/virtio-net.h               |   4 +
 include/hw/virtio/virtio.h                   |   1 +
 include/standard-headers/linux/vhost_types.h |  11 +-
 linux-headers/linux/vhost.h                  |  25 +-
 hw/net/vhost_net.c                           |   5 +-
 hw/net/virtio-net.c                          |  84 +++--
 hw/virtio/vhost-iova-tree.c                  |  35 +-
 hw/virtio/vhost-shadow-virtqueue.c           | 378 ++++++++++++++++---
 hw/virtio/vhost-vdpa.c                       | 206 +++++++++-
 hw/virtio/virtio.c                           |   2 +-
 net/vhost-vdpa.c                             | 294 ++++++++++++++-
 hw/virtio/trace-events                       |  10 +-
 17 files changed, 1012 insertions(+), 130 deletions(-)

-- 
2.27.0





* [RFC PATCH v8 01/21] virtio-net: Expose ctrl virtqueue logic
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
@ 2022-05-19 19:12 ` Eugenio Pérez
  2022-06-07  6:13   ` Jason Wang
  2022-05-19 19:12 ` [RFC PATCH v8 02/21] vhost: Add custom used buffer callback Eugenio Pérez
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

This allows external vhost-net devices to modify the state of the
VirtIO device model once the vhost-vdpa device has acknowledged the
control commands.
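
For example, a backend holding a used control element can run it through
the model like this (patch 3 does exactly this from the vdpa net
backend):

    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
    const struct iovec in = {
        .iov_base = &status,
        .iov_len = sizeof(status),
    };

    /* The model processes the command and writes the ack into status */
    virtio_net_handle_ctrl_iov(vdev, &in, 1, elem->out_sg, elem->out_num);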

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/virtio-net.h |  4 ++
 hw/net/virtio-net.c            | 84 ++++++++++++++++++++--------------
 2 files changed, 53 insertions(+), 35 deletions(-)

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index eb87032627..cd31b7f67d 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -218,6 +218,10 @@ struct VirtIONet {
     struct EBPFRSSContext ebpf_rss;
 };
 
+unsigned virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
+                                    const struct iovec *in_sg, size_t in_num,
+                                    const struct iovec *out_sg,
+                                    unsigned out_num);
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
                                    const char *type);
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7ad948ee7c..0e350154ec 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1434,57 +1434,71 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
     return VIRTIO_NET_OK;
 }
 
-static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
+unsigned virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
+                                    const struct iovec *in_sg, size_t in_num,
+                                    const struct iovec *out_sg,
+                                    unsigned out_num)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
     struct virtio_net_ctrl_hdr ctrl;
     virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
-    VirtQueueElement *elem;
     size_t s;
     struct iovec *iov, *iov2;
-    unsigned int iov_cnt;
+
+    if (iov_size(in_sg, in_num) < sizeof(status) ||
+        iov_size(out_sg, out_num) < sizeof(ctrl)) {
+        virtio_error(vdev, "virtio-net ctrl missing headers");
+        return 0;
+    }
+
+    iov2 = iov = g_memdup2(out_sg, sizeof(struct iovec) * out_num);
+    s = iov_to_buf(iov, out_num, 0, &ctrl, sizeof(ctrl));
+    iov_discard_front(&iov, &out_num, sizeof(ctrl));
+    if (s != sizeof(ctrl)) {
+        status = VIRTIO_NET_ERR;
+    } else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
+        status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
+        status = virtio_net_handle_mac(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
+        status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
+        status = virtio_net_handle_announce(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
+        status = virtio_net_handle_mq(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
+        status = virtio_net_handle_offloads(n, ctrl.cmd, iov, out_num);
+    }
+
+    s = iov_from_buf(in_sg, in_num, 0, &status, sizeof(status));
+    assert(s == sizeof(status));
+
+    g_free(iov2);
+    return sizeof(status);
+}
+
+static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtQueueElement *elem;
 
     for (;;) {
+        unsigned written;
         elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
         if (!elem) {
             break;
         }
-        if (iov_size(elem->in_sg, elem->in_num) < sizeof(status) ||
-            iov_size(elem->out_sg, elem->out_num) < sizeof(ctrl)) {
-            virtio_error(vdev, "virtio-net ctrl missing headers");
+
+        written = virtio_net_handle_ctrl_iov(vdev, elem->in_sg, elem->in_num,
+                                             elem->out_sg, elem->out_num);
+        if (written > 0) {
+            virtqueue_push(vq, elem, written);
+            virtio_notify(vdev, vq);
+            g_free(elem);
+        } else {
             virtqueue_detach_element(vq, elem, 0);
             g_free(elem);
             break;
         }
-
-        iov_cnt = elem->out_num;
-        iov2 = iov = g_memdup2(elem->out_sg,
-                               sizeof(struct iovec) * elem->out_num);
-        s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl));
-        iov_discard_front(&iov, &iov_cnt, sizeof(ctrl));
-        if (s != sizeof(ctrl)) {
-            status = VIRTIO_NET_ERR;
-        } else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
-            status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
-            status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
-            status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
-            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
-            status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
-            status = virtio_net_handle_offloads(n, ctrl.cmd, iov, iov_cnt);
-        }
-
-        s = iov_from_buf(elem->in_sg, elem->in_num, 0, &status, sizeof(status));
-        assert(s == sizeof(status));
-
-        virtqueue_push(vq, elem, sizeof(status));
-        virtio_notify(vdev, vq);
-        g_free(iov2);
-        g_free(elem);
     }
 }
 
-- 
2.27.0




* [RFC PATCH v8 02/21] vhost: Add custom used buffer callback
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
  2022-05-19 19:12 ` [RFC PATCH v8 01/21] virtio-net: Expose ctrl virtqueue logic Eugenio Pérez
@ 2022-05-19 19:12 ` Eugenio Pérez
  2022-06-07  6:12   ` Jason Wang
  2022-05-19 19:12 ` [RFC PATCH v8 03/21] vdpa: control virtqueue support on shadow virtqueue Eugenio Pérez
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

The callback allows SVQ users to know the VirtQueue requests and
responses. QEMU can use this to synchronize the virtio device model
state, allowing it to be migrated with minimal changes to the migration
code.

In the case of networking, this will be used to inspect control
virtqueue messages.
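
A user only needs to provide the hook at SVQ creation time; for the net
CVQ this ends up being (see patch 3):

    static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
        .used_elem_handler = vhost_vdpa_net_handle_ctrl,
    };

    /* ... later, when creating the backend ... */
    s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;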

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h | 16 +++++++++++++++-
 include/hw/virtio/vhost-vdpa.h     |  2 ++
 hw/virtio/vhost-shadow-virtqueue.c |  9 ++++++++-
 hw/virtio/vhost-vdpa.c             |  3 ++-
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index c132c994e9..6593f07db3 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -15,6 +15,13 @@
 #include "standard-headers/linux/vhost_types.h"
 #include "hw/virtio/vhost-iova-tree.h"
 
+typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
+                                         const VirtQueueElement *elem);
+
+typedef struct VhostShadowVirtqueueOps {
+    VirtQueueElementCallback used_elem_handler;
+} VhostShadowVirtqueueOps;
+
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
     /* Shadow vring */
@@ -59,6 +66,12 @@ typedef struct VhostShadowVirtqueue {
      */
     uint16_t *desc_next;
 
+    /* Optional callbacks */
+    const VhostShadowVirtqueueOps *ops;
+
+    /* Optional custom used virtqueue element handler */
+    VirtQueueElementCallback used_elem_cb;
+
     /* Next head to expose to the device */
     uint16_t shadow_avail_idx;
 
@@ -85,7 +98,8 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
                      VirtQueue *vq);
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree);
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
+                                    const VhostShadowVirtqueueOps *ops);
 
 void vhost_svq_free(gpointer vq);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index a29dbb3f53..f1ba46a860 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -17,6 +17,7 @@
 #include "hw/virtio/vhost-iova-tree.h"
 #include "hw/virtio/virtio.h"
 #include "standard-headers/linux/vhost_types.h"
+#include "hw/virtio/vhost-shadow-virtqueue.h"
 
 typedef struct VhostVDPAHostNotifier {
     MemoryRegion mr;
@@ -35,6 +36,7 @@ typedef struct vhost_vdpa {
     /* IOVA mapping used by the Shadow Virtqueue */
     VhostIOVATree *iova_tree;
     GPtrArray *shadow_vqs;
+    const VhostShadowVirtqueueOps *shadow_vq_ops;
     struct vhost_dev *dev;
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 56c96ebd13..167db8be45 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -410,6 +410,10 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
                 break;
             }
 
+            if (svq->ops && svq->ops->used_elem_handler) {
+                svq->ops->used_elem_handler(svq->vdev, elem);
+            }
+
             if (unlikely(i >= svq->vring.num)) {
                 qemu_log_mask(LOG_GUEST_ERROR,
                          "More than %u used buffers obtained in a %u size SVQ",
@@ -607,12 +611,14 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
  * shadow methods and file descriptors.
  *
  * @iova_tree: Tree to perform descriptors translations
+ * @ops: SVQ operations hooks
  *
  * Returns the new virtqueue or NULL.
  *
  * In case of error, reason is reported through error_report.
  */
-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
+                                    const VhostShadowVirtqueueOps *ops)
 {
     g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
     int r;
@@ -634,6 +640,7 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
     event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND);
     event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     svq->iova_tree = iova_tree;
+    svq->ops = ops;
     return g_steal_pointer(&svq);
 
 err_init_hdev_call:
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 66f054a12c..7677b337e6 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -418,7 +418,8 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
 
     shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
     for (unsigned n = 0; n < hdev->nvqs; ++n) {
-        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree);
+        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree,
+                                                            v->shadow_vq_ops);
 
         if (unlikely(!svq)) {
             error_setg(errp, "Cannot create svq %u", n);
-- 
2.27.0




* [RFC PATCH v8 03/21] vdpa: control virtqueue support on shadow virtqueue
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
  2022-05-19 19:12 ` [RFC PATCH v8 01/21] virtio-net: Expose ctrl virtqueue logic Eugenio Pérez
  2022-05-19 19:12 ` [RFC PATCH v8 02/21] vhost: Add custom used buffer callback Eugenio Pérez
@ 2022-05-19 19:12 ` Eugenio Pérez
  2022-06-07  6:05   ` Jason Wang
  2022-05-19 19:12 ` [RFC PATCH v8 04/21] virtio: Make virtqueue_alloc_element non-static Eugenio Pérez
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

Introduce the control virtqueue support for vDPA shadow virtqueue. This
is needed for advanced networking features like multiqueue.

To demonstrate command handling, VIRTIO_NET_F_CTRL_MACADDR and
VIRTIO_NET_CTRL_MQ are implemented. If the vDPA device is started with
SVQ support and the virtio-net driver changes the MAC or the number of
queues, the virtio-net device model will be updated with the new values.

Other CVQ commands could be added here straightforwardly, but they have
not been tested.
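
For reference, the CVQ command the handler sees is a chain with a
device-readable header plus command-specific data, followed by a
device-writable ack byte (types simplified from the standard headers):

    struct virtio_net_ctrl_hdr {
        uint8_t class;  /* e.g. VIRTIO_NET_CTRL_MAC */
        uint8_t cmd;    /* e.g. VIRTIO_NET_CTRL_MAC_ADDR_SET */
    };
    /* ... command data: a 6-byte MAC, a struct virtio_net_ctrl_mq, ... */
    /* ... then one writable virtio_net_ctrl_ack byte:
           VIRTIO_NET_OK (0) or VIRTIO_NET_ERR (1) */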

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 net/vhost-vdpa.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index df1e69ee72..ef12fc284c 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -11,6 +11,7 @@
 
 #include "qemu/osdep.h"
 #include "clients.h"
+#include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
 #include "net/vhost-vdpa.h"
 #include "hw/virtio/vhost-vdpa.h"
@@ -187,6 +188,46 @@ static NetClientInfo net_vhost_vdpa_info = {
         .check_peer_type = vhost_vdpa_check_peer_type,
 };
 
+static void vhost_vdpa_net_handle_ctrl(VirtIODevice *vdev,
+                                       const VirtQueueElement *elem)
+{
+    struct virtio_net_ctrl_hdr ctrl;
+    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
+    size_t s;
+    struct iovec in = {
+        .iov_base = &status,
+        .iov_len = sizeof(status),
+    };
+
+    s = iov_to_buf(elem->out_sg, elem->out_num, 0, &ctrl, sizeof(ctrl.class));
+    if (s != sizeof(ctrl.class)) {
+        return;
+    }
+
+    switch (ctrl.class) {
+    case VIRTIO_NET_CTRL_MAC_ADDR_SET:
+    case VIRTIO_NET_CTRL_MQ:
+        break;
+    default:
+        return;
+    };
+
+    s = iov_to_buf(elem->in_sg, elem->in_num, 0, &status, sizeof(status));
+    if (s != sizeof(status) || status != VIRTIO_NET_OK) {
+        return;
+    }
+
+    status = VIRTIO_NET_ERR;
+    virtio_net_handle_ctrl_iov(vdev, &in, 1, elem->out_sg, elem->out_num);
+    if (status != VIRTIO_NET_OK) {
+        error_report("Bad CVQ processing in model");
+    }
+}
+
+static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
+    .used_elem_handler = vhost_vdpa_net_handle_ctrl,
+};
+
 static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
                                            const char *device,
                                            const char *name,
@@ -211,6 +252,9 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
     s->vhost_vdpa.device_fd = vdpa_device_fd;
     s->vhost_vdpa.index = queue_pair_index;
+    if (!is_datapath) {
+        s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
+    }
     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
     if (ret) {
         qemu_del_net_client(nc);
-- 
2.27.0




* [RFC PATCH v8 04/21] virtio: Make virtqueue_alloc_element non-static
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (2 preceding siblings ...)
  2022-05-19 19:12 ` [RFC PATCH v8 03/21] vdpa: control virtqueue support on shadow virtqueue Eugenio Pérez
@ 2022-05-19 19:12 ` Eugenio Pérez
  2022-05-19 19:12 ` [RFC PATCH v8 05/21] vhost: Add vhost_iova_tree_find Eugenio Pérez
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

So SVQ can allocate elements by calling it.
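
This lets later patches build elements that were never popped from the
guest's avail ring, presumably along the lines of:

    /* sz lets the caller over-allocate, e.g. to embed the element in a
     * larger wrapper struct (SVQElement in patch 8); out_num and in_num
     * size the iovec arrays placed after it */
    SVQElement *svq_elem = virtqueue_alloc_element(sizeof(*svq_elem),
                                                   out_num, in_num);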

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/virtio.h | 1 +
 hw/virtio/virtio.c         | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index db1c0ddf6b..5ca29e8757 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -198,6 +198,7 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len, unsigned int idx);
 
 void virtqueue_map(VirtIODevice *vdev, VirtQueueElement *elem);
+void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num);
 void *virtqueue_pop(VirtQueue *vq, size_t sz);
 unsigned int virtqueue_drop_all(VirtQueue *vq);
 void *qemu_get_virtqueue_element(VirtIODevice *vdev, QEMUFile *f, size_t sz);
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 5d607aeaa0..b0929ba86c 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1426,7 +1426,7 @@ void virtqueue_map(VirtIODevice *vdev, VirtQueueElement *elem)
                                                                         false);
 }
 
-static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num)
+void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num)
 {
     VirtQueueElement *elem;
     size_t in_addr_ofs = QEMU_ALIGN_UP(sz, __alignof__(elem->in_addr[0]));
-- 
2.27.0




* [RFC PATCH v8 05/21] vhost: Add vhost_iova_tree_find
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (3 preceding siblings ...)
  2022-05-19 19:12 ` [RFC PATCH v8 04/21] virtio: Make virtqueue_alloc_element non-static Eugenio Pérez
@ 2022-05-19 19:12 ` Eugenio Pérez
  2022-05-19 19:12 ` [RFC PATCH v8 06/21] vdpa: Add map/unmap operation callback to SVQ Eugenio Pérez
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

Just a simple wrapper so we can find DMAMap entries based on iova.
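
The search key is a partially filled DMAMap; patch 9 uses it to locate
the bounce buffer that backs a descriptor returned by the device:

    DMAMap needle = {
        /* no size needed: the iova chunk was allocated contiguous by SVQ */
        .iova = iova,
    };
    const DMAMap *map = vhost_iova_tree_find(svq->iova_tree, &needle);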

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-iova-tree.h |  2 ++
 hw/virtio/vhost-iova-tree.c | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
index 6a4f24e0f9..1ffcdc5b57 100644
--- a/hw/virtio/vhost-iova-tree.h
+++ b/hw/virtio/vhost-iova-tree.h
@@ -19,6 +19,8 @@ VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
 void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
 
+const DMAMap *vhost_iova_tree_find(const VhostIOVATree *iova_tree,
+                                   const DMAMap *map);
 const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
                                         const DMAMap *map);
 int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
index 67bf6d57ab..1a59894385 100644
--- a/hw/virtio/vhost-iova-tree.c
+++ b/hw/virtio/vhost-iova-tree.c
@@ -56,6 +56,20 @@ void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
     g_free(iova_tree);
 }
 
+/**
+ * Find a mapping in the tree that matches map
+ *
+ * @iova_tree  The iova tree
+ * @map        The map
+ *
+ * Return a matching map that contains argument map or NULL
+ */
+const DMAMap *vhost_iova_tree_find(const VhostIOVATree *iova_tree,
+                                   const DMAMap *map)
+{
+    return iova_tree_find(iova_tree->iova_taddr_map, map);
+}
+
 /**
  * Find the IOVA address stored from a memory address
  *
-- 
2.27.0




* [RFC PATCH v8 06/21] vdpa: Add map/unmap operation callback to SVQ
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (4 preceding siblings ...)
  2022-05-19 19:12 ` [RFC PATCH v8 05/21] vhost: Add vhost_iova_tree_find Eugenio Pérez
@ 2022-05-19 19:12 ` Eugenio Pérez
  2022-05-19 19:12 ` [RFC PATCH v8 07/21] vhost: move descriptor translation to vhost_svq_vring_write_descs Eugenio Pérez
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

Net Shadow Control VirtQueue will use them to map buffers outside of the
guest's address space.

These are also needed for other features, like indirect descriptors. They
can be used to map the SVQ vrings too: that is currently done outside of
vhost-shadow-virtqueue.c, which is a duplication.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h | 21 +++++++++++++++++++--
 hw/virtio/vhost-shadow-virtqueue.c |  8 +++++++-
 hw/virtio/vhost-vdpa.c             | 20 +++++++++++++++++++-
 3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 6593f07db3..50f45153c0 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -22,6 +22,15 @@ typedef struct VhostShadowVirtqueueOps {
     VirtQueueElementCallback used_elem_handler;
 } VhostShadowVirtqueueOps;
 
+typedef int (*vhost_svq_map_op)(hwaddr iova, hwaddr size, void *vaddr,
+                                bool readonly, void *opaque);
+typedef int (*vhost_svq_unmap_op)(hwaddr iova, hwaddr size, void *opaque);
+
+typedef struct VhostShadowVirtqueueMapOps {
+    vhost_svq_map_op map;
+    vhost_svq_unmap_op unmap;
+} VhostShadowVirtqueueMapOps;
+
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
     /* Shadow vring */
@@ -69,6 +78,12 @@ typedef struct VhostShadowVirtqueue {
     /* Optional callbacks */
     const VhostShadowVirtqueueOps *ops;
 
+    /* Device memory mapping callbacks */
+    const VhostShadowVirtqueueMapOps *map_ops;
+
+    /* Device memory mapping callbacks opaque */
+    void *map_ops_opaque;
+
     /* Optional custom used virtqueue element handler */
     VirtQueueElementCallback used_elem_cb;
 
@@ -98,8 +113,10 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
                      VirtQueue *vq);
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
-                                    const VhostShadowVirtqueueOps *ops);
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_map,
+                                    const VhostShadowVirtqueueOps *ops,
+                                    const VhostShadowVirtqueueMapOps *map_ops,
+                                    void *map_ops_opaque);
 
 void vhost_svq_free(gpointer vq);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 167db8be45..a6a8e403ea 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -612,13 +612,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
  *
  * @iova_tree: Tree to perform descriptors translations
  * @ops: SVQ operations hooks
+ * @map_ops: SVQ mapping operation hooks
+ * @map_ops_opaque: Opaque data to pass to mapping operations
  *
  * Returns the new virtqueue or NULL.
  *
  * In case of error, reason is reported through error_report.
  */
 VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
-                                    const VhostShadowVirtqueueOps *ops)
+                                    const VhostShadowVirtqueueOps *ops,
+                                    const VhostShadowVirtqueueMapOps *map_ops,
+                                    void *map_ops_opaque)
 {
     g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
     int r;
@@ -641,6 +645,8 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
     event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     svq->iova_tree = iova_tree;
     svq->ops = ops;
+    svq->map_ops = map_ops;
+    svq->map_ops_opaque = map_ops_opaque;
     return g_steal_pointer(&svq);
 
 err_init_hdev_call:
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 7677b337e6..e6ef944e23 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -392,6 +392,22 @@ static int vhost_vdpa_get_dev_features(struct vhost_dev *dev,
     return ret;
 }
 
+static int vhost_vdpa_svq_map(hwaddr iova, hwaddr size, void *vaddr,
+                              bool readonly, void *opaque)
+{
+    return vhost_vdpa_dma_map(opaque, iova, size, vaddr, readonly);
+}
+
+static int vhost_vdpa_svq_unmap(hwaddr iova, hwaddr size, void *opaque)
+{
+    return vhost_vdpa_dma_unmap(opaque, iova, size);
+}
+
+static const VhostShadowVirtqueueMapOps vhost_vdpa_svq_map_ops = {
+    .map = vhost_vdpa_svq_map,
+    .unmap = vhost_vdpa_svq_unmap,
+};
+
 static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
                                Error **errp)
 {
@@ -419,7 +435,9 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
     shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
     for (unsigned n = 0; n < hdev->nvqs; ++n) {
         g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree,
-                                                            v->shadow_vq_ops);
+                                                       v->shadow_vq_ops,
+                                                       &vhost_vdpa_svq_map_ops,
+                                                       v);
 
         if (unlikely(!svq)) {
             error_setg(errp, "Cannot create svq %u", n);
-- 
2.27.0




* [RFC PATCH v8 07/21] vhost: move descriptor translation to vhost_svq_vring_write_descs
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (5 preceding siblings ...)
  2022-05-19 19:12 ` [RFC PATCH v8 06/21] vdpa: Add map/unmap operation callback to SVQ Eugenio Pérez
@ 2022-05-19 19:12 ` Eugenio Pérez
  2022-05-19 19:12 ` [RFC PATCH v8 08/21] vhost: Add SVQElement Eugenio Pérez
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

It's done for both in and out descriptors so it's better placed here.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 38 +++++++++++++++++++++---------
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index a6a8e403ea..2d5d27d29c 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -122,17 +122,35 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
     return true;
 }
 
-static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
-                                    const struct iovec *iovec, size_t num,
-                                    bool more_descs, bool write)
+/**
+ * Write descriptors to SVQ vring
+ *
+ * @svq: The shadow virtqueue
+ * @sg: Cache for hwaddr
+ * @iovec: The iovec from the guest
+ * @num: iovec length
+ * @more_descs: True if more descriptors come in the chain
+ * @write: True if they are in descriptors
+ *
+ * Return true if success, false otherwise and print error.
+ */
+static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
+                                        const struct iovec *iovec, size_t num,
+                                        bool more_descs, bool write)
 {
     uint16_t i = svq->free_head, last = svq->free_head;
     unsigned n;
     uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
     vring_desc_t *descs = svq->vring.desc;
+    bool ok;
 
     if (num == 0) {
-        return;
+        return true;
+    }
+
+    ok = vhost_svq_translate_addr(svq, sg, iovec, num);
+    if (unlikely(!ok)) {
+        return false;
     }
 
     for (n = 0; n < num; n++) {
@@ -150,6 +168,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
     }
 
     svq->free_head = le16_to_cpu(svq->desc_next[last]);
+    return true;
 }
 
 static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
@@ -169,21 +188,18 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
         return false;
     }
 
-    ok = vhost_svq_translate_addr(svq, sgs, elem->out_sg, elem->out_num);
+    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
+                                     elem->in_num > 0, false);
     if (unlikely(!ok)) {
         return false;
     }
-    vhost_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
-                            elem->in_num > 0, false);
-
 
-    ok = vhost_svq_translate_addr(svq, sgs, elem->in_sg, elem->in_num);
+    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
+                                     true);
     if (unlikely(!ok)) {
         return false;
     }
 
-    vhost_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, true);
-
     /*
      * Put the entry in the available array (but don't update avail->idx until
      * they do sync).
-- 
2.27.0




* [RFC PATCH v8 08/21] vhost: Add SVQElement
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (6 preceding siblings ...)
  2022-05-19 19:12 ` [RFC PATCH v8 07/21] vhost: move descriptor translation to vhost_svq_vring_write_descs Eugenio Pérez
@ 2022-05-19 19:12 ` Eugenio Pérez
  2022-05-19 19:12 ` [RFC PATCH v8 09/21] vhost: Add svq copy desc mode Eugenio Pérez
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

This allows SVQ to add metadata to the different queue elements.
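
The pattern is the usual struct embedding; since the VirtQueueElement
must stay the first member, virtqueue_pop() can allocate the whole
wrapper in one shot:

    SVQElement *svq_elem = virtqueue_pop(svq->vq, sizeof(*svq_elem));
    VirtQueueElement *elem = &svq_elem->elem;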

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  8 ++++--
 hw/virtio/vhost-shadow-virtqueue.c | 46 ++++++++++++++++--------------
 2 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 50f45153c0..e06ac52158 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -15,6 +15,10 @@
 #include "standard-headers/linux/vhost_types.h"
 #include "hw/virtio/vhost-iova-tree.h"
 
+typedef struct SVQElement {
+    VirtQueueElement elem;
+} SVQElement;
+
 typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
                                          const VirtQueueElement *elem);
 
@@ -64,10 +68,10 @@ typedef struct VhostShadowVirtqueue {
     VhostIOVATree *iova_tree;
 
     /* Map for use the guest's descriptors */
-    VirtQueueElement **ring_id_maps;
+    SVQElement **ring_id_maps;
 
     /* Next VirtQueue element that guest made available */
-    VirtQueueElement *next_guest_avail_elem;
+    SVQElement *next_guest_avail_elem;
 
     /*
      * Backup next field for each descriptor so we can recover securely, not
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 2d5d27d29c..044005ba89 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -171,9 +171,10 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
     return true;
 }
 
-static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
-                                VirtQueueElement *elem, unsigned *head)
+static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement *svq_elem,
+                                unsigned *head)
 {
+    const VirtQueueElement *elem = &svq_elem->elem;
     unsigned avail_idx;
     vring_avail_t *avail = svq->vring.avail;
     bool ok;
@@ -222,7 +223,7 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
  * takes ownership of the element: In case of failure, it is free and the SVQ
  * is considered broken.
  */
-static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
+static bool vhost_svq_add(VhostShadowVirtqueue *svq, SVQElement *elem)
 {
     unsigned qemu_head;
     bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
@@ -272,19 +273,21 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
         virtio_queue_set_notification(svq->vq, false);
 
         while (true) {
+            SVQElement *svq_elem;
             VirtQueueElement *elem;
             bool ok;
 
             if (svq->next_guest_avail_elem) {
-                elem = g_steal_pointer(&svq->next_guest_avail_elem);
+                svq_elem = g_steal_pointer(&svq->next_guest_avail_elem);
             } else {
-                elem = virtqueue_pop(svq->vq, sizeof(*elem));
+                svq_elem = virtqueue_pop(svq->vq, sizeof(*svq_elem));
             }
 
-            if (!elem) {
+            if (!svq_elem) {
                 break;
             }
 
+            elem = &svq_elem->elem;
             if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
                 /*
                  * This condition is possible since a contiguous buffer in GPA
@@ -297,11 +300,11 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
                  * queue the current guest descriptor and ignore further kicks
                  * until some elements are used.
                  */
-                svq->next_guest_avail_elem = elem;
+                svq->next_guest_avail_elem = svq_elem;
                 return;
             }
 
-            ok = vhost_svq_add(svq, elem);
+            ok = vhost_svq_add(svq, svq_elem);
             if (unlikely(!ok)) {
                 /* VQ is broken, just return and ignore any other kicks */
                 return;
@@ -368,8 +371,7 @@ static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
     return i;
 }
 
-static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
-                                           uint32_t *len)
+static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
 {
     const vring_used_t *used = svq->vring.used;
     vring_used_elem_t used_elem;
@@ -399,8 +401,8 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
         return NULL;
     }
 
-    num = svq->ring_id_maps[used_elem.id]->in_num +
-          svq->ring_id_maps[used_elem.id]->out_num;
+    num = svq->ring_id_maps[used_elem.id]->elem.in_num +
+          svq->ring_id_maps[used_elem.id]->elem.out_num;
     last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
     svq->desc_next[last_used_chain] = svq->free_head;
     svq->free_head = used_elem.id;
@@ -421,11 +423,13 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
         vhost_svq_disable_notification(svq);
         while (true) {
             uint32_t len;
-            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq, &len);
-            if (!elem) {
+            g_autofree SVQElement *svq_elem = vhost_svq_get_buf(svq, &len);
+            VirtQueueElement *elem;
+            if (!svq_elem) {
                 break;
             }
 
+            elem = &svq_elem->elem;
             if (svq->ops && svq->ops->used_elem_handler) {
                 svq->ops->used_elem_handler(svq->vdev, elem);
             }
@@ -580,7 +584,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
     memset(svq->vring.desc, 0, driver_size);
     svq->vring.used = qemu_memalign(qemu_real_host_page_size(), device_size);
     memset(svq->vring.used, 0, device_size);
-    svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num);
+    svq->ring_id_maps = g_new0(SVQElement *, svq->vring.num);
     svq->desc_next = g_new0(uint16_t, svq->vring.num);
     for (unsigned i = 0; i < svq->vring.num - 1; i++) {
         svq->desc_next[i] = cpu_to_le16(i + 1);
@@ -594,7 +598,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
 void vhost_svq_stop(VhostShadowVirtqueue *svq)
 {
     event_notifier_set_handler(&svq->svq_kick, NULL);
-    g_autofree VirtQueueElement *next_avail_elem = NULL;
+    g_autofree SVQElement *next_avail_elem = NULL;
 
     if (!svq->vq) {
         return;
@@ -604,16 +608,16 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
     vhost_svq_flush(svq, false);
 
     for (unsigned i = 0; i < svq->vring.num; ++i) {
-        g_autofree VirtQueueElement *elem = NULL;
-        elem = g_steal_pointer(&svq->ring_id_maps[i]);
-        if (elem) {
-            virtqueue_detach_element(svq->vq, elem, 0);
+        g_autofree SVQElement *svq_elem = NULL;
+        svq_elem = g_steal_pointer(&svq->ring_id_maps[i]);
+        if (svq_elem) {
+            virtqueue_detach_element(svq->vq, &svq_elem->elem, 0);
         }
     }
 
     next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
     if (next_avail_elem) {
-        virtqueue_detach_element(svq->vq, next_avail_elem, 0);
+        virtqueue_detach_element(svq->vq, &next_avail_elem->elem, 0);
     }
     svq->vq = NULL;
     g_free(svq->desc_next);
-- 
2.27.0




* [RFC PATCH v8 09/21] vhost: Add svq copy desc mode
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (7 preceding siblings ...)
  2022-05-19 19:12 ` [RFC PATCH v8 08/21] vhost: Add SVQElement Eugenio Pérez
@ 2022-05-19 19:12 ` Eugenio Pérez
  2022-06-08  4:14   ` Jason Wang
  2022-05-19 19:12 ` [RFC PATCH v8 10/21] vhost: Add vhost_svq_inject Eugenio Pérez
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

Enable SVQ not to forward the descriptor by translating its address to
qemu's IOVA, but to copy it to a region outside of the guest.

Virtio-net control VQ will use this mode, so we don't need to map all
the guest's memory every time there is a change, but only the messages.
Conversely, CVQ will only have access to control messages. This leads to
less messing with memory listeners.

We could also try to send only the required translation per message, but
this presents a problem when many control messages occupy the same
guest's memory region.

Lastly, this allows us to inject messages from QEMU to the device in a
simple manner. CVQ should be used rarely and with small messages, so all
the drawbacks should be acceptable.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  10 ++
 include/hw/virtio/vhost-vdpa.h     |   1 +
 hw/virtio/vhost-shadow-virtqueue.c | 174 +++++++++++++++++++++++++++--
 hw/virtio/vhost-vdpa.c             |   1 +
 net/vhost-vdpa.c                   |   1 +
 5 files changed, 175 insertions(+), 12 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index e06ac52158..79cb2d301f 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -17,6 +17,12 @@
 
 typedef struct SVQElement {
     VirtQueueElement elem;
+
+    /* SVQ IOVA address of in buffer and out buffer if cloned */
+    hwaddr in_iova, out_iova;
+
+    /* Length of in buffer */
+    size_t in_len;
 } SVQElement;
 
 typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
@@ -102,6 +108,9 @@ typedef struct VhostShadowVirtqueue {
 
     /* Next head to consume from the device */
     uint16_t last_used_idx;
+
+    /* Copy each descriptor to QEMU iova */
+    bool copy_descs;
 } VhostShadowVirtqueue;
 
 bool vhost_svq_valid_features(uint64_t features, Error **errp);
@@ -119,6 +128,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
 VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_map,
                                     const VhostShadowVirtqueueOps *ops,
+                                    bool copy_descs,
                                     const VhostShadowVirtqueueMapOps *map_ops,
                                     void *map_ops_opaque);
 
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index f1ba46a860..dc2884eea4 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -33,6 +33,7 @@ typedef struct vhost_vdpa {
     struct vhost_vdpa_iova_range iova_range;
     uint64_t acked_features;
     bool shadow_vqs_enabled;
+    bool svq_copy_descs;
     /* IOVA mapping used by the Shadow Virtqueue */
     VhostIOVATree *iova_tree;
     GPtrArray *shadow_vqs;
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 044005ba89..5a8feb1cbc 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -16,6 +16,7 @@
 #include "qemu/log.h"
 #include "qemu/memalign.h"
 #include "linux-headers/linux/vhost.h"
+#include "qemu/iov.h"
 
 /**
  * Validate the transport device features that both guests can use with the SVQ
@@ -70,6 +71,30 @@ static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
     return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx);
 }
 
+static void vhost_svq_alloc_buffer(void **base, size_t *len,
+                                   const struct iovec *iov, size_t num,
+                                   bool write)
+{
+    *len = iov_size(iov, num);
+    size_t buf_size = ROUND_UP(*len, 4096);
+
+    if (!num) {
+        return;
+    }
+
+    /*
+     * Linearize element. If guest had a descriptor chain, we expose the device
+     * a single buffer.
+     */
+    *base = qemu_memalign(4096, buf_size);
+    if (!write) {
+        iov_to_buf(iov, num, 0, *base, *len);
+        memset(*base + *len, 0, buf_size - *len);
+    } else {
+        memset(*base, 0, *len);
+    }
+}
+
 /**
  * Translate addresses between the qemu's virtual address and the SVQ IOVA
  *
@@ -126,7 +151,9 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
  * Write descriptors to SVQ vring
  *
  * @svq: The shadow virtqueue
+ * @svq_elem: The shadow virtqueue element
  * @sg: Cache for hwaddr
+ * @descs_len: Total written buffer if svq->copy_descs.
  * @iovec: The iovec from the guest
  * @num: iovec length
  * @more_descs: True if more descriptors come in the chain
@@ -134,7 +161,9 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
  *
  * Return true if success, false otherwise and print error.
  */
-static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
+static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq,
+                                        SVQElement *svq_elem, hwaddr *sg,
+                                        size_t *descs_len,
                                         const struct iovec *iovec, size_t num,
                                         bool more_descs, bool write)
 {
@@ -142,18 +171,41 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
     unsigned n;
     uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
     vring_desc_t *descs = svq->vring.desc;
-    bool ok;
-
     if (num == 0) {
         return true;
     }
 
-    ok = vhost_svq_translate_addr(svq, sg, iovec, num);
-    if (unlikely(!ok)) {
-        return false;
+    if (svq->copy_descs) {
+        void *buf;
+        DMAMap map = {};
+        int r;
+
+        vhost_svq_alloc_buffer(&buf, descs_len, iovec, num, write);
+        map.translated_addr = (hwaddr)(uintptr_t)buf;
+        map.size = ROUND_UP(*descs_len, 4096) - 1;
+        map.perm = write ? IOMMU_RW : IOMMU_RO,
+        r = vhost_iova_tree_map_alloc(svq->iova_tree, &map);
+        if (unlikely(r != IOVA_OK)) {
+            error_report("Cannot map injected element");
+            return false;
+        }
+
+        r = svq->map_ops->map(map.iova, map.size + 1,
+                              (void *)map.translated_addr, !write,
+                              svq->map_ops_opaque);
+        /* TODO: Handle error */
+        assert(r == 0);
+        num = 1;
+        sg[0] = map.iova;
+    } else {
+        bool ok = vhost_svq_translate_addr(svq, sg, iovec, num);
+        if (unlikely(!ok)) {
+            return false;
+        }
     }
 
     for (n = 0; n < num; n++) {
+        uint32_t len = svq->copy_descs ? *descs_len : iovec[n].iov_len;
         if (more_descs || (n + 1 < num)) {
             descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
             descs[i].next = cpu_to_le16(svq->desc_next[i]);
@@ -161,7 +213,7 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
             descs[i].flags = flags;
         }
         descs[i].addr = cpu_to_le64(sg[n]);
-        descs[i].len = cpu_to_le32(iovec[n].iov_len);
+        descs[i].len = cpu_to_le32(len);
 
         last = i;
         i = cpu_to_le16(svq->desc_next[i]);
@@ -178,7 +230,8 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement *svq_elem,
     unsigned avail_idx;
     vring_avail_t *avail = svq->vring.avail;
     bool ok;
-    g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
+    g_autofree hwaddr *sgs = NULL;
+    hwaddr *in_sgs, *out_sgs;
 
     *head = svq->free_head;
 
@@ -189,15 +242,24 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement *svq_elem,
         return false;
     }
 
-    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
+    if (!svq->copy_descs) {
+        sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
+        in_sgs = out_sgs = sgs;
+    } else {
+        in_sgs = &svq_elem->in_iova;
+        out_sgs = &svq_elem->out_iova;
+    }
+    ok = vhost_svq_vring_write_descs(svq, svq_elem, out_sgs, (size_t[]){},
+                                     elem->out_sg, elem->out_num,
                                      elem->in_num > 0, false);
     if (unlikely(!ok)) {
         return false;
     }
 
-    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
-                                     true);
+    ok = vhost_svq_vring_write_descs(svq, svq_elem, in_sgs, &svq_elem->in_len,
+                                     elem->in_sg, elem->in_num, false, true);
     if (unlikely(!ok)) {
+        /* TODO unwind out_sg */
         return false;
     }
 
@@ -276,6 +338,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
             SVQElement *svq_elem;
             VirtQueueElement *elem;
             bool ok;
+            uint32_t needed_slots;
 
             if (svq->next_guest_avail_elem) {
                 svq_elem = g_steal_pointer(&svq->next_guest_avail_elem);
@@ -288,7 +351,8 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
             }
 
             elem = &svq_elem->elem;
-            if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
+            needed_slots = svq->copy_descs ? 1 : elem->out_num + elem->in_num;
+            if (needed_slots > vhost_svq_available_slots(svq)) {
                 /*
                  * This condition is possible since a contiguous buffer in GPA
                  * does not imply a contiguous buffer in qemu's VA
@@ -411,6 +475,76 @@ static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
     return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
 }
 
+/**
+ * Unmap a descriptor chain of a SVQ element, optionally copying its in buffers
+ *
+ * @svq: Shadow VirtQueue
+ * @iova: SVQ IO Virtual address of descriptor
+ * @iov: Optional iovec to store device writable buffer
+ * @iov_cnt: iov length
+ * @buf_len: Length written by the device
+ *
+ * Print error message in case of error
+ */
+static bool vhost_svq_unmap_iov(VhostShadowVirtqueue *svq, hwaddr iova,
+                                const struct iovec *iov, size_t iov_cnt,
+                                size_t buf_len)
+{
+    DMAMap needle = {
+        /*
+         * No need to specify size since contiguous iova chunk was allocated
+         * by SVQ.
+         */
+        .iova = iova,
+    };
+    const DMAMap *map = vhost_iova_tree_find(svq->iova_tree, &needle);
+    int r;
+
+    if (!map) {
+        error_report("Cannot locate expected map");
+        return false;
+    }
+
+    r = svq->map_ops->unmap(map->iova, map->size + 1, svq->map_ops_opaque);
+    if (unlikely(r != 0)) {
+        error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
+        return false;
+    }
+
+    if (iov) {
+        iov_from_buf(iov, iov_cnt, 0, (const void *)map->translated_addr, buf_len);
+    }
+    qemu_vfree((void *)map->translated_addr);
+    vhost_iova_tree_remove(svq->iova_tree, &needle);
+    return true;
+}
+
+/**
+ * Unmap shadow virtqueue element
+ *
+ * @svq_elem: Shadow VirtQueue Element
+ * @copy_in: Copy in buffer to the element at unmapping
+ */
+static bool vhost_svq_unmap_elem(VhostShadowVirtqueue *svq, SVQElement *svq_elem, uint32_t len, bool copy_in)
+{
+    VirtQueueElement *elem = &svq_elem->elem;
+    const struct iovec *in_iov = copy_in ? elem->in_sg : NULL;
+    size_t in_count = copy_in ? elem->in_num : 0;
+    if (elem->out_num) {
+        bool ok = vhost_svq_unmap_iov(svq, svq_elem->out_iova, NULL, 0, 0);
+        if (unlikely(!ok)) {
+            return false;
+        }
+    }
+
+    if (elem->in_num) {
+        return vhost_svq_unmap_iov(svq, svq_elem->in_iova, in_iov, in_count,
+                                   len);
+    }
+
+    return true;
+}
+
 static void vhost_svq_flush(VhostShadowVirtqueue *svq,
                             bool check_for_avail_queue)
 {
@@ -429,6 +563,13 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
                 break;
             }
 
+            if (svq->copy_descs) {
+                bool ok = vhost_svq_unmap_elem(svq, svq_elem, len, true);
+                if (unlikely(!ok)) {
+                    return;
+                }
+            }
+
             elem = &svq_elem->elem;
             if (svq->ops && svq->ops->used_elem_handler) {
                 svq->ops->used_elem_handler(svq->vdev, elem);
@@ -611,12 +752,18 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
         g_autofree SVQElement *svq_elem = NULL;
         svq_elem = g_steal_pointer(&svq->ring_id_maps[i]);
         if (svq_elem) {
+            if (svq->copy_descs) {
+                vhost_svq_unmap_elem(svq, svq_elem, 0, false);
+            }
             virtqueue_detach_element(svq->vq, &svq_elem->elem, 0);
         }
     }
 
     next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
     if (next_avail_elem) {
+        if (svq->copy_descs) {
+            vhost_svq_unmap_elem(svq, next_avail_elem, 0, false);
+        }
         virtqueue_detach_element(svq->vq, &next_avail_elem->elem, 0);
     }
     svq->vq = NULL;
@@ -632,6 +779,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
  *
  * @iova_tree: Tree to perform descriptors translations
  * @ops: SVQ operations hooks
+ * @copy_descs: Copy each descriptor to QEMU iova
  * @map_ops: SVQ mapping operation hooks
  * @map_ops_opaque: Opaque data to pass to mapping operations
  *
@@ -641,6 +789,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
  */
 VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
                                     const VhostShadowVirtqueueOps *ops,
+                                    bool copy_descs,
                                     const VhostShadowVirtqueueMapOps *map_ops,
                                     void *map_ops_opaque)
 {
@@ -665,6 +814,7 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
     event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     svq->iova_tree = iova_tree;
     svq->ops = ops;
+    svq->copy_descs = copy_descs;
     svq->map_ops = map_ops;
     svq->map_ops_opaque = map_ops_opaque;
     return g_steal_pointer(&svq);
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index e6ef944e23..31b3d4d013 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -436,6 +436,7 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
     for (unsigned n = 0; n < hdev->nvqs; ++n) {
         g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree,
                                                        v->shadow_vq_ops,
+                                                       v->svq_copy_descs,
                                                        &vhost_vdpa_svq_map_ops,
                                                        v);
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index ef12fc284c..174fec5e77 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -254,6 +254,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     s->vhost_vdpa.index = queue_pair_index;
     if (!is_datapath) {
         s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
+        s->vhost_vdpa.svq_copy_descs = true;
     }
     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
     if (ret) {
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH v8 10/21] vhost: Add vhost_svq_inject
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (8 preceding siblings ...)
  2022-05-19 19:12 ` [RFC PATCH v8 09/21] vhost: Add svq copy desc mode Eugenio Pérez
@ 2022-05-19 19:12 ` Eugenio Pérez
  2022-05-19 19:12 ` [RFC PATCH v8 11/21] vhost: Update kernel headers Eugenio Pérez
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

This allows qemu to inject buffers into the device without the guest
noticing.

This will be used to inject net CVQ messages to restore the device state
at the destination of a live migration.
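
For illustration, a minimal sketch of a hypothetical caller follows. The
names are made up; the real net CVQ user is added later in the series.
Note that the out buffers come first in the iovec array:

    /* Hypothetical example: inject one out buffer and one in buffer */
    static int example_inject(VhostShadowVirtqueue *svq)
    {
        static const uint8_t cmd[2] = { 0x01, 0x02 }; /* device-readable */
        static uint8_t status;                        /* device-writable */
        const struct iovec iov[] = {
            { .iov_base = (void *)cmd, .iov_len = sizeof(cmd) },
            { .iov_base = &status, .iov_len = sizeof(status) },
        };

        /*
         * Buffers must stay alive until the device uses them; a later
         * patch adds vhost_svq_poll to wait for that synchronously.
         */
        return vhost_svq_inject(svq, iov, 1, 1);
    }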

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  5 +++
 hw/virtio/vhost-shadow-virtqueue.c | 72 +++++++++++++++++++++++++-----
 2 files changed, 65 insertions(+), 12 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 79cb2d301f..8fe0367944 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -23,6 +23,9 @@ typedef struct SVQElement {
 
     /* Length of in buffer */
     size_t in_len;
+
+    /* Buffer has been injected by QEMU, not by the guest */
+    bool not_from_guest;
 } SVQElement;
 
 typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
@@ -115,6 +118,8 @@ typedef struct VhostShadowVirtqueue {
 
 bool vhost_svq_valid_features(uint64_t features, Error **errp);
 
+int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
+                     size_t out_num, size_t in_num);
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
 void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 5a8feb1cbc..c535c99905 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -312,6 +312,43 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
     event_notifier_set(&svq->hdev_kick);
 }
 
+/**
+ * Inject a chain of buffers to the device
+ *
+ * @svq: Shadow VirtQueue
+ * @iov: I/O vector with the out buffers first, then the in buffers
+ * @out_num: Number of out elements
+ * @in_num: Number of in elements
+ */
+int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
+                     size_t out_num, size_t in_num)
+{
+    SVQElement *svq_elem;
+    uint16_t num_slots = (in_num ? 1 : 0) + (out_num ? 1 : 0);
+
+    /*
+     * Injecting buffers into an SVQ that does not copy descriptors is not
+     * supported. All vhost_svq_inject calls are controlled by qemu, so we
+     * won't hit these assertions.
+     */
+    assert(svq->copy_descs);
+    assert(num_slots > 0);
+
+    if (unlikely(svq->next_guest_avail_elem)) {
+        error_report("Injecting in a full queue");
+        return -ENOMEM;
+    }
+
+    svq_elem = virtqueue_alloc_element(sizeof(*svq_elem), out_num, in_num);
+    iov_copy(svq_elem->elem.in_sg, in_num, iov + out_num, in_num, 0, SIZE_MAX);
+    iov_copy(svq_elem->elem.out_sg, out_num, iov, out_num, 0, SIZE_MAX);
+    svq_elem->not_from_guest = true;
+    vhost_svq_add(svq, svq_elem);
+    vhost_svq_kick(svq);
+
+    return 0;
+}
+
 /**
  * Forward available buffers.
  *
@@ -350,6 +387,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
                 break;
             }
 
+            svq_elem->not_from_guest = false;
             elem = &svq_elem->elem;
             needed_slots = svq->copy_descs ? 1 : elem->out_num + elem->in_num;
             if (needed_slots > vhost_svq_available_slots(svq)) {
@@ -575,19 +613,24 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
                 svq->ops->used_elem_handler(svq->vdev, elem);
             }
 
-            if (unlikely(i >= svq->vring.num)) {
-                qemu_log_mask(LOG_GUEST_ERROR,
-                         "More than %u used buffers obtained in a %u size SVQ",
-                         i, svq->vring.num);
-                virtqueue_fill(vq, elem, len, i);
-                virtqueue_flush(vq, i);
-                return;
+            if (!svq_elem->not_from_guest) {
+                if (unlikely(i >= svq->vring.num)) {
+                    qemu_log_mask(
+                        LOG_GUEST_ERROR,
+                        "More than %u used buffers obtained in a %u size SVQ",
+                        i, svq->vring.num);
+                    virtqueue_fill(vq, elem, len, i);
+                    virtqueue_flush(vq, i);
+                    return;
+                }
+                virtqueue_fill(vq, elem, len, i++);
             }
-            virtqueue_fill(vq, elem, len, i++);
         }
 
-        virtqueue_flush(vq, i);
-        event_notifier_set(&svq->svq_call);
+        if (i > 0) {
+            virtqueue_flush(vq, i);
+            event_notifier_set(&svq->svq_call);
+        }
 
         if (check_for_avail_queue && svq->next_guest_avail_elem) {
             /*
@@ -755,7 +798,10 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
             if (svq->copy_descs) {
                 vhost_svq_unmap_elem(svq, svq_elem, 0, false);
             }
-            virtqueue_detach_element(svq->vq, &svq_elem->elem, 0);
+
+            if (!svq_elem->not_from_guest) {
+                virtqueue_detach_element(svq->vq, &svq_elem->elem, 0);
+            }
         }
     }
 
@@ -764,7 +810,9 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
         if (svq->copy_descs) {
             vhost_svq_unmap_elem(svq, next_avail_elem, 0, false);
         }
-        virtqueue_detach_element(svq->vq, &next_avail_elem->elem, 0);
+        if (!next_avail_elem->not_from_guest) {
+            virtqueue_detach_element(svq->vq, &next_avail_elem->elem, 0);
+        }
     }
     svq->vq = NULL;
     g_free(svq->desc_next);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH v8 11/21] vhost: Update kernel headers
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (9 preceding siblings ...)
  2022-05-19 19:12 ` [RFC PATCH v8 10/21] vhost: Add vhost_svq_inject Eugenio Pérez
@ 2022-05-19 19:12 ` Eugenio Pérez
  2022-06-08  4:18   ` Jason Wang
  2022-05-19 19:12 ` [RFC PATCH v8 12/21] vdpa: delay set_vring_ready after DRIVER_OK Eugenio Pérez
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/standard-headers/linux/vhost_types.h | 11 ++++++++-
 linux-headers/linux/vhost.h                  | 25 ++++++++++++++++----
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/include/standard-headers/linux/vhost_types.h b/include/standard-headers/linux/vhost_types.h
index 0bd2684a2a..ce78551b0f 100644
--- a/include/standard-headers/linux/vhost_types.h
+++ b/include/standard-headers/linux/vhost_types.h
@@ -87,7 +87,7 @@ struct vhost_msg {
 
 struct vhost_msg_v2 {
 	uint32_t type;
-	uint32_t reserved;
+	uint32_t asid;
 	union {
 		struct vhost_iotlb_msg iotlb;
 		uint8_t padding[64];
@@ -153,4 +153,13 @@ struct vhost_vdpa_iova_range {
 /* vhost-net should add virtio_net_hdr for RX, and strip for TX packets. */
 #define VHOST_NET_F_VIRTIO_NET_HDR 27
 
+/* Use message type V2 */
+#define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1
+/* IOTLB can accept batching hints */
+#define VHOST_BACKEND_F_IOTLB_BATCH  0x2
+/* IOTLB can accept address space identifier through V2 type of IOTLB
+ * message
+ */
+#define VHOST_BACKEND_F_IOTLB_ASID  0x3
+
 #endif
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index 5d99e7c242..d42eb46efd 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -89,11 +89,6 @@
 
 /* Set or get vhost backend capability */
 
-/* Use message type V2 */
-#define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1
-/* IOTLB can accept batching hints */
-#define VHOST_BACKEND_F_IOTLB_BATCH  0x2
-
 #define VHOST_SET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x25, __u64)
 #define VHOST_GET_BACKEND_FEATURES _IOR(VHOST_VIRTIO, 0x26, __u64)
 
@@ -154,6 +149,26 @@
 /* Get the config size */
 #define VHOST_VDPA_GET_CONFIG_SIZE	_IOR(VHOST_VIRTIO, 0x79, __u32)
 
+/* Get the number of virtqueue groups. */
+#define VHOST_VDPA_GET_GROUP_NUM	_IOR(VHOST_VIRTIO, 0x7A, unsigned int)
+
+/* Get the number of address spaces. */
+#define VHOST_VDPA_GET_AS_NUM		_IOR(VHOST_VIRTIO, 0x7B, unsigned int)
+
+/* Get the group for a virtqueue: read index, write group in num,
+ * The virtqueue index is stored in the index field of
+ * vhost_vring_state. The group for this specific virtqueue is
+ * returned via num field of vhost_vring_state.
+ */
+#define VHOST_VDPA_GET_VRING_GROUP	_IOWR(VHOST_VIRTIO, 0x7C,	\
+					      struct vhost_vring_state)
+/* Set the ASID for a virtqueue group. The group index is stored in
+ * the index field of vhost_vring_state, the ASID associated with this
+ * group is stored at num field of vhost_vring_state.
+ */
+#define VHOST_VDPA_SET_GROUP_ASID	_IOW(VHOST_VIRTIO, 0x7D, \
+					     struct vhost_vring_state)
+
 /* Get the count of all virtqueues */
 #define VHOST_VDPA_GET_VQS_COUNT	_IOR(VHOST_VIRTIO, 0x80, __u32)
 
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH v8 12/21] vdpa: delay set_vring_ready after DRIVER_OK
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (10 preceding siblings ...)
  2022-05-19 19:12 ` [RFC PATCH v8 11/21] vhost: Update kernel headers Eugenio Pérez
@ 2022-05-19 19:12 ` Eugenio Pérez
  2022-06-08  4:20   ` Jason Wang
  2022-05-19 19:12 ` [RFC PATCH v8 13/21] vhost: Add ShadowVirtQueueStart operation Eugenio Pérez
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

To restore the device state at the destination of a live migration we
send the commands through the control virtqueue. For a device to read
the CVQ it must have received the DRIVER_OK status bit.

However this opens a window where the device could start receiving
packets on rx queue 0 before it receives the RSS configuration. To avoid
that, we do not send vring_enable until the device has consumed all the
configuration.

As a first step, reverse the DRIVER_OK and SET_VRING_ENABLE steps.
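
At the ioctl level, the per-vring enable that is now deferred looks
roughly like this (sketch, with device_fd and vq_index as hypothetical
variables):

    /* Enable one vring; from now on issued only after DRIVER_OK */
    struct vhost_vring_state state = {
        .index = vq_index, /* virtqueue index */
        .num = 1,          /* 1 == enabled */
    };
    ioctl(device_fd, VHOST_VDPA_SET_VRING_ENABLE, &state);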

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 31b3d4d013..13e5e2a061 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -748,13 +748,18 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
     return idx;
 }
 
+/**
+ * Set all vrings of the device ready
+ *
+ * @dev: Vhost device
+ */
 static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
 {
     int i;
     trace_vhost_vdpa_set_vring_ready(dev);
-    for (i = 0; i < dev->nvqs; ++i) {
+    for (i = 0; i < dev->vq_index_end; ++i) {
         struct vhost_vring_state state = {
-            .index = dev->vq_index + i,
+            .index = i,
             .num = 1,
         };
         vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
@@ -1117,7 +1122,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         if (unlikely(!ok)) {
             return -1;
         }
-        vhost_vdpa_set_vring_ready(dev);
     } else {
         ok = vhost_vdpa_svqs_stop(dev);
         if (unlikely(!ok)) {
@@ -1131,16 +1135,22 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
     }
 
     if (started) {
+        int r;
         memory_listener_register(&v->listener, &address_space_memory);
-        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+        r = vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+        if (unlikely(r)) {
+            return r;
+        }
+        vhost_vdpa_set_vring_ready(dev);
     } else {
         vhost_vdpa_reset_device(dev);
         vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
                                    VIRTIO_CONFIG_S_DRIVER);
         memory_listener_unregister(&v->listener);
 
-        return 0;
     }
+
+    return 0;
 }
 
 static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH v8 13/21] vhost: Add ShadowVirtQueueStart operation
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (11 preceding siblings ...)
  2022-05-19 19:12 ` [RFC PATCH v8 12/21] vdpa: delay set_vring_ready after DRIVER_OK Eugenio Pérez
@ 2022-05-19 19:12 ` Eugenio Pérez
  2022-05-19 19:12 ` [RFC PATCH v8 14/21] vhost: Make possible to check for device exclusive vq group Eugenio Pérez
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

It allows running commands at SVQ start.
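
A user of the hook wires it roughly like this (sketch with made-up
names; the net CVQ user arrives later in the series):

    /* Hypothetical SVQ user that sends initial commands at start */
    static int my_svq_start(VhostShadowVirtqueue *svq, struct vhost_dev *dev)
    {
        /* e.g. inject state-restoring commands with vhost_svq_inject() */
        return 0;
    }

    static const VhostShadowVirtqueueOps my_svq_ops = {
        .start = my_svq_start,
    };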

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  4 ++++
 hw/virtio/vhost-vdpa.c             | 14 ++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 8fe0367944..3c55fe2641 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -28,10 +28,14 @@ typedef struct SVQElement {
     bool not_from_guest;
 } SVQElement;
 
+typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
+typedef int (*ShadowVirtQueueStart)(VhostShadowVirtqueue *svq,
+                                    struct vhost_dev *dev);
 typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
                                          const VirtQueueElement *elem);
 
 typedef struct VhostShadowVirtqueueOps {
+    ShadowVirtQueueStart start;
     VirtQueueElementCallback used_elem_handler;
 } VhostShadowVirtqueueOps;
 
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 13e5e2a061..eec6d544e9 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1141,6 +1141,20 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         if (unlikely(r)) {
             return r;
         }
+
+        if (v->shadow_vqs_enabled) {
+            for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
+                VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
+                                                              i);
+                if (svq->ops && svq->ops->start) {
+                    r = svq->ops->start(svq, dev);
+                    if (unlikely(r)) {
+                        return r;
+                    }
+                }
+            }
+        }
+
         vhost_vdpa_set_vring_ready(dev);
     } else {
         vhost_vdpa_reset_device(dev);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH v8 14/21] vhost: Make possible to check for device exclusive vq group
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (12 preceding siblings ...)
  2022-05-19 19:12 ` [RFC PATCH v8 13/21] vhost: Add ShadowVirtQueueStart operation Eugenio Pérez
@ 2022-05-19 19:12 ` Eugenio Pérez
  2022-06-08  4:25   ` Jason Wang
  2022-05-19 19:13 ` [RFC PATCH v8 15/21] vhost: add vhost_svq_poll Eugenio Pérez
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

CVQ needs to be in its own group, not shared with any data vq. Enable
checking for that here, before introducing address space id concepts.
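
The group of each vring is read with the ioctl added in the previous
headers patch; a minimal sketch, with device_fd and vq_index assumed:

    /* Read the group id of a single vring */
    struct vhost_vring_state s = { .index = vq_index };
    if (ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP, &s) == 0) {
        /* on success, s.num holds the group of vring vq_index */
    }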

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/vhost.h |  2 +
 hw/net/vhost_net.c        |  4 +-
 hw/virtio/vhost-vdpa.c    | 79 ++++++++++++++++++++++++++++++++++++++-
 hw/virtio/trace-events    |  1 +
 4 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index b291fe4e24..cebec1d817 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -84,6 +84,8 @@ struct vhost_dev {
     int vq_index_end;
     /* if non-zero, minimum required value for max_queues */
     int num_queues;
+    /* Must be a vq group different than any other vhost dev */
+    bool independent_vq_group;
     uint64_t features;
     uint64_t acked_features;
     uint64_t backend_features;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index ccac5b7a64..1c2386c01c 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -339,14 +339,16 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
     }
 
     for (i = 0; i < nvhosts; i++) {
+        bool cvq_idx = i >= data_queue_pairs;
 
-        if (i < data_queue_pairs) {
+        if (!cvq_idx) {
             peer = qemu_get_peer(ncs, i);
         } else { /* Control Virtqueue */
             peer = qemu_get_peer(ncs, n->max_queue_pairs);
         }
 
         net = get_vhost_net(peer);
+        net->dev.independent_vq_group = !!cvq_idx;
         vhost_net_set_vq_index(net, i * 2, index_end);
 
         /* Suppress the masking guest notifiers on vhost user
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index eec6d544e9..52dd8baa8d 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -685,7 +685,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
 {
     uint64_t features;
     uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
-        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
+        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
+        0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
     int r;
 
     if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
@@ -1110,6 +1111,78 @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
     return true;
 }
 
+static int vhost_vdpa_get_vring_group(struct vhost_dev *dev,
+                                      struct vhost_vring_state *state)
+{
+    int ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_VRING_GROUP, state);
+    trace_vhost_vdpa_get_vring_group(dev, state->index, state->num);
+    return ret;
+}
+
+static bool vhost_dev_is_independent_group(struct vhost_dev *dev)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    struct vhost_vring_state this_vq_group = {
+        .index = dev->vq_index,
+    };
+    int ret;
+
+    if (!(dev->backend_cap & VHOST_BACKEND_F_IOTLB_ASID)) {
+        return true;
+    }
+
+    if (!v->shadow_vqs_enabled) {
+        return true;
+    }
+
+    ret = vhost_vdpa_get_vring_group(dev, &this_vq_group);
+    if (unlikely(ret)) {
+        goto call_err;
+    }
+
+    for (int i = 1; i < dev->nvqs; ++i) {
+        struct vhost_vring_state vq_group = {
+            .index = dev->vq_index + i,
+        };
+
+        ret = vhost_vdpa_get_vring_group(dev, &vq_group);
+        if (unlikely(ret)) {
+            goto call_err;
+        }
+        if (unlikely(vq_group.num != this_vq_group.num)) {
+            error_report("VQ %d group is different than VQ %d one",
+                         this_vq_group.index, vq_group.index);
+            return false;
+        }
+    }
+
+    for (int i = 0; i < dev->vq_index_end; ++i) {
+        struct vhost_vring_state vq_group = {
+            .index = i,
+        };
+
+        if (dev->vq_index <= i && i < dev->vq_index + dev->nvqs) {
+            continue;
+        }
+
+        ret = vhost_vdpa_get_vring_group(dev, &vq_group);
+        if (unlikely(ret)) {
+            goto call_err;
+        }
+        if (unlikely(vq_group.num == this_vq_group.num)) {
+            error_report("VQ %d group is the same as VQ %d one",
+                         this_vq_group.index, vq_group.index);
+            return false;
+        }
+    }
+
+    return true;
+
+call_err:
+    error_report("Can't read vq group, errno=%d (%s)", ret, g_strerror(-ret));
+    return false;
+}
+
 static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 {
     struct vhost_vdpa *v = dev->opaque;
@@ -1118,6 +1191,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 
     if (started) {
         vhost_vdpa_host_notifiers_init(dev);
+        if (dev->independent_vq_group &&
+            !vhost_dev_is_independent_group(dev)) {
+            return -1;
+        }
         ok = vhost_vdpa_svqs_start(dev);
         if (unlikely(!ok)) {
             return -1;
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index ab8e095b73..ffb8eb26e7 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -46,6 +46,7 @@ vhost_vdpa_set_vring_ready(void *dev) "dev: %p"
 vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
 vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
 vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
+vhost_vdpa_get_vring_group(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
 vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
 vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
 vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH v8 15/21] vhost: add vhost_svq_poll
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (13 preceding siblings ...)
  2022-05-19 19:12 ` [RFC PATCH v8 14/21] vhost: Make possible to check for device exclusive vq group Eugenio Pérez
@ 2022-05-19 19:13 ` Eugenio Pérez
  2022-05-19 19:13 ` [RFC PATCH v8 16/21] vdpa: Add vhost_vdpa_start_control_svq Eugenio Pérez
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

It allows the Shadow Control VirtQueue to wait for the device to use the
commands that restore the net device state after a live migration.
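
A caller that has injected a single command can then wait for it
synchronously, e.g. (sketch; the actual user comes with the CVQ start
patch):

    /* Block until the device uses the injected buffer */
    ssize_t used;
    do {
        used = vhost_svq_poll(svq);
    } while (used == 0);
    /* used < 0 means a polling error; > 0 is the number flushed */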

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  1 +
 hw/virtio/vhost-shadow-virtqueue.c | 57 +++++++++++++++++++++++++++---
 2 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 3c55fe2641..20ca59e9a7 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -124,6 +124,7 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp);
 
 int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
                      size_t out_num, size_t in_num);
+ssize_t vhost_svq_poll(VhostShadowVirtqueue *svq);
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
 void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index c535c99905..831ffb71e5 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -10,6 +10,8 @@
 #include "qemu/osdep.h"
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 
+#include <glib/gpoll.h>
+
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "qemu/main-loop.h"
@@ -583,10 +585,11 @@ static bool vhost_svq_unmap_elem(VhostShadowVirtqueue *svq, SVQElement *svq_elem
     return true;
 }
 
-static void vhost_svq_flush(VhostShadowVirtqueue *svq,
-                            bool check_for_avail_queue)
+static size_t vhost_svq_flush(VhostShadowVirtqueue *svq,
+                              bool check_for_avail_queue)
 {
     VirtQueue *vq = svq->vq;
+    size_t ret = 0;
 
     /* Forward as many used buffers as possible. */
     do {
@@ -604,7 +607,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
             if (svq->copy_descs) {
                 bool ok = vhost_svq_unmap_elem(svq, svq_elem, len, true);
                 if (unlikely(!ok)) {
-                    return;
+                    return ret;
                 }
             }
 
@@ -621,10 +624,12 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
                         i, svq->vring.num);
                     virtqueue_fill(vq, elem, len, i);
                     virtqueue_flush(vq, i);
-                    return;
+                    return ret + 1;
                 }
                 virtqueue_fill(vq, elem, len, i++);
             }
+
+            ret++;
         }
 
         if (i > 0) {
@@ -640,6 +645,50 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
             vhost_handle_guest_kick(svq);
         }
     } while (!vhost_svq_enable_notification(svq));
+
+    return ret;
+}
+
+/**
+ * Poll the SVQ for device used buffers.
+ *
+ * This function races with the main event loop SVQ polling, so extra
+ * synchronization is needed.
+ *
+ * Return the number of descriptors read from the device.
+ */
+ssize_t vhost_svq_poll(VhostShadowVirtqueue *svq)
+{
+    int fd = event_notifier_get_fd(&svq->hdev_call);
+    GPollFD poll_fd = {
+        .fd = fd,
+        .events = G_IO_IN,
+    };
+    assert(fd >= 0);
+    int r = g_poll(&poll_fd, 1, -1);
+
+    if (unlikely(r < 0)) {
+        error_report("Cannot poll device call fd "G_POLLFD_FORMAT": (%d) %s",
+                     poll_fd.fd, errno, g_strerror(errno));
+        return -errno;
+    }
+
+    if (r == 0) {
+        return 0;
+    }
+
+    if (unlikely(poll_fd.revents & ~(G_IO_IN))) {
+        error_report(
+            "Error polling device call fd "G_POLLFD_FORMAT": revents=%d",
+            poll_fd.fd, poll_fd.revents);
+        return -1;
+    }
+
+    /*
+     * Max return value of vhost_svq_flush is (uint16_t)-1, so it's safe to
+     * convert to ssize_t.
+     */
+    return vhost_svq_flush(svq, false);
 }
 
 /**
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH v8 16/21] vdpa: Add vhost_vdpa_start_control_svq
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (14 preceding siblings ...)
  2022-05-19 19:13 ` [RFC PATCH v8 15/21] vhost: add vhost_svq_poll Eugenio Pérez
@ 2022-05-19 19:13 ` Eugenio Pérez
  2022-05-19 19:13 ` [RFC PATCH v8 17/21] vdpa: Add asid attribute to vdpa device Eugenio Pérez
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

As a first step we only enable the CVQ before the other queues. Future
patches add more state restoration.
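
A possible follow-up, not done in this patch, is to also verify the
device's answer once the command has been used; a hypothetical sketch:

    /* Hypothetical post-poll check of the device's ack byte */
    if (ack != VIRTIO_NET_OK) {
        error_report("vdpa device rejected VIRTIO_NET_CTRL_MAC_ADDR_SET");
        return -EIO;
    }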

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 net/vhost-vdpa.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 174fec5e77..a66f73ff63 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -188,6 +188,66 @@ static NetClientInfo net_vhost_vdpa_info = {
         .check_peer_type = vhost_vdpa_check_peer_type,
 };
 
+static int vhost_vdpa_start_control_svq(VhostShadowVirtqueue *svq,
+                                        struct vhost_dev *dev)
+{
+    struct vhost_vring_state state = {
+        .index = virtio_get_queue_index(svq->vq),
+        .num = 1,
+    };
+    struct vhost_vdpa *v = dev->opaque;
+    VirtIONet *n = VIRTIO_NET(dev->vdev);
+    uint64_t features = dev->vdev->host_features;
+    int r;
+    size_t num = 0;
+
+    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
+
+    r = ioctl(v->device_fd, VHOST_VDPA_SET_VRING_ENABLE, &state);
+    if (r < 0) {
+        return -errno;
+    }
+
+    if (features & BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR)) {
+        const struct virtio_net_ctrl_hdr ctrl = {
+            .class = VIRTIO_NET_CTRL_MAC,
+            .cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET,
+        };
+        uint8_t mac[6];
+        virtio_net_ctrl_ack ack;
+        const struct iovec data[] = {
+            {
+                .iov_base = (void *)&ctrl,
+                .iov_len = sizeof(ctrl),
+            },{
+                .iov_base = mac,
+                .iov_len = sizeof(mac),
+            },{
+                .iov_base = &ack,
+                .iov_len = sizeof(ack),
+            }
+        };
+
+        memcpy(mac, n->mac, sizeof(mac));
+        r = vhost_svq_inject(svq, data, 2, 1);
+        if (unlikely(r)) {
+            return r;
+        }
+        num++;
+    }
+
+    while (num) {
+        /*
+         * We can call vhost_svq_poll here because the BQL protects these calls.
+         */
+        size_t used = vhost_svq_poll(svq);
+        assert(used <= num);
+        num -= used;
+    }
+
+    return 0;
+}
+
 static void vhost_vdpa_net_handle_ctrl(VirtIODevice *vdev,
                                        const VirtQueueElement *elem)
 {
@@ -226,6 +286,7 @@ static void vhost_vdpa_net_handle_ctrl(VirtIODevice *vdev,
 
 static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
     .used_elem_handler = vhost_vdpa_net_handle_ctrl,
+    .start = vhost_vdpa_start_control_svq,
 };
 
 static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH v8 17/21] vdpa: Add asid attribute to vdpa device
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (15 preceding siblings ...)
  2022-05-19 19:13 ` [RFC PATCH v8 16/21] vdpa: Add vhost_vdpa_start_control_svq Eugenio Pérez
@ 2022-05-19 19:13 ` Eugenio Pérez
  2022-05-19 19:13 ` [RFC PATCH v8 18/21] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs Eugenio Pérez
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

We can configure the ASID per group, but we still use asid 0 for every
vdpa device. Multiple asid support for cvq will be introduced in the
next patches.
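
Binding a group to an address space uses the new ioctl; a minimal
sketch, with device_fd, group and asid assumed:

    /* Attach virtqueue group `group` to address space `asid` */
    struct vhost_vring_state s = {
        .index = group, /* group index, not a vring index */
        .num = asid,
    };
    ioctl(device_fd, VHOST_VDPA_SET_GROUP_ASID, &s);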

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/vhost.h |  1 +
 hw/net/vhost_net.c        |  1 +
 hw/virtio/vhost-vdpa.c    | 71 +++++++++++++++++++++++++++++++++++----
 hw/virtio/trace-events    |  9 ++---
 4 files changed, 72 insertions(+), 10 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index cebec1d817..eadaf055f0 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -84,6 +84,7 @@ struct vhost_dev {
     int vq_index_end;
     /* if non-zero, minimum required value for max_queues */
     int num_queues;
+    uint32_t address_space_id;
     /* Must be a vq group different than any other vhost dev */
     bool independent_vq_group;
     uint64_t features;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 1c2386c01c..4d79d622f7 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -348,6 +348,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         }
 
         net = get_vhost_net(peer);
+        net->dev.address_space_id = !!cvq_idx;
         net->dev.independent_vq_group = !!cvq_idx;
         vhost_net_set_vq_index(net, i * 2, index_end);
 
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 52dd8baa8d..0208e36589 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -79,14 +79,18 @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
     int ret = 0;
 
     msg.type = v->msg_type;
+    if (v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) {
+        msg.asid = v->dev->address_space_id;
+    }
     msg.iotlb.iova = iova;
     msg.iotlb.size = size;
     msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr;
     msg.iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW;
     msg.iotlb.type = VHOST_IOTLB_UPDATE;
 
-   trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.iotlb.iova, msg.iotlb.size,
-                            msg.iotlb.uaddr, msg.iotlb.perm, msg.iotlb.type);
+    trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.asid, msg.iotlb.iova,
+                             msg.iotlb.size, msg.iotlb.uaddr, msg.iotlb.perm,
+                             msg.iotlb.type);
 
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)",
@@ -104,12 +108,15 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
     int fd = v->device_fd;
     int ret = 0;
 
+    if (v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) {
+        msg.asid = v->dev->address_space_id;
+    }
     msg.type = v->msg_type;
     msg.iotlb.iova = iova;
     msg.iotlb.size = size;
     msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
 
-    trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.iotlb.iova,
+    trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.asid, msg.iotlb.iova,
                                msg.iotlb.size, msg.iotlb.type);
 
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
@@ -123,13 +130,19 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
 
 static void vhost_vdpa_listener_begin_batch(struct vhost_vdpa *v)
 {
+    struct vhost_dev *dev = v->dev;
     int fd = v->device_fd;
     struct vhost_msg_v2 msg = {
         .type = v->msg_type,
         .iotlb.type = VHOST_IOTLB_BATCH_BEGIN,
     };
 
-    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.iotlb.type);
+    if (dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) {
+        msg.asid = v->dev->address_space_id;
+    }
+
+    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.asid,
+                                          msg.iotlb.type);
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)",
                      fd, errno, strerror(errno));
@@ -161,10 +174,14 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
         return;
     }
 
+    if (dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) {
+        msg.asid = v->dev->address_space_id;
+    }
+
     msg.type = v->msg_type;
     msg.iotlb.type = VHOST_IOTLB_BATCH_END;
-
-    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
+    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.asid,
+                                     msg.iotlb.type);
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)",
                      fd, errno, strerror(errno));
@@ -1183,10 +1200,48 @@ call_err:
     return false;
 }
 
+static int vhost_vdpa_set_vq_group_address_space_id(struct vhost_dev *dev,
+                                                struct vhost_vring_state *asid)
+{
+    trace_vhost_vdpa_set_vq_group_address_space_id(dev, asid->index, asid->num);
+    return vhost_vdpa_call(dev, VHOST_VDPA_SET_GROUP_ASID, asid);
+}
+
+static int vhost_vdpa_set_address_space_id(struct vhost_dev *dev)
+{
+    struct vhost_vring_state vq_group = {
+        .index = dev->vq_index,
+    };
+    struct vhost_vring_state asid;
+    int ret;
+
+    if (!dev->address_space_id) {
+        return 0;
+    }
+
+    ret = vhost_vdpa_get_vring_group(dev, &vq_group);
+    if (unlikely(ret)) {
+        error_report("Can't read vq group, errno=%d (%s)", ret,
+                     g_strerror(-ret));
+        return ret;
+    }
+
+    asid.index = vq_group.num;
+    asid.num = dev->address_space_id;
+    ret = vhost_vdpa_set_vq_group_address_space_id(dev, &asid);
+    if (unlikely(ret)) {
+        error_report("Can't set vq group %u asid %u, errno=%d (%s)",
+            asid.index, asid.num, ret, g_strerror(-ret));
+    }
+    return ret;
+}
+
 static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 {
     struct vhost_vdpa *v = dev->opaque;
     bool ok;
+    int r = 0;
+
     trace_vhost_vdpa_dev_start(dev, started);
 
     if (started) {
@@ -1195,6 +1250,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
             !vhost_dev_is_independent_group(dev)) {
             return -1;
         }
+        r = vhost_vdpa_set_address_space_id(dev);
+        if (unlikely(r)) {
+            return r;
+        }
         ok = vhost_vdpa_svqs_start(dev);
         if (unlikely(!ok)) {
             return -1;
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index ffb8eb26e7..67adad8610 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -26,10 +26,10 @@ vhost_user_write(uint32_t req, uint32_t flags) "req:%d flags:0x%"PRIx32""
 vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
 
 # vhost-vdpa.c
-vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
-vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
-vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
-vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
+vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
+vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
+vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint32_t asid, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" type: %"PRIu8
+vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint32_t asid, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" type: %"PRIu8
 vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
 vhost_vdpa_listener_region_del(void *vdpa, uint64_t iova, uint64_t llend) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64
 vhost_vdpa_add_status(void *dev, uint8_t status) "dev: %p status: 0x%"PRIx8
@@ -47,6 +47,7 @@ vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
 vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
 vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
 vhost_vdpa_get_vring_group(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
+vhost_vdpa_set_vq_group_address_space_id(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
 vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
 vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
 vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH v8 18/21] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (16 preceding siblings ...)
  2022-05-19 19:13 ` [RFC PATCH v8 17/21] vdpa: Add asid attribute to vdpa device Eugenio Pérez
@ 2022-05-19 19:13 ` Eugenio Pérez
  2022-05-19 19:13 ` [RFC PATCH v8 19/21] vhost: Add reference counting to vhost_iova_tree Eugenio Pérez
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

Knowing the device features is needed for CVQ SVQ, so SVQ knows whether
it can handle all commands or not. Extract the feature-getting code from
vhost_vdpa_get_max_queue_pairs so we can reuse it.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 net/vhost-vdpa.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index a66f73ff63..8960b8db74 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -325,20 +325,24 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     return nc;
 }
 
-static int vhost_vdpa_get_max_queue_pairs(int fd, int *has_cvq, Error **errp)
+static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
+{
+    int ret = ioctl(fd, VHOST_GET_FEATURES, features);
+    if (ret) {
+        error_setg_errno(errp, errno,
+                         "Fail to query features from vhost-vDPA device");
+    }
+    return ret;
+}
+
+static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
+                                          int *has_cvq, Error **errp)
 {
     unsigned long config_size = offsetof(struct vhost_vdpa_config, buf);
     g_autofree struct vhost_vdpa_config *config = NULL;
     __virtio16 *max_queue_pairs;
-    uint64_t features;
     int ret;
 
-    ret = ioctl(fd, VHOST_GET_FEATURES, &features);
-    if (ret) {
-        error_setg(errp, "Fail to query features from vhost-vDPA device");
-        return ret;
-    }
-
     if (features & (1 << VIRTIO_NET_F_CTRL_VQ)) {
         *has_cvq = 1;
     } else {
@@ -368,10 +372,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
                         NetClientState *peer, Error **errp)
 {
     const NetdevVhostVDPAOptions *opts;
+    uint64_t features;
     int vdpa_device_fd;
     g_autofree NetClientState **ncs = NULL;
     NetClientState *nc;
-    int queue_pairs, i, has_cvq = 0;
+    int queue_pairs, r, i, has_cvq = 0;
 
     assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
     opts = &netdev->u.vhost_vdpa;
@@ -385,7 +390,12 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
         return -errno;
     }
 
-    queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd,
+    r = vhost_vdpa_get_features(vdpa_device_fd, &features, errp);
+    if (r) {
+        return r;
+    }
+
+    queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd, features,
                                                  &has_cvq, errp);
     if (queue_pairs < 0) {
         qemu_close(vdpa_device_fd);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH v8 19/21] vhost: Add reference counting to vhost_iova_tree
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (17 preceding siblings ...)
  2022-05-19 19:13 ` [RFC PATCH v8 18/21] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs Eugenio Pérez
@ 2022-05-19 19:13 ` Eugenio Pérez
  2022-05-19 19:13 ` [RFC PATCH v8 20/21] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

Now that different vqs can have different ASIDs it's easier to track
them using reference counters.

QEMU's minimum glib version still does not provide them, so we've copied
g_rc_box; the implementation can be converted to glib's when the minimum
version is raised.
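
Usage becomes the usual acquire/release pairing, e.g. (sketch):

    /* Two users sharing one tree */
    VhostIOVATree *tree = vhost_iova_tree_new(iova_first, iova_last);
    VhostIOVATree *ref = vhost_iova_tree_acquire(tree); /* refcnt == 2 */

    vhost_iova_tree_release(ref);  /* refcnt == 1, tree still alive */
    vhost_iova_tree_release(tree); /* refcnt == 0, tree freed */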

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-iova-tree.h |  5 +++--
 hw/virtio/vhost-iova-tree.c | 21 +++++++++++++++++++--
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
index 1ffcdc5b57..bacd17d99c 100644
--- a/hw/virtio/vhost-iova-tree.h
+++ b/hw/virtio/vhost-iova-tree.h
@@ -16,8 +16,9 @@
 typedef struct VhostIOVATree VhostIOVATree;
 
 VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
-void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
-G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
+VhostIOVATree *vhost_iova_tree_acquire(VhostIOVATree *iova_tree);
+void vhost_iova_tree_release(VhostIOVATree *iova_tree);
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_release);
 
 const DMAMap *vhost_iova_tree_find(const VhostIOVATree *iova_tree,
                                    const DMAMap *map);
diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
index 1a59894385..208476b3db 100644
--- a/hw/virtio/vhost-iova-tree.c
+++ b/hw/virtio/vhost-iova-tree.c
@@ -28,6 +28,9 @@ struct VhostIOVATree {
 
     /* IOVA address to qemu memory maps. */
     IOVATree *iova_taddr_map;
+
+    /* Reference count */
+    size_t refcnt;
 };
 
 /**
@@ -44,14 +47,28 @@ VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
     tree->iova_last = iova_last;
 
     tree->iova_taddr_map = iova_tree_new();
+    tree->refcnt = 1;
     return tree;
 }
 
 /**
- * Delete an iova tree
+ * Increases the reference count of the iova tree
+ */
+VhostIOVATree *vhost_iova_tree_acquire(VhostIOVATree *iova_tree)
+{
+    ++iova_tree->refcnt;
+    return iova_tree;
+}
+
+/**
+ * Decrease reference counter of iova tree, freeing if it reaches 0
  */
-void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
+void vhost_iova_tree_release(VhostIOVATree *iova_tree)
 {
+    if (--iova_tree->refcnt) {
+        return;
+    }
+
     iova_tree_destroy(iova_tree->iova_taddr_map);
     g_free(iova_tree);
 }
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH v8 20/21] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (18 preceding siblings ...)
  2022-05-19 19:13 ` [RFC PATCH v8 19/21] vhost: Add reference counting to vhost_iova_tree Eugenio Pérez
@ 2022-05-19 19:13 ` Eugenio Pérez
  2022-05-19 19:13 ` [RFC PATCH v8 21/21] vdpa: Add x-cvq-svq Eugenio Pérez
  2022-06-08  5:51 ` [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Jason Wang
  21 siblings, 0 replies; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

Finally offer the possibility to enable SVQ from the command line.
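
Besides -netdev, the option can also be set through QMP, e.g. (sketch,
assuming a vhost-vdpa character device at /dev/vhost-vdpa-0):

    { "execute": "netdev_add",
      "arguments": { "type": "vhost-vdpa",
                     "id": "vdpa0",
                     "vhostdev": "/dev/vhost-vdpa-0",
                     "x-svq": true } }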

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 qapi/net.json    |  9 ++++++++-
 net/vhost-vdpa.c | 38 +++++++++++++++++++++++++++++++++++---
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/qapi/net.json b/qapi/net.json
index d6f7cfd4d6..cd7a1b32fe 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -445,12 +445,19 @@
 # @queues: number of queues to be created for multiqueue vhost-vdpa
 #          (default: 1)
 #
+# @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
+#         (default: false)
+#
+# Features:
+# @unstable: Member @x-svq is experimental.
+#
 # Since: 5.1
 ##
 { 'struct': 'NetdevVhostVDPAOptions',
   'data': {
     '*vhostdev':     'str',
-    '*queues':       'int' } }
+    '*queues':       'int',
+    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] } } }
 
 ##
 # @NetdevVmnetHostOptions:
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 8960b8db74..ef8c82f92e 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -129,6 +129,7 @@ static void vhost_vdpa_cleanup(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
 
+    g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_release);
     if (s->vhost_net) {
         vhost_net_cleanup(s->vhost_net);
         g_free(s->vhost_net);
@@ -188,6 +189,14 @@ static NetClientInfo net_vhost_vdpa_info = {
         .check_peer_type = vhost_vdpa_check_peer_type,
 };
 
+static int vhost_vdpa_get_iova_range(int fd,
+                                     struct vhost_vdpa_iova_range *iova_range)
+{
+    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
+
+    return ret < 0 ? -errno : 0;
+}
+
 static int vhost_vdpa_start_control_svq(VhostShadowVirtqueue *svq,
                                         struct vhost_dev *dev)
 {
@@ -295,7 +304,9 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
                                            int vdpa_device_fd,
                                            int queue_pair_index,
                                            int nvqs,
-                                           bool is_datapath)
+                                           bool is_datapath,
+                                           bool svq,
+                                           VhostIOVATree *iova_tree)
 {
     NetClientState *nc = NULL;
     VhostVDPAState *s;
@@ -313,12 +324,18 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
     s->vhost_vdpa.device_fd = vdpa_device_fd;
     s->vhost_vdpa.index = queue_pair_index;
+    s->vhost_vdpa.shadow_vqs_enabled = svq;
+    s->vhost_vdpa.iova_tree = iova_tree ? vhost_iova_tree_acquire(iova_tree) :
+                              NULL;
     if (!is_datapath) {
         s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
         s->vhost_vdpa.svq_copy_descs = true;
     }
     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
     if (ret) {
+        if (iova_tree) {
+            vhost_iova_tree_release(iova_tree);
+        }
         qemu_del_net_client(nc);
         return NULL;
     }
@@ -377,6 +394,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     g_autofree NetClientState **ncs = NULL;
     NetClientState *nc;
     int queue_pairs, r, i, has_cvq = 0;
+    g_autoptr(VhostIOVATree) iova_tree = NULL;
 
     assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
     opts = &netdev->u.vhost_vdpa;
@@ -401,19 +419,31 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
         qemu_close(vdpa_device_fd);
         return queue_pairs;
     }
+    if (opts->x_svq) {
+        struct vhost_vdpa_iova_range iova_range;
+
+        if (has_cvq) {
+            error_setg(errp, "vdpa svq does not work with cvq");
+            goto err_svq;
+        }
+        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
+        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
+    }
 
     ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
 
     for (i = 0; i < queue_pairs; i++) {
         ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                     vdpa_device_fd, i, 2, true);
+                                     vdpa_device_fd, i, 2, true, opts->x_svq,
+                                     iova_tree);
         if (!ncs[i])
             goto err;
     }
 
     if (has_cvq) {
         nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                 vdpa_device_fd, i, 1, false);
+                                 vdpa_device_fd, i, 1, false, opts->x_svq,
+                                 iova_tree);
         if (!nc)
             goto err;
     }
@@ -426,6 +456,8 @@ err:
             qemu_del_net_client(ncs[i]);
         }
     }
+
+err_svq:
     qemu_close(vdpa_device_fd);
 
     return -1;
-- 
2.27.0




* [RFC PATCH v8 21/21] vdpa: Add x-cvq-svq
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (19 preceding siblings ...)
  2022-05-19 19:13 ` [RFC PATCH v8 20/21] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
@ 2022-05-19 19:13 ` Eugenio Pérez
  2022-06-08  5:51 ` [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Jason Wang
  21 siblings, 0 replies; 51+ messages in thread
From: Eugenio Pérez @ 2022-05-19 19:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Jason Wang, Parav Pandit

This isolates the shadow CVQ in its own virtqueue group.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 qapi/net.json    |   8 ++-
 net/vhost-vdpa.c | 134 ++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 133 insertions(+), 9 deletions(-)

diff --git a/qapi/net.json b/qapi/net.json
index cd7a1b32fe..f5b047ae15 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -447,9 +447,12 @@
 #
 # @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
 #         (default: false)
+# @x-cvq-svq: Start device with an (experimental) shadow control virtqueue
+#             in its own virtqueue group. (Since 7.1)
+#             (default: false)
 #
 # Features:
-# @unstable: Member @x-svq is experimental.
+# @unstable: Members @x-svq and @x-cvq-svq are experimental.
 #
 # Since: 5.1
 ##
@@ -457,7 +460,8 @@
   'data': {
     '*vhostdev':     'str',
     '*queues':       'int',
-    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] } } }
+    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] },
+    '*x-cvq-svq':    {'type': 'bool', 'features' : [ 'unstable'] } } }
 
 ##
 # @NetdevVmnetHostOptions:
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index ef8c82f92e..ad006a2bf3 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -70,6 +70,30 @@ const int vdpa_feature_bits[] = {
     VHOST_INVALID_FEATURE_BIT
 };
 
+/** Supported device specific feature bits with SVQ */
+static const uint64_t vdpa_svq_device_features =
+    BIT_ULL(VIRTIO_NET_F_CSUM) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) |
+    BIT_ULL(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) |
+    BIT_ULL(VIRTIO_NET_F_MTU) |
+    BIT_ULL(VIRTIO_NET_F_MAC) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_ECN) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_UFO) |
+    BIT_ULL(VIRTIO_NET_F_HOST_TSO4) |
+    BIT_ULL(VIRTIO_NET_F_HOST_TSO6) |
+    BIT_ULL(VIRTIO_NET_F_HOST_ECN) |
+    BIT_ULL(VIRTIO_NET_F_HOST_UFO) |
+    BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) |
+    BIT_ULL(VIRTIO_NET_F_STATUS) |
+    BIT_ULL(VIRTIO_NET_F_CTRL_VQ) |
+    BIT_ULL(VIRTIO_NET_F_MQ) |
+    BIT_ULL(VIRTIO_F_ANY_LAYOUT) |
+    BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) |
+    BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
+    BIT_ULL(VIRTIO_NET_F_STANDBY);
+
 VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
@@ -352,6 +376,17 @@ static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
     return ret;
 }
 
+static int vhost_vdpa_get_backend_features(int fd, uint64_t *features,
+                                           Error **errp)
+{
+    int ret = ioctl(fd, VHOST_GET_BACKEND_FEATURES, features);
+    if (ret) {
+        error_setg_errno(errp, errno,
+            "Failed to query backend features from vhost-vDPA device");
+    }
+    return ret;
+}
+
 static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
                                           int *has_cvq, Error **errp)
 {
@@ -385,16 +420,56 @@ static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
     return 1;
 }
 
+/**
+ * Check that the vdpa device supports an independent ASID for the CVQ group
+ *
+ * @vdpa_device_fd: Vdpa device fd
+ * @queue_pairs: Queue pairs
+ * @errp: Error
+ */
+static int vhost_vdpa_check_cvq_svq(int vdpa_device_fd, int queue_pairs,
+                                    Error **errp)
+{
+    uint64_t backend_features;
+    unsigned num_as;
+    int r;
+
+    r = vhost_vdpa_get_backend_features(vdpa_device_fd, &backend_features,
+                                        errp);
+    if (unlikely(r)) {
+        return -1;
+    }
+
+    if (unlikely(!(backend_features & VHOST_BACKEND_F_IOTLB_ASID))) {
+        error_setg(errp, "Device without IOTLB_ASID feature");
+        return -1;
+    }
+
+    r = ioctl(vdpa_device_fd, VHOST_VDPA_GET_AS_NUM, &num_as);
+    if (unlikely(r)) {
+        error_setg_errno(errp, errno,
+                         "Cannot retrieve number of supported ASs");
+        return -1;
+    }
+    if (unlikely(num_as < 2)) {
+        error_setg(errp, "Insufficient number of ASs (%u, min: 2)", num_as);
+        return -1;
+    }
+
+    return 0;
+}
+
 int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
                         NetClientState *peer, Error **errp)
 {
     const NetdevVhostVDPAOptions *opts;
+    struct vhost_vdpa_iova_range iova_range;
     uint64_t features;
     int vdpa_device_fd;
     g_autofree NetClientState **ncs = NULL;
     NetClientState *nc;
     int queue_pairs, r, i, has_cvq = 0;
     g_autoptr(VhostIOVATree) iova_tree = NULL;
+    ERRP_GUARD();
 
     assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
     opts = &netdev->u.vhost_vdpa;
@@ -419,14 +494,35 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
         qemu_close(vdpa_device_fd);
         return queue_pairs;
     }
-    if (opts->x_svq) {
-        struct vhost_vdpa_iova_range iova_range;
+    if (opts->x_cvq_svq || opts->x_svq) {
+        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
+
+        uint64_t invalid_dev_features =
+            features & ~vdpa_svq_device_features &
+            /* Transport are all accepted at this point */
+            ~MAKE_64BIT_MASK(VIRTIO_TRANSPORT_F_START,
+                             VIRTIO_TRANSPORT_F_END - VIRTIO_TRANSPORT_F_START);
 
-        if (has_cvq) {
-            error_setg(errp, "vdpa svq does not work with cvq");
+        if (invalid_dev_features) {
+            error_setg(errp, "vdpa svq does not work with features 0x%" PRIx64,
+                       invalid_dev_features);
             goto err_svq;
         }
-        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
+    }
+
+    if (opts->x_cvq_svq) {
+        if (!has_cvq) {
+            error_setg(errp, "Cannot use x-cvq-svq with a device without cvq");
+            goto err_svq;
+        }
+
+        r = vhost_vdpa_check_cvq_svq(vdpa_device_fd, queue_pairs, errp);
+        if (unlikely(r)) {
+            error_prepend(errp, "Cannot configure CVQ SVQ: ");
+            goto err_svq;
+        }
+    }
+    if (opts->x_svq) {
         iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
     }
 
@@ -441,11 +537,35 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     }
 
     if (has_cvq) {
+        g_autoptr(VhostIOVATree) cvq_iova_tree = NULL;
+
+        if (opts->x_cvq_svq) {
+            cvq_iova_tree = vhost_iova_tree_new(iova_range.first,
+                                                iova_range.last);
+        } else if (opts->x_svq) {
+            cvq_iova_tree = vhost_iova_tree_acquire(iova_tree);
+        }
+
         nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                 vdpa_device_fd, i, 1, false, opts->x_svq,
-                                 iova_tree);
+                                 vdpa_device_fd, i, 1, false,
+                                 opts->x_cvq_svq || opts->x_svq,
+                                 cvq_iova_tree);
         if (!nc)
             goto err;
+
+        if (opts->x_cvq_svq) {
+            struct vhost_vring_state asid = {
+                .index = 1,
+                .num = 1,
+            };
+
+            r = ioctl(vdpa_device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
+            if (unlikely(r)) {
+                error_setg_errno(errp, errno,
+                                 "Cannot set cvq group independent asid");
+                goto err;
+            }
+        }
     }
 
     return 0;
-- 
2.27.0




* Re: [RFC PATCH v8 03/21] vdpa: control virtqueue support on shadow virtqueue
  2022-05-19 19:12 ` [RFC PATCH v8 03/21] vdpa: control virtqueue support on shadow virtqueue Eugenio Pérez
@ 2022-06-07  6:05   ` Jason Wang
  2022-06-08 16:38     ` Eugenio Perez Martin
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2022-06-07  6:05 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit


On 2022/5/20 03:12, Eugenio Pérez wrote:
> Introduce the control virtqueue support for vDPA shadow virtqueue. This
> is needed for advanced networking features like multiqueue.
>
> To demonstrate command handling, VIRTIO_NET_F_CTRL_MACADDR and
> VIRTIO_NET_CTRL_MQ are implemented. If the vDPA device is started with
> SVQ support and the virtio-net driver changes the MAC or the number of
> queues, the virtio-net device model will be updated with the new values.
>
> Other CVQ commands could be added here straightforwardly, but they have
> not been tested.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   net/vhost-vdpa.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 44 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index df1e69ee72..ef12fc284c 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -11,6 +11,7 @@
>   
>   #include "qemu/osdep.h"
>   #include "clients.h"
> +#include "hw/virtio/virtio-net.h"
>   #include "net/vhost_net.h"
>   #include "net/vhost-vdpa.h"
>   #include "hw/virtio/vhost-vdpa.h"
> @@ -187,6 +188,46 @@ static NetClientInfo net_vhost_vdpa_info = {
>           .check_peer_type = vhost_vdpa_check_peer_type,
>   };
>   
> +static void vhost_vdpa_net_handle_ctrl(VirtIODevice *vdev,
> +                                       const VirtQueueElement *elem)
> +{
> +    struct virtio_net_ctrl_hdr ctrl;
> +    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
> +    size_t s;
> +    struct iovec in = {
> +        .iov_base = &status,
> +        .iov_len = sizeof(status),
> +    };
> +
> +    s = iov_to_buf(elem->out_sg, elem->out_num, 0, &ctrl, sizeof(ctrl.class));
> +    if (s != sizeof(ctrl.class)) {
> +        return;
> +    }
> +
> +    switch (ctrl.class) {
> +    case VIRTIO_NET_CTRL_MAC_ADDR_SET:
> +    case VIRTIO_NET_CTRL_MQ:
> +        break;
> +    default:
> +        return;
> +    };


I think we can probably remove the whitelist here since it is expected 
to work for any kind of command?
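
For illustration, a minimal sketch of the handler with the whitelist
dropped (an editorial sketch, not code from the series; the status
check and replay logic are kept as in the patch):

    static void vhost_vdpa_net_handle_ctrl(VirtIODevice *vdev,
                                           const VirtQueueElement *elem)
    {
        virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
        const struct iovec in = {
            .iov_base = &status,
            .iov_len = sizeof(status),
        };
        size_t s;

        /* Only replay commands that the device acknowledged */
        s = iov_to_buf(elem->in_sg, elem->in_num, 0, &status, sizeof(status));
        if (s != sizeof(status) || status != VIRTIO_NET_OK) {
            return;
        }

        /* Forward every command class to the device model */
        status = VIRTIO_NET_ERR;
        virtio_net_handle_ctrl_iov(vdev, &in, 1, elem->out_sg, elem->out_num);
        if (status != VIRTIO_NET_OK) {
            error_report("Bad CVQ processing in model");
        }
    }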

Thanks


> +
> +    s = iov_to_buf(elem->in_sg, elem->in_num, 0, &status, sizeof(status));
> +    if (s != sizeof(status) || status != VIRTIO_NET_OK) {
> +        return;
> +    }
> +
> +    status = VIRTIO_NET_ERR;
> +    virtio_net_handle_ctrl_iov(vdev, &in, 1, elem->out_sg, elem->out_num);
> +    if (status != VIRTIO_NET_OK) {
> +        error_report("Bad CVQ processing in model");
> +    }
> +}
> +
> +static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
> +    .used_elem_handler = vhost_vdpa_net_handle_ctrl,
> +};
> +
>   static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>                                              const char *device,
>                                              const char *name,
> @@ -211,6 +252,9 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>   
>       s->vhost_vdpa.device_fd = vdpa_device_fd;
>       s->vhost_vdpa.index = queue_pair_index;
> +    if (!is_datapath) {
> +        s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
> +    }
>       ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
>       if (ret) {
>           qemu_del_net_client(nc);




* Re: [RFC PATCH v8 02/21] vhost: Add custom used buffer callback
  2022-05-19 19:12 ` [RFC PATCH v8 02/21] vhost: Add custom used buffer callback Eugenio Pérez
@ 2022-06-07  6:12   ` Jason Wang
  2022-06-08 19:38     ` Eugenio Perez Martin
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2022-06-07  6:12 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit


On 2022/5/20 03:12, Eugenio Pérez wrote:
> The callback allows SVQ users to know the VirtQueue requests and
> responses. QEMU can use this to synchronize the virtio device model
> state, allowing it to be migrated with minimal changes to the migration
> code.
>
> In the case of networking, this will be used to inspect control
> virtqueue messages.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h | 16 +++++++++++++++-
>   include/hw/virtio/vhost-vdpa.h     |  2 ++
>   hw/virtio/vhost-shadow-virtqueue.c |  9 ++++++++-
>   hw/virtio/vhost-vdpa.c             |  3 ++-
>   4 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index c132c994e9..6593f07db3 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -15,6 +15,13 @@
>   #include "standard-headers/linux/vhost_types.h"
>   #include "hw/virtio/vhost-iova-tree.h"
>   
> +typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
> +                                         const VirtQueueElement *elem);


Nit: I wonder if something like "VirtQueueCallback" is sufficient (e.g.
the kernel uses "callback" directly).


> +
> +typedef struct VhostShadowVirtqueueOps {
> +    VirtQueueElementCallback used_elem_handler;
> +} VhostShadowVirtqueueOps;
> +
>   /* Shadow virtqueue to relay notifications */
>   typedef struct VhostShadowVirtqueue {
>       /* Shadow vring */
> @@ -59,6 +66,12 @@ typedef struct VhostShadowVirtqueue {
>        */
>       uint16_t *desc_next;
>   
> +    /* Optional callbacks */
> +    const VhostShadowVirtqueueOps *ops;


Can we merge map_ops into ops?
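
For illustration, a sketch of the merged struct (an editorial sketch;
the map/unmap signatures follow how the series uses them):

    typedef struct VhostShadowVirtqueueOps {
        /* Optional custom used virtqueue element handler */
        VirtQueueElementCallback used_elem_handler;

        /* Map memory to the device, folded in from the map_ops struct */
        int (*map)(hwaddr iova, hwaddr size, void *vaddr, bool readonly,
                   void *opaque);

        /* Unmap memory from the device */
        int (*unmap)(hwaddr iova, hwaddr size, void *opaque);
    } VhostShadowVirtqueueOps;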


> +
> +    /* Optional custom used virtqueue element handler */
> +    VirtQueueElementCallback used_elem_cb;


This seems not used in this series.

Thanks


> +
>       /* Next head to expose to the device */
>       uint16_t shadow_avail_idx;
>   
> @@ -85,7 +98,8 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
>                        VirtQueue *vq);
>   void vhost_svq_stop(VhostShadowVirtqueue *svq);
>   
> -VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree);
> +VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
> +                                    const VhostShadowVirtqueueOps *ops);
>   
>   void vhost_svq_free(gpointer vq);
>   G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index a29dbb3f53..f1ba46a860 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -17,6 +17,7 @@
>   #include "hw/virtio/vhost-iova-tree.h"
>   #include "hw/virtio/virtio.h"
>   #include "standard-headers/linux/vhost_types.h"
> +#include "hw/virtio/vhost-shadow-virtqueue.h"
>   
>   typedef struct VhostVDPAHostNotifier {
>       MemoryRegion mr;
> @@ -35,6 +36,7 @@ typedef struct vhost_vdpa {
>       /* IOVA mapping used by the Shadow Virtqueue */
>       VhostIOVATree *iova_tree;
>       GPtrArray *shadow_vqs;
> +    const VhostShadowVirtqueueOps *shadow_vq_ops;
>       struct vhost_dev *dev;
>       VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
>   } VhostVDPA;
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 56c96ebd13..167db8be45 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -410,6 +410,10 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>                   break;
>               }
>   
> +            if (svq->ops && svq->ops->used_elem_handler) {
> +                svq->ops->used_elem_handler(svq->vdev, elem);
> +            }
> +
>               if (unlikely(i >= svq->vring.num)) {
>                   qemu_log_mask(LOG_GUEST_ERROR,
>                            "More than %u used buffers obtained in a %u size SVQ",
> @@ -607,12 +611,14 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>    * shadow methods and file descriptors.
>    *
>    * @iova_tree: Tree to perform descriptors translations
> + * @ops: SVQ operations hooks
>    *
>    * Returns the new virtqueue or NULL.
>    *
>    * In case of error, reason is reported through error_report.
>    */
> -VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
> +VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
> +                                    const VhostShadowVirtqueueOps *ops)
>   {
>       g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
>       int r;
> @@ -634,6 +640,7 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
>       event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND);
>       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
>       svq->iova_tree = iova_tree;
> +    svq->ops = ops;
>       return g_steal_pointer(&svq);
>   
>   err_init_hdev_call:
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 66f054a12c..7677b337e6 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -418,7 +418,8 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>   
>       shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
>       for (unsigned n = 0; n < hdev->nvqs; ++n) {
> -        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree);
> +        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree,
> +                                                            v->shadow_vq_ops);
>   
>           if (unlikely(!svq)) {
>               error_setg(errp, "Cannot create svq %u", n);




* Re: [RFC PATCH v8 01/21] virtio-net: Expose ctrl virtqueue logic
  2022-05-19 19:12 ` [RFC PATCH v8 01/21] virtio-net: Expose ctrl virtqueue logic Eugenio Pérez
@ 2022-06-07  6:13   ` Jason Wang
  2022-06-08 16:30     ` Eugenio Perez Martin
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2022-06-07  6:13 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit


On 2022/5/20 03:12, Eugenio Pérez wrote:
> This allows external vhost-net devices to modify the state of the
> VirtIO device model once the vhost-vdpa device has acknowledged the
> control commands.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   include/hw/virtio/virtio-net.h |  4 ++
>   hw/net/virtio-net.c            | 84 ++++++++++++++++++++--------------
>   2 files changed, 53 insertions(+), 35 deletions(-)
>
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index eb87032627..cd31b7f67d 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -218,6 +218,10 @@ struct VirtIONet {
>       struct EBPFRSSContext ebpf_rss;
>   };
>   
> +unsigned virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
> +                                    const struct iovec *in_sg, size_t in_num,
> +                                    const struct iovec *out_sg,
> +                                    unsigned out_num);
>   void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
>                                      const char *type);
>   
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 7ad948ee7c..0e350154ec 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1434,57 +1434,71 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
>       return VIRTIO_NET_OK;
>   }
>   
> -static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
> +unsigned virtio_net_handle_ctrl_iov(VirtIODevice *vdev,


Should we use size_t here?

Thanks


> +                                    const struct iovec *in_sg, size_t in_num,
> +                                    const struct iovec *out_sg,
> +                                    unsigned out_num)
>   {
>       VirtIONet *n = VIRTIO_NET(vdev);
>       struct virtio_net_ctrl_hdr ctrl;
>       virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
> -    VirtQueueElement *elem;
>       size_t s;
>       struct iovec *iov, *iov2;
> -    unsigned int iov_cnt;
> +
> +    if (iov_size(in_sg, in_num) < sizeof(status) ||
> +        iov_size(out_sg, out_num) < sizeof(ctrl)) {
> +        virtio_error(vdev, "virtio-net ctrl missing headers");
> +        return 0;
> +    }
> +
> +    iov2 = iov = g_memdup2(out_sg, sizeof(struct iovec) * out_num);
> +    s = iov_to_buf(iov, out_num, 0, &ctrl, sizeof(ctrl));
> +    iov_discard_front(&iov, &out_num, sizeof(ctrl));
> +    if (s != sizeof(ctrl)) {
> +        status = VIRTIO_NET_ERR;
> +    } else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
> +        status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, out_num);
> +    } else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
> +        status = virtio_net_handle_mac(n, ctrl.cmd, iov, out_num);
> +    } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
> +        status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, out_num);
> +    } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
> +        status = virtio_net_handle_announce(n, ctrl.cmd, iov, out_num);
> +    } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
> +        status = virtio_net_handle_mq(n, ctrl.cmd, iov, out_num);
> +    } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
> +        status = virtio_net_handle_offloads(n, ctrl.cmd, iov, out_num);
> +    }
> +
> +    s = iov_from_buf(in_sg, in_num, 0, &status, sizeof(status));
> +    assert(s == sizeof(status));
> +
> +    g_free(iov2);
> +    return sizeof(status);
> +}
> +
> +static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +    VirtQueueElement *elem;
>   
>       for (;;) {
> +        unsigned written;
>           elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
>           if (!elem) {
>               break;
>           }
> -        if (iov_size(elem->in_sg, elem->in_num) < sizeof(status) ||
> -            iov_size(elem->out_sg, elem->out_num) < sizeof(ctrl)) {
> -            virtio_error(vdev, "virtio-net ctrl missing headers");
> +
> +        written = virtio_net_handle_ctrl_iov(vdev, elem->in_sg, elem->in_num,
> +                                             elem->out_sg, elem->out_num);
> +        if (written > 0) {
> +            virtqueue_push(vq, elem, written);
> +            virtio_notify(vdev, vq);
> +            g_free(elem);
> +        } else {
>               virtqueue_detach_element(vq, elem, 0);
>               g_free(elem);
>               break;
>           }
> -
> -        iov_cnt = elem->out_num;
> -        iov2 = iov = g_memdup2(elem->out_sg,
> -                               sizeof(struct iovec) * elem->out_num);
> -        s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl));
> -        iov_discard_front(&iov, &iov_cnt, sizeof(ctrl));
> -        if (s != sizeof(ctrl)) {
> -            status = VIRTIO_NET_ERR;
> -        } else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
> -            status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, iov_cnt);
> -        } else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
> -            status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
> -        } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
> -            status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
> -        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
> -            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
> -        } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
> -            status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
> -        } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
> -            status = virtio_net_handle_offloads(n, ctrl.cmd, iov, iov_cnt);
> -        }
> -
> -        s = iov_from_buf(elem->in_sg, elem->in_num, 0, &status, sizeof(status));
> -        assert(s == sizeof(status));
> -
> -        virtqueue_push(vq, elem, sizeof(status));
> -        virtio_notify(vdev, vq);
> -        g_free(iov2);
> -        g_free(elem);
>       }
>   }
>   




* Re: [RFC PATCH v8 09/21] vhost: Add svq copy desc mode
  2022-05-19 19:12 ` [RFC PATCH v8 09/21] vhost: Add svq copy desc mode Eugenio Pérez
@ 2022-06-08  4:14   ` Jason Wang
  2022-06-08 19:02     ` Eugenio Perez Martin
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2022-06-08  4:14 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit


On 2022/5/20 03:12, Eugenio Pérez wrote:
> Enable SVQ not to forward the descriptor by translating its address to
> qemu's IOVA, but by copying it to a region outside of the guest.
>
> Virtio-net control VQ will use this mode, so we don't need to send all
> the guest's memory every time there is a change, but only the messages.
> Conversely, CVQ will only have access to control messages.  This leads
> to less messing with memory listeners.
>
> We could also try to send only the required translation per message, but
> this presents a problem when many control messages occupy the same
> guest's memory region.
>
> Lastly, this allows us to inject messages from QEMU to the device in a
> simple manner.  CVQ should be used rarely and with small messages, so
> all the drawbacks should be acceptable.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  10 ++
>   include/hw/virtio/vhost-vdpa.h     |   1 +
>   hw/virtio/vhost-shadow-virtqueue.c | 174 +++++++++++++++++++++++++++--
>   hw/virtio/vhost-vdpa.c             |   1 +
>   net/vhost-vdpa.c                   |   1 +
>   5 files changed, 175 insertions(+), 12 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index e06ac52158..79cb2d301f 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -17,6 +17,12 @@
>   
>   typedef struct SVQElement {
>       VirtQueueElement elem;
> +
> +    /* SVQ IOVA address of in buffer and out buffer if cloned */
> +    hwaddr in_iova, out_iova;


It might be worth mentioning that we'd expect a single buffer here.


> +
> +    /* Length of in buffer */
> +    size_t in_len;
>   } SVQElement;
>   
>   typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
> @@ -102,6 +108,9 @@ typedef struct VhostShadowVirtqueue {
>   
>       /* Next head to consume from the device */
>       uint16_t last_used_idx;
> +
> +    /* Copy each descriptor to QEMU iova */
> +    bool copy_descs;
>   } VhostShadowVirtqueue;
>   
>   bool vhost_svq_valid_features(uint64_t features, Error **errp);
> @@ -119,6 +128,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq);
>   
>   VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_map,
>                                       const VhostShadowVirtqueueOps *ops,
> +                                    bool copy_descs,
>                                       const VhostShadowVirtqueueMapOps *map_ops,
>                                       void *map_ops_opaque);
>   
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index f1ba46a860..dc2884eea4 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -33,6 +33,7 @@ typedef struct vhost_vdpa {
>       struct vhost_vdpa_iova_range iova_range;
>       uint64_t acked_features;
>       bool shadow_vqs_enabled;
> +    bool svq_copy_descs;
>       /* IOVA mapping used by the Shadow Virtqueue */
>       VhostIOVATree *iova_tree;
>       GPtrArray *shadow_vqs;
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 044005ba89..5a8feb1cbc 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -16,6 +16,7 @@
>   #include "qemu/log.h"
>   #include "qemu/memalign.h"
>   #include "linux-headers/linux/vhost.h"
> +#include "qemu/iov.h"
>   
>   /**
>    * Validate the transport device features that both guests can use with the SVQ
> @@ -70,6 +71,30 @@ static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
>       return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx);
>   }
>   
> +static void vhost_svq_alloc_buffer(void **base, size_t *len,
> +                                   const struct iovec *iov, size_t num,
> +                                   bool write)
> +{
> +    *len = iov_size(iov, num);


Since this behavior is triggerable by the guest, we need an upper limit
here.
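
For illustration, a sketch of such a bound at the top of
vhost_svq_alloc_buffer() (the limit name and value are hypothetical,
and the function would need to grow a failure return):

    /* Hypothetical cap on buffers linearized from guest descriptors */
    #define SVQ_MAX_COPY_LEN (64 * 1024)

        *len = iov_size(iov, num);
        if (unlikely(*len > SVQ_MAX_COPY_LEN)) {
            error_report("guest buffer of %zu bytes exceeds copy limit",
                         *len);
            return false;
        }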


> +    size_t buf_size = ROUND_UP(*len, 4096);


I see a kind of duplicated round-up, which is also done in
vhost_svq_vring_write_descs().

Btw, should we use TARGET_PAGE_SIZE instead of the magic 4096 here?


> +
> +    if (!num) {
> +        return;
> +    }
> +
> +    /*
> +     * Linearize element. If guest had a descriptor chain, we expose the device
> +     * a single buffer.
> +     */
> +    *base = qemu_memalign(4096, buf_size);
> +    if (!write) {
> +        iov_to_buf(iov, num, 0, *base, *len);
> +        memset(*base + *len, 0, buf_size - *len);
> +    } else {
> +        memset(*base, 0, *len);
> +    }
> +}
> +
>   /**
>    * Translate addresses between the qemu's virtual address and the SVQ IOVA
>    *
> @@ -126,7 +151,9 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
>    * Write descriptors to SVQ vring
>    *
>    * @svq: The shadow virtqueue
> + * @svq_elem: The shadow virtqueue element
>    * @sg: Cache for hwaddr
> + * @descs_len: Total written buffer if svq->copy_descs.
>    * @iovec: The iovec from the guest
>    * @num: iovec length
>    * @more_descs: True if more descriptors come in the chain
> @@ -134,7 +161,9 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
>    *
>    * Return true if success, false otherwise and print error.
>    */
> -static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> +static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq,
> +                                        SVQElement *svq_elem, hwaddr *sg,
> +                                        size_t *descs_len,
>                                           const struct iovec *iovec, size_t num,
>                                           bool more_descs, bool write)
>   {
> @@ -142,18 +171,41 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>       unsigned n;
>       uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
>       vring_desc_t *descs = svq->vring.desc;
> -    bool ok;
> -
>       if (num == 0) {
>           return true;
>       }
>   
> -    ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> -    if (unlikely(!ok)) {
> -        return false;
> +    if (svq->copy_descs) {
> +        void *buf;
> +        DMAMap map = {};
> +        int r;
> +
> +        vhost_svq_alloc_buffer(&buf, descs_len, iovec, num, write);
> +        map.translated_addr = (hwaddr)(uintptr_t)buf;
> +        map.size = ROUND_UP(*descs_len, 4096) - 1;
> +        map.perm = write ? IOMMU_RW : IOMMU_RO,
> +        r = vhost_iova_tree_map_alloc(svq->iova_tree, &map);
> +        if (unlikely(r != IOVA_OK)) {
> +            error_report("Cannot map injected element");
> +            return false;
> +        }
> +
> +        r = svq->map_ops->map(map.iova, map.size + 1,
> +                              (void *)map.translated_addr, !write,
> +                              svq->map_ops_opaque);
> +        /* TODO: Handle error */
> +        assert(r == 0);
> +        num = 1;
> +        sg[0] = map.iova;


I think it would be simpler to stick to a simple logic in
vhost_svq_vring_write_descs() here.

E.g. we can move the above logic to the caller, which can simply prepare
a dedicated in/out sg for the copied buffer.
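
For illustration, a sketch of the caller-side shape being suggested,
where the copy path prepares a dedicated single-entry iovec and the
pre-series vhost_svq_vring_write_descs() stays untouched (out_buf and
out_len are hypothetical names for the already linearized bounce
buffer, which was inserted into svq->iova_tree when it was mapped, so
the regular translation path can still find it):

    if (svq->copy_descs) {
        /* Expose the linearized bounce buffer as a single descriptor */
        const struct iovec bounce = {
            .iov_base = out_buf,
            .iov_len = out_len,
        };

        ok = vhost_svq_vring_write_descs(svq, sgs, &bounce, 1,
                                         elem->in_num > 0, false);
    } else {
        ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg,
                                         elem->out_num, elem->in_num > 0,
                                         false);
    }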


> +    } else {
> +        bool ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> +        if (unlikely(!ok)) {
> +            return false;
> +        }
>       }
>   
>       for (n = 0; n < num; n++) {
> +        uint32_t len = svq->copy_descs ? *descs_len : iovec[n].iov_len;
>           if (more_descs || (n + 1 < num)) {
>               descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
>               descs[i].next = cpu_to_le16(svq->desc_next[i]);
> @@ -161,7 +213,7 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>               descs[i].flags = flags;
>           }
>           descs[i].addr = cpu_to_le64(sg[n]);
> -        descs[i].len = cpu_to_le32(iovec[n].iov_len);
> +        descs[i].len = cpu_to_le32(len);
>   
>           last = i;
>           i = cpu_to_le16(svq->desc_next[i]);
> @@ -178,7 +230,8 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement *svq_elem,
>       unsigned avail_idx;
>       vring_avail_t *avail = svq->vring.avail;
>       bool ok;
> -    g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
> +    g_autofree hwaddr *sgs = NULL;
> +    hwaddr *in_sgs, *out_sgs;
>   
>       *head = svq->free_head;
>   
> @@ -189,15 +242,24 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement *svq_elem,
>           return false;
>       }
>   
> -    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
> +    if (!svq->copy_descs) {
> +        sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
> +        in_sgs = out_sgs = sgs;
> +    } else {
> +        in_sgs = &svq_elem->in_iova;
> +        out_sgs = &svq_elem->out_iova;
> +    }
> +    ok = vhost_svq_vring_write_descs(svq, svq_elem, out_sgs, (size_t[]){},
> +                                     elem->out_sg, elem->out_num,
>                                        elem->in_num > 0, false);
>       if (unlikely(!ok)) {
>           return false;
>       }
>   
> -    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
> -                                     true);
> +    ok = vhost_svq_vring_write_descs(svq, svq_elem, in_sgs, &svq_elem->in_len,
> +                                     elem->in_sg, elem->in_num, false, true);
>       if (unlikely(!ok)) {
> +        /* TODO unwind out_sg */
>           return false;
>       }
>   
> @@ -276,6 +338,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
>               SVQElement *svq_elem;
>               VirtQueueElement *elem;
>               bool ok;
> +            uint32_t needed_slots;
>   
>               if (svq->next_guest_avail_elem) {
>                   svq_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> @@ -288,7 +351,8 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
>               }
>   
>               elem = &svq_elem->elem;
> -            if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
> +            needed_slots = svq->copy_descs ? 1 : elem->out_num + elem->in_num;
> +            if (needed_slots > vhost_svq_available_slots(svq)) {
>                   /*
>                    * This condition is possible since a contiguous buffer in GPA
>                    * does not imply a contiguous buffer in qemu's VA
> @@ -411,6 +475,76 @@ static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
>       return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
>   }
>   
> +/**
> + * Unmap a descriptor chain of a SVQ element, optionally copying its in buffers
> + *
> + * @svq: Shadow VirtQueue
> + * @iova: SVQ IO Virtual address of descriptor
> + * @iov: Optional iovec to store device writable buffer
> + * @iov_cnt: iov length
> + * @buf_len: Length written by the device
> + *
> + * Print error message in case of error
> + */
> +static bool vhost_svq_unmap_iov(VhostShadowVirtqueue *svq, hwaddr iova,
> +                                const struct iovec *iov, size_t iov_cnt,
> +                                size_t buf_len)
> +{
> +    DMAMap needle = {
> +        /*
> +         * No need to specify size since contiguous iova chunk was allocated
> +         * by SVQ.
> +         */
> +        .iova = iova,
> +    };
> +    const DMAMap *map = vhost_iova_tree_find(svq->iova_tree, &needle);
> +    int r;
> +
> +    if (!map) {
> +        error_report("Cannot locate expected map");
> +        return false;
> +    }
> +
> +    r = svq->map_ops->unmap(map->iova, map->size + 1, svq->map_ops_opaque);
> +    if (unlikely(r != 0)) {
> +        error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
> +        return false;
> +    }
> +
> +    if (iov) {
> +        iov_from_buf(iov, iov_cnt, 0, (const void *)map->translated_addr, buf_len);
> +    }
> +    qemu_vfree((void *)map->translated_addr);
> +    vhost_iova_tree_remove(svq->iova_tree, &needle);
> +    return true;
> +}
> +
> +/**
> + * Unmap shadow virtqueue element
> + *
> + * @svq_elem: Shadow VirtQueue Element
> + * @copy_in: Copy in buffer to the element at unmapping
> + */
> +static bool vhost_svq_unmap_elem(VhostShadowVirtqueue *svq, SVQElement *svq_elem, uint32_t len, bool copy_in)
> +{
> +    VirtQueueElement *elem = &svq_elem->elem;
> +    const struct iovec *in_iov = copy_in ? elem->in_sg : NULL;
> +    size_t in_count = copy_in ? elem->in_num : 0;
> +    if (elem->out_num) {
> +        bool ok = vhost_svq_unmap_iov(svq, svq_elem->out_iova, NULL, 0, 0);
> +        if (unlikely(!ok)) {
> +            return false;
> +        }
> +    }
> +
> +    if (elem->in_num) {
> +        return vhost_svq_unmap_iov(svq, svq_elem->in_iova, in_iov, in_count,
> +                                   len);
> +    }
> +
> +    return true;
> +}
> +
>   static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>                               bool check_for_avail_queue)
>   {
> @@ -429,6 +563,13 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>                   break;
>               }
>   
> +            if (svq->copy_descs) {
> +                bool ok = vhost_svq_unmap_elem(svq, svq_elem, len, true);
> +                if (unlikely(!ok)) {
> +                    return;
> +                }
> +            }
> +
>               elem = &svq_elem->elem;
>               if (svq->ops && svq->ops->used_elem_handler) {
>                   svq->ops->used_elem_handler(svq->vdev, elem);
> @@ -611,12 +752,18 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>           g_autofree SVQElement *svq_elem = NULL;
>           svq_elem = g_steal_pointer(&svq->ring_id_maps[i]);
>           if (svq_elem) {
> +            if (svq->copy_descs) {
> +                vhost_svq_unmap_elem(svq, svq_elem, 0, false);
> +            }
>               virtqueue_detach_element(svq->vq, &svq_elem->elem, 0);
>           }
>       }
>   
>       next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
>       if (next_avail_elem) {
> +        if (svq->copy_descs) {
> +            vhost_svq_unmap_elem(svq, next_avail_elem, 0, false);
> +        }
>           virtqueue_detach_element(svq->vq, &next_avail_elem->elem, 0);
>       }
>       svq->vq = NULL;
> @@ -632,6 +779,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>    *
>    * @iova_tree: Tree to perform descriptors translations
>    * @ops: SVQ operations hooks
> + * @copy_descs: Copy each descriptor to QEMU iova
>    * @map_ops: SVQ mapping operation hooks
>    * @map_ops_opaque: Opaque data to pass to mapping operations
>    *
> @@ -641,6 +789,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>    */
>   VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
>                                       const VhostShadowVirtqueueOps *ops,
> +                                    bool copy_descs,
>                                       const VhostShadowVirtqueueMapOps *map_ops,
>                                       void *map_ops_opaque)
>   {
> @@ -665,6 +814,7 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
>       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
>       svq->iova_tree = iova_tree;
>       svq->ops = ops;
> +    svq->copy_descs = copy_descs;
>       svq->map_ops = map_ops;
>       svq->map_ops_opaque = map_ops_opaque;
>       return g_steal_pointer(&svq);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index e6ef944e23..31b3d4d013 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -436,6 +436,7 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>       for (unsigned n = 0; n < hdev->nvqs; ++n) {
>           g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree,
>                                                          v->shadow_vq_ops,
> +                                                       v->svq_copy_descs,
>                                                          &vhost_vdpa_svq_map_ops,
>                                                          v);
>   
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index ef12fc284c..174fec5e77 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -254,6 +254,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>       s->vhost_vdpa.index = queue_pair_index;
>       if (!is_datapath) {
>           s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
> +        s->vhost_vdpa.svq_copy_descs = true;
>       }
>       ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
>       if (ret) {


So all this logic seems rather complicated; it might be better to think
of a way to simplify things. The cause of the complexity is that we
couple too much with SVQ.

I wonder if we can simply let the control virtqueue end in userspace
code, where it has a full understanding of the semantics, and then let
it talk to vhost-vdpa directly:

E.g. in the case of the mq setting, we would start from
virtio_net_handle_mq(), prepare the cvq commands there, and send them to
the vhost-vDPA networking backend, where they would be mapped and
submitted to the device.
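
For illustration, a sketch of what such a backend hook could look like
(everything here is hypothetical, including vhost_vdpa_cvq_submit(),
which would map the buffers into the CVQ address space and wait for
the device to use them):

    static int vhost_vdpa_net_cvq_cmd(struct vhost_dev *dev, uint8_t class,
                                      uint8_t cmd, const void *data,
                                      size_t len)
    {
        const struct virtio_net_ctrl_hdr ctrl = {
            .class = class,
            .cmd = cmd,
        };
        virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
        const struct iovec out[] = {
            { .iov_base = (void *)&ctrl, .iov_len = sizeof(ctrl) },
            { .iov_base = (void *)data, .iov_len = len },
        };
        struct iovec in = {
            .iov_base = &status,
            .iov_len = sizeof(status),
        };
        int r;

        /* Hypothetical: map out/in and push them to the device's CVQ */
        r = vhost_vdpa_cvq_submit(dev, out, ARRAY_SIZE(out), &in, 1);
        if (unlikely(r)) {
            return r;
        }

        return status == VIRTIO_NET_OK ? 0 : -EIO;
    }

virtio_net_handle_mq() (or its caller) would then invoke it with
VIRTIO_NET_CTRL_MQ / VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET and a
virtio_net_ctrl_mq payload, instead of relying on SVQ to intercept the
guest's buffers.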

Thanks




* Re: [RFC PATCH v8 11/21] vhost: Update kernel headers
  2022-05-19 19:12 ` [RFC PATCH v8 11/21] vhost: Update kernel headers Eugenio Pérez
@ 2022-06-08  4:18   ` Jason Wang
  2022-06-08 19:04     ` Eugenio Perez Martin
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2022-06-08  4:18 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit


On 2022/5/20 03:12, Eugenio Pérez wrote:
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---


It's better to use the helpers in scripts/ (scripts/update-linux-headers.sh)
and mention which kernel version the headers are synced to.

Thanks


>   include/standard-headers/linux/vhost_types.h | 11 ++++++++-
>   linux-headers/linux/vhost.h                  | 25 ++++++++++++++++----
>   2 files changed, 30 insertions(+), 6 deletions(-)
>
> diff --git a/include/standard-headers/linux/vhost_types.h b/include/standard-headers/linux/vhost_types.h
> index 0bd2684a2a..ce78551b0f 100644
> --- a/include/standard-headers/linux/vhost_types.h
> +++ b/include/standard-headers/linux/vhost_types.h
> @@ -87,7 +87,7 @@ struct vhost_msg {
>   
>   struct vhost_msg_v2 {
>   	uint32_t type;
> -	uint32_t reserved;
> +	uint32_t asid;
>   	union {
>   		struct vhost_iotlb_msg iotlb;
>   		uint8_t padding[64];
> @@ -153,4 +153,13 @@ struct vhost_vdpa_iova_range {
>   /* vhost-net should add virtio_net_hdr for RX, and strip for TX packets. */
>   #define VHOST_NET_F_VIRTIO_NET_HDR 27
>   
> +/* Use message type V2 */
> +#define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1
> +/* IOTLB can accept batching hints */
> +#define VHOST_BACKEND_F_IOTLB_BATCH  0x2
> +/* IOTLB can accept address space identifier through V2 type of IOTLB
> + * message
> + */
> +#define VHOST_BACKEND_F_IOTLB_ASID  0x3
> +
>   #endif
> diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> index 5d99e7c242..d42eb46efd 100644
> --- a/linux-headers/linux/vhost.h
> +++ b/linux-headers/linux/vhost.h
> @@ -89,11 +89,6 @@
>   
>   /* Set or get vhost backend capability */
>   
> -/* Use message type V2 */
> -#define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1
> -/* IOTLB can accept batching hints */
> -#define VHOST_BACKEND_F_IOTLB_BATCH  0x2
> -
>   #define VHOST_SET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x25, __u64)
>   #define VHOST_GET_BACKEND_FEATURES _IOR(VHOST_VIRTIO, 0x26, __u64)
>   
> @@ -154,6 +149,26 @@
>   /* Get the config size */
>   #define VHOST_VDPA_GET_CONFIG_SIZE	_IOR(VHOST_VIRTIO, 0x79, __u32)
>   
> +/* Get the number of virtqueue groups. */
> +#define VHOST_VDPA_GET_GROUP_NUM	_IOR(VHOST_VIRTIO, 0x7A, unsigned int)
> +
> +/* Get the number of address spaces. */
> +#define VHOST_VDPA_GET_AS_NUM		_IOR(VHOST_VIRTIO, 0x7B, unsigned int)
> +
> +/* Get the group for a virtqueue: read index, write group in num,
> + * The virtqueue index is stored in the index field of
> + * vhost_vring_state. The group for this specific virtqueue is
> + * returned via num field of vhost_vring_state.
> + */
> +#define VHOST_VDPA_GET_VRING_GROUP	_IOWR(VHOST_VIRTIO, 0x7C,	\
> +					      struct vhost_vring_state)
> +/* Set the ASID for a virtqueue group. The group index is stored in
> + * the index field of vhost_vring_state, the ASID associated with this
> + * group is stored at num field of vhost_vring_state.
> + */
> +#define VHOST_VDPA_SET_GROUP_ASID	_IOW(VHOST_VIRTIO, 0x7D, \
> +					     struct vhost_vring_state)
> +
>   /* Get the count of all virtqueues */
>   #define VHOST_VDPA_GET_VQS_COUNT	_IOR(VHOST_VIRTIO, 0x80, __u32)
>   




* Re: [RFC PATCH v8 12/21] vdpa: delay set_vring_ready after DRIVER_OK
  2022-05-19 19:12 ` [RFC PATCH v8 12/21] vdpa: delay set_vring_ready after DRIVER_OK Eugenio Pérez
@ 2022-06-08  4:20   ` Jason Wang
  2022-06-08 19:06     ` Eugenio Perez Martin
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2022-06-08  4:20 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit


On 2022/5/20 03:12, Eugenio Pérez wrote:
> To restore the device at the destination of a live migration we send
> the commands through the control virtqueue. For a device to read the
> CVQ it must have received the DRIVER_OK status bit.
>
> However this opens a window where the device could start receiving
> packets in rx queue 0 before it receives the RSS configuration. To
> avoid that, we will not send vring_enable until all the configuration
> has been consumed by the device.
>
> As a first step, reverse the DRIVER_OK and SET_VRING_ENABLE steps.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>


I may be missing something, but it looks to me like this should be an
independent patch, or it should depend on the live migration series.

Thanks


> ---
>   hw/virtio/vhost-vdpa.c | 20 +++++++++++++++-----
>   1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 31b3d4d013..13e5e2a061 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -748,13 +748,18 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>       return idx;
>   }
>   
> +/**
> + * Set ready all vring of the device
> + *
> + * @dev: Vhost device
> + */
>   static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>   {
>       int i;
>       trace_vhost_vdpa_set_vring_ready(dev);
> -    for (i = 0; i < dev->nvqs; ++i) {
> +    for (i = 0; i < dev->vq_index_end; ++i) {
>           struct vhost_vring_state state = {
> -            .index = dev->vq_index + i,
> +            .index = i,
>               .num = 1,
>           };
>           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> @@ -1117,7 +1122,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>           if (unlikely(!ok)) {
>               return -1;
>           }
> -        vhost_vdpa_set_vring_ready(dev);
>       } else {
>           ok = vhost_vdpa_svqs_stop(dev);
>           if (unlikely(!ok)) {
> @@ -1131,16 +1135,22 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>       }
>   
>       if (started) {
> +        int r;
>           memory_listener_register(&v->listener, &address_space_memory);
> -        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +        r = vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +        if (unlikely(r)) {
> +            return r;
> +        }
> +        vhost_vdpa_set_vring_ready(dev);
>       } else {
>           vhost_vdpa_reset_device(dev);
>           vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>                                      VIRTIO_CONFIG_S_DRIVER);
>           memory_listener_unregister(&v->listener);
>   
> -        return 0;
>       }
> +
> +    return 0;
>   }
>   
>   static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,




* Re: [RFC PATCH v8 14/21] vhost: Make possible to check for device exclusive vq group
  2022-05-19 19:12 ` [RFC PATCH v8 14/21] vhost: Make possible to check for device exclusive vq group Eugenio Pérez
@ 2022-06-08  4:25   ` Jason Wang
  2022-06-08 19:21     ` Eugenio Perez Martin
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2022-06-08  4:25 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit


On 2022/5/20 03:12, Eugenio Pérez wrote:
> CVQ needs to be in its own group, not shared with any data vq. Enable
> checking for this here, before introducing address space id concepts.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   include/hw/virtio/vhost.h |  2 +
>   hw/net/vhost_net.c        |  4 +-
>   hw/virtio/vhost-vdpa.c    | 79 ++++++++++++++++++++++++++++++++++++++-
>   hw/virtio/trace-events    |  1 +
>   4 files changed, 84 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index b291fe4e24..cebec1d817 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -84,6 +84,8 @@ struct vhost_dev {
>       int vq_index_end;
>       /* if non-zero, minimum required value for max_queues */
>       int num_queues;
> +    /* Must be a vq group different than any other vhost dev */
> +    bool independent_vq_group;


We probably need a better abstraction here.

E.g. having a parent vhost_dev_group structure.
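
For illustration, one possible shape (entirely hypothetical):

    /*
     * Hypothetical parent structure: one per virtqueue group, shared by
     * the vhost_dev instances whose vrings belong to that group.
     */
    typedef struct VhostDevGroup {
        /* Group id as reported by VHOST_VDPA_GET_VRING_GROUP */
        uint32_t group;

        /* Address space id the group is currently attached to */
        uint32_t asid;
    } VhostDevGroup;

struct vhost_dev could then point to its VhostDevGroup instead of
carrying an independent_vq_group flag, and "is this group independent"
becomes a pointer comparison.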


>       uint64_t features;
>       uint64_t acked_features;
>       uint64_t backend_features;
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index ccac5b7a64..1c2386c01c 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -339,14 +339,16 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>       }
>   
>       for (i = 0; i < nvhosts; i++) {
> +        bool cvq_idx = i >= data_queue_pairs;
>   
> -        if (i < data_queue_pairs) {
> +        if (!cvq_idx) {
>               peer = qemu_get_peer(ncs, i);
>           } else { /* Control Virtqueue */
>               peer = qemu_get_peer(ncs, n->max_queue_pairs);
>           }
>   
>           net = get_vhost_net(peer);
> +        net->dev.independent_vq_group = !!cvq_idx;
>           vhost_net_set_vq_index(net, i * 2, index_end);
>   
>           /* Suppress the masking guest notifiers on vhost user
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index eec6d544e9..52dd8baa8d 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -685,7 +685,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
>   {
>       uint64_t features;
>       uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
> -        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
> +        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
> +        0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
>       int r;
>   
>       if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
> @@ -1110,6 +1111,78 @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
>       return true;
>   }
>   
> +static int vhost_vdpa_get_vring_group(struct vhost_dev *dev,
> +                                      struct vhost_vring_state *state)
> +{
> +    int ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_VRING_GROUP, state);
> +    trace_vhost_vdpa_get_vring_group(dev, state->index, state->num);
> +    return ret;
> +}
> +
> +static bool vhost_dev_is_independent_group(struct vhost_dev *dev)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    struct vhost_vring_state this_vq_group = {
> +        .index = dev->vq_index,
> +    };
> +    int ret;
> +
> +    if (!(dev->backend_cap & VHOST_BACKEND_F_IOTLB_ASID)) {
> +        return true;
> +    }


This should be false?


> +
> +    if (!v->shadow_vqs_enabled) {
> +        return true;
> +    }


And here?


> +
> +    ret = vhost_vdpa_get_vring_group(dev, &this_vq_group);
> +    if (unlikely(ret)) {
> +        goto call_err;
> +    }
> +
> +    for (int i = 1; i < dev->nvqs; ++i) {
> +        struct vhost_vring_state vq_group = {
> +            .index = dev->vq_index + i,
> +        };
> +
> +        ret = vhost_vdpa_get_vring_group(dev, &vq_group);
> +        if (unlikely(ret)) {
> +            goto call_err;
> +        }
> +        if (unlikely(vq_group.num != this_vq_group.num)) {
> +            error_report("VQ %d group is different than VQ %d one",
> +                         this_vq_group.index, vq_group.index);


Not sure this is needed. The group id is not tied to vq index if I 
understand correctly.

E.g. with 1 qp plus cvq, we can have:

group 0: cvq
group 1: tx/rx

Thanks


> +            return false;
> +        }
> +    }
> +
> +    for (int i = 0; i < dev->vq_index_end; ++i) {
> +        struct vhost_vring_state vq_group = {
> +            .index = i,
> +        };
> +
> +        if (dev->vq_index <= i && i < dev->vq_index + dev->nvqs) {
> +            continue;
> +        }
> +
> +        ret = vhost_vdpa_get_vring_group(dev, &vq_group);
> +        if (unlikely(ret)) {
> +            goto call_err;
> +        }
> +        if (unlikely(vq_group.num == this_vq_group.num)) {
> +            error_report("VQ %d group is the same as VQ %d one",
> +                         this_vq_group.index, vq_group.index);
> +            return false;
> +        }
> +    }
> +
> +    return true;
> +
> +call_err:
> +    error_report("Can't read vq group, errno=%d (%s)", ret, g_strerror(-ret));
> +    return false;
> +}
> +
>   static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>   {
>       struct vhost_vdpa *v = dev->opaque;
> @@ -1118,6 +1191,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>   
>       if (started) {
>           vhost_vdpa_host_notifiers_init(dev);
> +        if (dev->independent_vq_group &&
> +            !vhost_dev_is_independent_group(dev)) {
> +            return -1;
> +        }
>           ok = vhost_vdpa_svqs_start(dev);
>           if (unlikely(!ok)) {
>               return -1;
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index ab8e095b73..ffb8eb26e7 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -46,6 +46,7 @@ vhost_vdpa_set_vring_ready(void *dev) "dev: %p"
>   vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
>   vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
>   vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
> +vhost_vdpa_get_vring_group(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
>   vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
>   vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
>   vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
  2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (20 preceding siblings ...)
  2022-05-19 19:13 ` [RFC PATCH v8 21/21] vdpa: Add x-cvq-svq Eugenio Pérez
@ 2022-06-08  5:51 ` Jason Wang
  2022-06-08 19:28   ` Eugenio Perez Martin
  21 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2022-06-08  5:51 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit


在 2022/5/20 03:12, Eugenio Pérez 写道:
> Control virtqueue is used by networking device for accepting various
> commands from the driver. It's a must to support multiqueue and other
> configurations.
>
> Shadow VirtQueue (SVQ) already makes possible migration of virtqueue
> states, effectively intercepting them so qemu can track what regions of memory
> are dirty because device action and needs migration. However, this does not
> solve networking device state seen by the driver because CVQ messages, like
> changes on MAC addresses from the driver.
>
> To solve that, this series uses SVQ infraestructure proposed to intercept
> networking control messages used by the device. This way, qemu is able to
> update VirtIONet device model and to migrate it.
>
> However, to intercept all queues would slow device data forwarding. To solve
> that, only the CVQ must be intercepted all the time. This is achieved using
> the ASID infraestructure, that allows different translations for different
> virtqueues. The most updated kernel part of ASID is proposed at [1].
>
> You can run qemu in two modes after applying this series: only intercepting
> cvq with x-cvq-svq=on or intercept all the virtqueues adding cmdline x-svq=on:
>
> -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-cvq-svq=on,x-svq=on
>
> First three patches enable the update of the virtio-net device model for each
> CVQ message acknoledged by the device.
>
> Patches from 5 to 9 enables individual SVQ to copy the buffers to QEMU's VA.
> This allows simplyfing the memory mapping, instead of map all the guest's
> memory like in the data virtqueues.
>
> Patch 10 allows to inject control messages to the device. This allows to set
> state to the device both at QEMU startup and at live migration destination. In
> the future, this may also be used to emulate _F_ANNOUNCE.
>
> Patch 11 updates kernel headers, but it assign random numbers to needed ioctls
> because they are still not accepted in the kernel.
>
> Patches 12-16 enables the set of the features of the net device model to the
> vdpa device at device start.
>
> Last ones enables the sepparated ASID and SVQ.
>
> Comments are welcomed.


As discussed, I think we need to split this huge series into smaller ones:

1) shadow CVQ only, this makes rx-filter-event work
2) ASID support for CVQ

And for 1) we need to consider whether or not it could be simplified.

Or do it in reverse order, since if we do 1) first, we may have security 
issues.

Thoughts?

Thanks


>
> TODO:
> * Fallback on regular CVQ if QEMU cannot isolate in its own ASID by any
>    reason, blocking migration. This is tricky, since it can cause that the VM
>    cannot be migrated anymore, so some way of block it must be used.
> * Review failure paths, some are with TODO notes, other don't.
>
> Changes from rfc v7:
> * Don't map all guest space in ASID 1 but copy all the buffers. No need for
>    more memory listeners.
> * Move net backend start callback to SVQ.
> * Wait for device CVQ commands used by the device at SVQ start, avoiding races.
> * Changed ioctls, but they're provisional anyway.
> * Reorder commits so refactor and code adding ones are closer to usage.
> * Usual cleaning: better tracing, doc, patches messages, ...
>
> Changes from rfc v6:
> * Fix bad iotlb updates order when batching was enabled
> * Add reference counting to iova_tree so cleaning is simpler.
>
> Changes from rfc v5:
> * Fixes bad calculus of cvq end group when MQ is not acked by the guest.
>
> Changes from rfc v4:
> * Add missing tracing
> * Add multiqueue support
> * Use already sent version for replacing g_memdup
> * Care with memory management
>
> Changes from rfc v3:
> * Fix bad returning of descriptors to SVQ list.
>
> Changes from rfc v2:
> * Fix use-after-free.
>
> Changes from rfc v1:
> * Rebase to latest master.
> * Configure ASID instead of assuming cvq asid != data vqs asid.
> * Update device model so (MAC) state can be migrated too.
>
> [1] https://lkml.kernel.org/kvm/20220224212314.1326-1-gdawar@xilinx.com/
>
> Eugenio Pérez (21):
>    virtio-net: Expose ctrl virtqueue logic
>    vhost: Add custom used buffer callback
>    vdpa: control virtqueue support on shadow virtqueue
>    virtio: Make virtqueue_alloc_element non-static
>    vhost: Add vhost_iova_tree_find
>    vdpa: Add map/unmap operation callback to SVQ
>    vhost: move descriptor translation to vhost_svq_vring_write_descs
>    vhost: Add SVQElement
>    vhost: Add svq copy desc mode
>    vhost: Add vhost_svq_inject
>    vhost: Update kernel headers
>    vdpa: delay set_vring_ready after DRIVER_OK
>    vhost: Add ShadowVirtQueueStart operation
>    vhost: Make possible to check for device exclusive vq group
>    vhost: add vhost_svq_poll
>    vdpa: Add vhost_vdpa_start_control_svq
>    vdpa: Add asid attribute to vdpa device
>    vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
>    vhost: Add reference counting to vhost_iova_tree
>    vdpa: Add x-svq to NetdevVhostVDPAOptions
>    vdpa: Add x-cvq-svq
>
>   qapi/net.json                                |  13 +-
>   hw/virtio/vhost-iova-tree.h                  |   7 +-
>   hw/virtio/vhost-shadow-virtqueue.h           |  61 ++-
>   include/hw/virtio/vhost-vdpa.h               |   3 +
>   include/hw/virtio/vhost.h                    |   3 +
>   include/hw/virtio/virtio-net.h               |   4 +
>   include/hw/virtio/virtio.h                   |   1 +
>   include/standard-headers/linux/vhost_types.h |  11 +-
>   linux-headers/linux/vhost.h                  |  25 +-
>   hw/net/vhost_net.c                           |   5 +-
>   hw/net/virtio-net.c                          |  84 +++--
>   hw/virtio/vhost-iova-tree.c                  |  35 +-
>   hw/virtio/vhost-shadow-virtqueue.c           | 378 ++++++++++++++++---
>   hw/virtio/vhost-vdpa.c                       | 206 +++++++++-
>   hw/virtio/virtio.c                           |   2 +-
>   net/vhost-vdpa.c                             | 294 ++++++++++++++-
>   hw/virtio/trace-events                       |  10 +-
>   17 files changed, 1012 insertions(+), 130 deletions(-)
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 01/21] virtio-net: Expose ctrl virtqueue logic
  2022-06-07  6:13   ` Jason Wang
@ 2022-06-08 16:30     ` Eugenio Perez Martin
  0 siblings, 0 replies; 51+ messages in thread
From: Eugenio Perez Martin @ 2022-06-08 16:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Tue, Jun 7, 2022 at 8:13 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/5/20 03:12, Eugenio Pérez wrote:
> > This allows external vhost-net devices to modify the state of the
> > VirtIO device model once the vhost-vdpa device has acknowledged the
> > control commands.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   include/hw/virtio/virtio-net.h |  4 ++
> >   hw/net/virtio-net.c            | 84 ++++++++++++++++++++--------------
> >   2 files changed, 53 insertions(+), 35 deletions(-)
> >
> > diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> > index eb87032627..cd31b7f67d 100644
> > --- a/include/hw/virtio/virtio-net.h
> > +++ b/include/hw/virtio/virtio-net.h
> > @@ -218,6 +218,10 @@ struct VirtIONet {
> >       struct EBPFRSSContext ebpf_rss;
> >   };
> >
> > +unsigned virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
> > +                                    const struct iovec *in_sg, size_t in_num,
> > +                                    const struct iovec *out_sg,
> > +                                    unsigned out_num);
> >   void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> >                                      const char *type);
> >
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index 7ad948ee7c..0e350154ec 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -1434,57 +1434,71 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
> >       return VIRTIO_NET_OK;
> >   }
> >
> > -static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
> > +unsigned virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
>
>
> Should we use size_t here?
>

I think it's a better type, yes. I used "unsigned" because
virtqueue_push uses unsigned for "len", maybe it's a good idea to
replace it there too.

Thanks!

> Thanks
>
>
> > +                                    const struct iovec *in_sg, size_t in_num,
> > +                                    const struct iovec *out_sg,
> > +                                    unsigned out_num)
> >   {
> >       VirtIONet *n = VIRTIO_NET(vdev);
> >       struct virtio_net_ctrl_hdr ctrl;
> >       virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
> > -    VirtQueueElement *elem;
> >       size_t s;
> >       struct iovec *iov, *iov2;
> > -    unsigned int iov_cnt;
> > +
> > +    if (iov_size(in_sg, in_num) < sizeof(status) ||
> > +        iov_size(out_sg, out_num) < sizeof(ctrl)) {
> > +        virtio_error(vdev, "virtio-net ctrl missing headers");
> > +        return 0;
> > +    }
> > +
> > +    iov2 = iov = g_memdup2(out_sg, sizeof(struct iovec) * out_num);
> > +    s = iov_to_buf(iov, out_num, 0, &ctrl, sizeof(ctrl));
> > +    iov_discard_front(&iov, &out_num, sizeof(ctrl));
> > +    if (s != sizeof(ctrl)) {
> > +        status = VIRTIO_NET_ERR;
> > +    } else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
> > +        status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, out_num);
> > +    } else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
> > +        status = virtio_net_handle_mac(n, ctrl.cmd, iov, out_num);
> > +    } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
> > +        status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, out_num);
> > +    } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
> > +        status = virtio_net_handle_announce(n, ctrl.cmd, iov, out_num);
> > +    } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
> > +        status = virtio_net_handle_mq(n, ctrl.cmd, iov, out_num);
> > +    } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
> > +        status = virtio_net_handle_offloads(n, ctrl.cmd, iov, out_num);
> > +    }
> > +
> > +    s = iov_from_buf(in_sg, in_num, 0, &status, sizeof(status));
> > +    assert(s == sizeof(status));
> > +
> > +    g_free(iov2);
> > +    return sizeof(status);
> > +}
> > +
> > +static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +    VirtQueueElement *elem;
> >
> >       for (;;) {
> > +        unsigned written;
> >           elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> >           if (!elem) {
> >               break;
> >           }
> > -        if (iov_size(elem->in_sg, elem->in_num) < sizeof(status) ||
> > -            iov_size(elem->out_sg, elem->out_num) < sizeof(ctrl)) {
> > -            virtio_error(vdev, "virtio-net ctrl missing headers");
> > +
> > +        written = virtio_net_handle_ctrl_iov(vdev, elem->in_sg, elem->in_num,
> > +                                             elem->out_sg, elem->out_num);
> > +        if (written > 0) {
> > +            virtqueue_push(vq, elem, written);
> > +            virtio_notify(vdev, vq);
> > +            g_free(elem);
> > +        } else {
> >               virtqueue_detach_element(vq, elem, 0);
> >               g_free(elem);
> >               break;
> >           }
> > -
> > -        iov_cnt = elem->out_num;
> > -        iov2 = iov = g_memdup2(elem->out_sg,
> > -                               sizeof(struct iovec) * elem->out_num);
> > -        s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl));
> > -        iov_discard_front(&iov, &iov_cnt, sizeof(ctrl));
> > -        if (s != sizeof(ctrl)) {
> > -            status = VIRTIO_NET_ERR;
> > -        } else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
> > -            status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, iov_cnt);
> > -        } else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
> > -            status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
> > -        } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
> > -            status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
> > -        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
> > -            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
> > -        } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
> > -            status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
> > -        } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
> > -            status = virtio_net_handle_offloads(n, ctrl.cmd, iov, iov_cnt);
> > -        }
> > -
> > -        s = iov_from_buf(elem->in_sg, elem->in_num, 0, &status, sizeof(status));
> > -        assert(s == sizeof(status));
> > -
> > -        virtqueue_push(vq, elem, sizeof(status));
> > -        virtio_notify(vdev, vq);
> > -        g_free(iov2);
> > -        g_free(elem);
> >       }
> >   }
> >
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 03/21] vdpa: control virtqueue support on shadow virtqueue
  2022-06-07  6:05   ` Jason Wang
@ 2022-06-08 16:38     ` Eugenio Perez Martin
  0 siblings, 0 replies; 51+ messages in thread
From: Eugenio Perez Martin @ 2022-06-08 16:38 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Tue, Jun 7, 2022 at 8:05 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/5/20 03:12, Eugenio Pérez wrote:
> > Introduce the control virtqueue support for vDPA shadow virtqueue. This
> > is needed for advanced networking features like multiqueue.
> >
> > To demonstrate command handling, VIRTIO_NET_F_CTRL_MACADDR and
> > VIRTIO_NET_CTRL_MQ are implemented. If the vDPA device is started with
> > SVQ support and the virtio-net driver changes the MAC or the number of
> > queues, the virtio-net device model will be updated with the new values.
> >
> > Other cvq commands could be added here straightforwardly, but they have
> > not been tested.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   net/vhost-vdpa.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 44 insertions(+)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index df1e69ee72..ef12fc284c 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -11,6 +11,7 @@
> >
> >   #include "qemu/osdep.h"
> >   #include "clients.h"
> > +#include "hw/virtio/virtio-net.h"
> >   #include "net/vhost_net.h"
> >   #include "net/vhost-vdpa.h"
> >   #include "hw/virtio/vhost-vdpa.h"
> > @@ -187,6 +188,46 @@ static NetClientInfo net_vhost_vdpa_info = {
> >           .check_peer_type = vhost_vdpa_check_peer_type,
> >   };
> >
> > +static void vhost_vdpa_net_handle_ctrl(VirtIODevice *vdev,
> > +                                       const VirtQueueElement *elem)
> > +{
> > +    struct virtio_net_ctrl_hdr ctrl;
> > +    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
> > +    size_t s;
> > +    struct iovec in = {
> > +        .iov_base = &status,
> > +        .iov_len = sizeof(status),
> > +    };
> > +
> > +    s = iov_to_buf(elem->out_sg, elem->out_num, 0, &ctrl, sizeof(ctrl.class));
> > +    if (s != sizeof(ctrl.class)) {
> > +        return;
> > +    }
> > +
> > +    switch (ctrl.class) {
> > +    case VIRTIO_NET_CTRL_MAC_ADDR_SET:
> > +    case VIRTIO_NET_CTRL_MQ:
> > +        break;
> > +    default:
> > +        return;
> > +    };
>
>
> I think we can probably remove the whitelist here since it is expected
> to work for any kind of command?
>

SVQ is expected to inject the virtio device state at startup
(specifically, at the live migration destination's startup), and that
injection code is specific to each command.
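
For example, at the migration destination SVQ builds and injects a
different message per supported command. A rough sketch of the MAC one
(not actual code; mapping and injection are omitted):

struct {
    struct virtio_net_ctrl_hdr hdr;
    uint8_t mac[6];
} QEMU_PACKED mac_cmd = {
    .hdr = {
        .class = VIRTIO_NET_CTRL_MAC,
        .cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET,
    },
};
/* mac_cmd.mac is then filled from the device model (n->mac) */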

Thanks!

> Thanks
>
>
> > +
> > +    s = iov_to_buf(elem->in_sg, elem->in_num, 0, &status, sizeof(status));
> > +    if (s != sizeof(status) || status != VIRTIO_NET_OK) {
> > +        return;
> > +    }
> > +
> > +    status = VIRTIO_NET_ERR;
> > +    virtio_net_handle_ctrl_iov(vdev, &in, 1, elem->out_sg, elem->out_num);
> > +    if (status != VIRTIO_NET_OK) {
> > +        error_report("Bad CVQ processing in model");
> > +    }
> > +}
> > +
> > +static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
> > +    .used_elem_handler = vhost_vdpa_net_handle_ctrl,
> > +};
> > +
> >   static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >                                              const char *device,
> >                                              const char *name,
> > @@ -211,6 +252,9 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >
> >       s->vhost_vdpa.device_fd = vdpa_device_fd;
> >       s->vhost_vdpa.index = queue_pair_index;
> > +    if (!is_datapath) {
> > +        s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
> > +    }
> >       ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> >       if (ret) {
> >           qemu_del_net_client(nc);
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 09/21] vhost: Add svq copy desc mode
  2022-06-08  4:14   ` Jason Wang
@ 2022-06-08 19:02     ` Eugenio Perez Martin
  2022-06-09  7:00       ` Jason Wang
  0 siblings, 1 reply; 51+ messages in thread
From: Eugenio Perez Martin @ 2022-06-08 19:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Wed, Jun 8, 2022 at 6:14 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/5/20 03:12, Eugenio Pérez wrote:
> > Enable SVQ not to forward the descriptor by translating its address to
> > qemu's IOVA, but to copy it to a region outside of the guest instead.
> >
> > Virtio-net control VQ will use this mode, so we don't need to send all
> > the guest's memory every time there is a change, but only on messages.
> > Conversely, CVQ will only have access to control messages.  This leads to
> > less messing with memory listeners.
> >
> > We could also try to send only the required translation by message, but
> > this presents a problem when many control messages occupy the same
> > guest's memory region.
> >
> > Lastly, this allows us to inject messages from QEMU to the device in a
> > simple manner.  CVQ should be used rarely and with small messages, so all
> > the drawbacks should be acceptable.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |  10 ++
> >   include/hw/virtio/vhost-vdpa.h     |   1 +
> >   hw/virtio/vhost-shadow-virtqueue.c | 174 +++++++++++++++++++++++++++--
> >   hw/virtio/vhost-vdpa.c             |   1 +
> >   net/vhost-vdpa.c                   |   1 +
> >   5 files changed, 175 insertions(+), 12 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index e06ac52158..79cb2d301f 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -17,6 +17,12 @@
> >
> >   typedef struct SVQElement {
> >       VirtQueueElement elem;
> > +
> > +    /* SVQ IOVA address of in buffer and out buffer if cloned */
> > +    hwaddr in_iova, out_iova;
>
>
> It might be worth mentioning that we'd expect a single buffer here.
>

I'll do it. There is a similar comment in another place; I'll copy it
here.

>
> > +
> > +    /* Length of in buffer */
> > +    size_t in_len;
> >   } SVQElement;
> >
> >   typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
> > @@ -102,6 +108,9 @@ typedef struct VhostShadowVirtqueue {
> >
> >       /* Next head to consume from the device */
> >       uint16_t last_used_idx;
> > +
> > +    /* Copy each descriptor to QEMU iova */
> > +    bool copy_descs;
> >   } VhostShadowVirtqueue;
> >
> >   bool vhost_svq_valid_features(uint64_t features, Error **errp);
> > @@ -119,6 +128,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq);
> >
> >   VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_map,
> >                                       const VhostShadowVirtqueueOps *ops,
> > +                                    bool copy_descs,
> >                                       const VhostShadowVirtqueueMapOps *map_ops,
> >                                       void *map_ops_opaque);
> >
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > index f1ba46a860..dc2884eea4 100644
> > --- a/include/hw/virtio/vhost-vdpa.h
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -33,6 +33,7 @@ typedef struct vhost_vdpa {
> >       struct vhost_vdpa_iova_range iova_range;
> >       uint64_t acked_features;
> >       bool shadow_vqs_enabled;
> > +    bool svq_copy_descs;
> >       /* IOVA mapping used by the Shadow Virtqueue */
> >       VhostIOVATree *iova_tree;
> >       GPtrArray *shadow_vqs;
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index 044005ba89..5a8feb1cbc 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -16,6 +16,7 @@
> >   #include "qemu/log.h"
> >   #include "qemu/memalign.h"
> >   #include "linux-headers/linux/vhost.h"
> > +#include "qemu/iov.h"
> >
> >   /**
> >    * Validate the transport device features that both guests can use with the SVQ
> > @@ -70,6 +71,30 @@ static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
> >       return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx);
> >   }
> >
> > +static void vhost_svq_alloc_buffer(void **base, size_t *len,
> > +                                   const struct iovec *iov, size_t num,
> > +                                   bool write)
> > +{
> > +    *len = iov_size(iov, num);
>
>
> Since this behavior is triggerable by the guest, we need an upper limit
> here.
>

Good point. What could be a good limit?

As you propose later, maybe I can redesign SVQ so it either forwards
the buffer to the device or calls an available element callback. It
can inject the right copied buffer by itself. This way we know the
right buffer size beforehand.
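
Something like this, maybe (rough idea, the callback name is invented):

typedef void (*VirtQueueAvailCallback)(VhostShadowVirtqueue *svq,
                                       VirtQueueElement *elem);

That way the handler builds the copy with the exact size it needs
before injecting it, and SVQ never has to guess a limit.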

>
> > +    size_t buf_size = ROUND_UP(*len, 4096);
>
>
> I see a kind of duplicated round-up here, which is also done in
> vhost_svq_write_descs().
>

Yes, it's better to return this size somehow.

> Btw, should we use TARGET_PAGE_SIZE instead of the magic 4096 here?
>

Yes. But since we're going to expose pages to the device, it should be
host_page_size, right?
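
I.e. something like this, assuming the qemu_real_host_page_size()
helper:

*base = qemu_memalign(qemu_real_host_page_size(), buf_size);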

>
> > +
> > +    if (!num) {
> > +        return;
> > +    }
> > +
> > +    /*
> > +     * Linearize element. If guest had a descriptor chain, we expose the device
> > +     * a single buffer.
> > +     */
> > +    *base = qemu_memalign(4096, buf_size);
> > +    if (!write) {
> > +        iov_to_buf(iov, num, 0, *base, *len);
> > +        memset(*base + *len, 0, buf_size - *len);
> > +    } else {
> > +        memset(*base, 0, *len);
> > +    }
> > +}
> > +
> >   /**
> >    * Translate addresses between the qemu's virtual address and the SVQ IOVA
> >    *
> > @@ -126,7 +151,9 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
> >    * Write descriptors to SVQ vring
> >    *
> >    * @svq: The shadow virtqueue
> > + * @svq_elem: The shadow virtqueue element
> >    * @sg: Cache for hwaddr
> > + * @descs_len: Total written buffer if svq->copy_descs.
> >    * @iovec: The iovec from the guest
> >    * @num: iovec length
> >    * @more_descs: True if more descriptors come in the chain
> > @@ -134,7 +161,9 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
> >    *
> >    * Return true if success, false otherwise and print error.
> >    */
> > -static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> > +static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq,
> > +                                        SVQElement *svq_elem, hwaddr *sg,
> > +                                        size_t *descs_len,
> >                                           const struct iovec *iovec, size_t num,
> >                                           bool more_descs, bool write)
> >   {
> > @@ -142,18 +171,41 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> >       unsigned n;
> >       uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
> >       vring_desc_t *descs = svq->vring.desc;
> > -    bool ok;
> > -
> >       if (num == 0) {
> >           return true;
> >       }
> >
> > -    ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> > -    if (unlikely(!ok)) {
> > -        return false;
> > +    if (svq->copy_descs) {
> > +        void *buf;
> > +        DMAMap map = {};
> > +        int r;
> > +
> > +        vhost_svq_alloc_buffer(&buf, descs_len, iovec, num, write);
> > +        map.translated_addr = (hwaddr)(uintptr_t)buf;
> > +        map.size = ROUND_UP(*descs_len, 4096) - 1;
> > +        map.perm = write ? IOMMU_RW : IOMMU_RO,
> > +        r = vhost_iova_tree_map_alloc(svq->iova_tree, &map);
> > +        if (unlikely(r != IOVA_OK)) {
> > +            error_report("Cannot map injected element");
> > +            return false;
> > +        }
> > +
> > +        r = svq->map_ops->map(map.iova, map.size + 1,
> > +                              (void *)map.translated_addr, !write,
> > +                              svq->map_ops_opaque);
> > +        /* TODO: Handle error */
> > +        assert(r == 0);
> > +        num = 1;
> > +        sg[0] = map.iova;
>
>
> I think it would be simpler if we stick to a simple logic in
> vhost_svq_vring_write_descs() here.
>
> E.g. we can move the above logic to the caller, and it can simply
> prepare a dedicated in/out sg for the copied buffer.
>

Yes, it can be done that way.

>
> > +    } else {
> > +        bool ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> > +        if (unlikely(!ok)) {
> > +            return false;
> > +        }
> >       }
> >
> >       for (n = 0; n < num; n++) {
> > +        uint32_t len = svq->copy_descs ? *descs_len : iovec[n].iov_len;
> >           if (more_descs || (n + 1 < num)) {
> >               descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> >               descs[i].next = cpu_to_le16(svq->desc_next[i]);
> > @@ -161,7 +213,7 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> >               descs[i].flags = flags;
> >           }
> >           descs[i].addr = cpu_to_le64(sg[n]);
> > -        descs[i].len = cpu_to_le32(iovec[n].iov_len);
> > +        descs[i].len = cpu_to_le32(len);
> >
> >           last = i;
> >           i = cpu_to_le16(svq->desc_next[i]);
> > @@ -178,7 +230,8 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement *svq_elem,
> >       unsigned avail_idx;
> >       vring_avail_t *avail = svq->vring.avail;
> >       bool ok;
> > -    g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
> > +    g_autofree hwaddr *sgs = NULL;
> > +    hwaddr *in_sgs, *out_sgs;
> >
> >       *head = svq->free_head;
> >
> > @@ -189,15 +242,24 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement *svq_elem,
> >           return false;
> >       }
> >
> > -    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
> > +    if (!svq->copy_descs) {
> > +        sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
> > +        in_sgs = out_sgs = sgs;
> > +    } else {
> > +        in_sgs = &svq_elem->in_iova;
> > +        out_sgs = &svq_elem->out_iova;
> > +    }
> > +    ok = vhost_svq_vring_write_descs(svq, svq_elem, out_sgs, (size_t[]){},
> > +                                     elem->out_sg, elem->out_num,
> >                                        elem->in_num > 0, false);
> >       if (unlikely(!ok)) {
> >           return false;
> >       }
> >
> > -    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
> > -                                     true);
> > +    ok = vhost_svq_vring_write_descs(svq, svq_elem, in_sgs, &svq_elem->in_len,
> > +                                     elem->in_sg, elem->in_num, false, true);
> >       if (unlikely(!ok)) {
> > +        /* TODO unwind out_sg */
> >           return false;
> >       }
> >
> > @@ -276,6 +338,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
> >               SVQElement *svq_elem;
> >               VirtQueueElement *elem;
> >               bool ok;
> > +            uint32_t needed_slots;
> >
> >               if (svq->next_guest_avail_elem) {
> >                   svq_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > @@ -288,7 +351,8 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
> >               }
> >
> >               elem = &svq_elem->elem;
> > -            if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
> > +            needed_slots = svq->copy_descs ? 1 : elem->out_num + elem->in_num;
> > +            if (needed_slots > vhost_svq_available_slots(svq)) {
> >                   /*
> >                    * This condition is possible since a contiguous buffer in GPA
> >                    * does not imply a contiguous buffer in qemu's VA
> > @@ -411,6 +475,76 @@ static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
> >       return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> >   }
> >
> > +/**
> > + * Unmap a descriptor chain of a SVQ element, optionally copying its in buffers
> > + *
> > + * @svq: Shadow VirtQueue
> > + * @iova: SVQ IO Virtual address of descriptor
> > + * @iov: Optional iovec to store device writable buffer
> > + * @iov_cnt: iov length
> > + * @buf_len: Length written by the device
> > + *
> > + * Print error message in case of error
> > + */
> > +static bool vhost_svq_unmap_iov(VhostShadowVirtqueue *svq, hwaddr iova,
> > +                                const struct iovec *iov, size_t iov_cnt,
> > +                                size_t buf_len)
> > +{
> > +    DMAMap needle = {
> > +        /*
> > +         * No need to specify size since contiguous iova chunk was allocated
> > +         * by SVQ.
> > +         */
> > +        .iova = iova,
> > +    };
> > +    const DMAMap *map = vhost_iova_tree_find(svq->iova_tree, &needle);
> > +    int r;
> > +
> > +    if (!map) {
> > +        error_report("Cannot locate expected map");
> > +        return false;
> > +    }
> > +
> > +    r = svq->map_ops->unmap(map->iova, map->size + 1, svq->map_ops_opaque);
> > +    if (unlikely(r != 0)) {
> > +        error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
> > +        return false;
> > +    }
> > +
> > +    if (iov) {
> > +        iov_from_buf(iov, iov_cnt, 0, (const void *)map->translated_addr, buf_len);
> > +    }
> > +    qemu_vfree((void *)map->translated_addr);
> > +    vhost_iova_tree_remove(svq->iova_tree, &needle);
> > +    return true;
> > +}
> > +
> > +/**
> > + * Unmap shadow virtqueue element
> > + *
> > + * @svq_elem: Shadow VirtQueue Element
> > + * @copy_in: Copy in buffer to the element at unmapping
> > + */
> > +static bool vhost_svq_unmap_elem(VhostShadowVirtqueue *svq, SVQElement *svq_elem, uint32_t len, bool copy_in)
> > +{
> > +    VirtQueueElement *elem = &svq_elem->elem;
> > +    const struct iovec *in_iov = copy_in ? elem->in_sg : NULL;
> > +    size_t in_count = copy_in ? elem->in_num : 0;
> > +    if (elem->out_num) {
> > +        bool ok = vhost_svq_unmap_iov(svq, svq_elem->out_iova, NULL, 0, 0);
> > +        if (unlikely(!ok)) {
> > +            return false;
> > +        }
> > +    }
> > +
> > +    if (elem->in_num) {
> > +        return vhost_svq_unmap_iov(svq, svq_elem->in_iova, in_iov, in_count,
> > +                                   len);
> > +    }
> > +
> > +    return true;
> > +}
> > +
> >   static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> >                               bool check_for_avail_queue)
> >   {
> > @@ -429,6 +563,13 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> >                   break;
> >               }
> >
> > +            if (svq->copy_descs) {
> > +                bool ok = vhost_svq_unmap_elem(svq, svq_elem, len, true);
> > +                if (unlikely(!ok)) {
> > +                    return;
> > +                }
> > +            }
> > +
> >               elem = &svq_elem->elem;
> >               if (svq->ops && svq->ops->used_elem_handler) {
> >                   svq->ops->used_elem_handler(svq->vdev, elem);
> > @@ -611,12 +752,18 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >           g_autofree SVQElement *svq_elem = NULL;
> >           svq_elem = g_steal_pointer(&svq->ring_id_maps[i]);
> >           if (svq_elem) {
> > +            if (svq->copy_descs) {
> > +                vhost_svq_unmap_elem(svq, svq_elem, 0, false);
> > +            }
> >               virtqueue_detach_element(svq->vq, &svq_elem->elem, 0);
> >           }
> >       }
> >
> >       next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> >       if (next_avail_elem) {
> > +        if (svq->copy_descs) {
> > +            vhost_svq_unmap_elem(svq, next_avail_elem, 0, false);
> > +        }
> >           virtqueue_detach_element(svq->vq, &next_avail_elem->elem, 0);
> >       }
> >       svq->vq = NULL;
> > @@ -632,6 +779,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >    *
> >    * @iova_tree: Tree to perform descriptors translations
> >    * @ops: SVQ operations hooks
> > + * @copy_descs: Copy each descriptor to QEMU iova
> >    * @map_ops: SVQ mapping operation hooks
> >    * @map_ops_opaque: Opaque data to pass to mapping operations
> >    *
> > @@ -641,6 +789,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >    */
> >   VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
> >                                       const VhostShadowVirtqueueOps *ops,
> > +                                    bool copy_descs,
> >                                       const VhostShadowVirtqueueMapOps *map_ops,
> >                                       void *map_ops_opaque)
> >   {
> > @@ -665,6 +814,7 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
> >       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
> >       svq->iova_tree = iova_tree;
> >       svq->ops = ops;
> > +    svq->copy_descs = copy_descs;
> >       svq->map_ops = map_ops;
> >       svq->map_ops_opaque = map_ops_opaque;
> >       return g_steal_pointer(&svq);
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index e6ef944e23..31b3d4d013 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -436,6 +436,7 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
> >       for (unsigned n = 0; n < hdev->nvqs; ++n) {
> >           g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree,
> >                                                          v->shadow_vq_ops,
> > +                                                       v->svq_copy_descs,
> >                                                          &vhost_vdpa_svq_map_ops,
> >                                                          v);
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index ef12fc284c..174fec5e77 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -254,6 +254,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >       s->vhost_vdpa.index = queue_pair_index;
> >       if (!is_datapath) {
> >           s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
> > +        s->vhost_vdpa.svq_copy_descs = true;
> >       }
> >       ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> >       if (ret) {
>
>
> So all this logic seems rather complicated; it might be better to think
> of a way to simplify things. The cause of the complexity is that we
> couple too many things with SVQ.
>
> I wonder if we can simply let the control virtqueue end in userspace
> code, where it has a full understanding of the semantics, and then let
> it talk to the vhost-vdpa directly:
>
> E.g. in the case of mq setting, we would start from
> virtio_net_handle_mq(), where we can prepare the cvq commands and send
> them to the vhost-vDPA networking backend, which maps and submits the
> cvq commands to the device?
>

If I understood you correctly, it's doable.

I'll try to come up with that for the next version.
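
Something in this direction, maybe (very rough sketch; the
vhost_vdpa_net_cvq_cmd() helper is invented):

static int vhost_vdpa_net_cvq_set_mq(VhostVDPAState *s, uint16_t queue_pairs)
{
    const struct virtio_net_ctrl_hdr hdr = {
        .class = VIRTIO_NET_CTRL_MQ,
        .cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET,
    };
    const struct virtio_net_ctrl_mq mq = {
        .virtqueue_pairs = cpu_to_le16(queue_pairs),
    };
    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;

    /* Invented helper: copy hdr+mq into the mapped CVQ buffer, kick the
     * vq and poll for the status written by the device. */
    if (vhost_vdpa_net_cvq_cmd(s, &hdr, &mq, sizeof(mq), &status) < 0) {
        return -1;
    }
    return status == VIRTIO_NET_OK ? 0 : -1;
}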

Thanks!

> Thanks
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 11/21] vhost: Update kernel headers
  2022-06-08  4:18   ` Jason Wang
@ 2022-06-08 19:04     ` Eugenio Perez Martin
  0 siblings, 0 replies; 51+ messages in thread
From: Eugenio Perez Martin @ 2022-06-08 19:04 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Wed, Jun 8, 2022 at 6:19 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/5/20 03:12, Eugenio Pérez wrote:
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
>
>
> It's better to use the helpers in scripts/ and mention which
> version this is synced to.
>

Right, I should have noted somewhere that in the meantime this was
accepted into Linux master :). I'll use the scripts for the next
version.
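
I.e. something like:

$ ./scripts/update-linux-headers.sh /path/to/linux

run against the tree where the new ioctls landed.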

Thanks!

> Thanks
>
>
> >   include/standard-headers/linux/vhost_types.h | 11 ++++++++-
> >   linux-headers/linux/vhost.h                  | 25 ++++++++++++++++----
> >   2 files changed, 30 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/standard-headers/linux/vhost_types.h b/include/standard-headers/linux/vhost_types.h
> > index 0bd2684a2a..ce78551b0f 100644
> > --- a/include/standard-headers/linux/vhost_types.h
> > +++ b/include/standard-headers/linux/vhost_types.h
> > @@ -87,7 +87,7 @@ struct vhost_msg {
> >
> >   struct vhost_msg_v2 {
> >       uint32_t type;
> > -     uint32_t reserved;
> > +     uint32_t asid;
> >       union {
> >               struct vhost_iotlb_msg iotlb;
> >               uint8_t padding[64];
> > @@ -153,4 +153,13 @@ struct vhost_vdpa_iova_range {
> >   /* vhost-net should add virtio_net_hdr for RX, and strip for TX packets. */
> >   #define VHOST_NET_F_VIRTIO_NET_HDR 27
> >
> > +/* Use message type V2 */
> > +#define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1
> > +/* IOTLB can accept batching hints */
> > +#define VHOST_BACKEND_F_IOTLB_BATCH  0x2
> > +/* IOTLB can accept address space identifier through V2 type of IOTLB
> > + * message
> > + */
> > +#define VHOST_BACKEND_F_IOTLB_ASID  0x3
> > +
> >   #endif
> > diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> > index 5d99e7c242..d42eb46efd 100644
> > --- a/linux-headers/linux/vhost.h
> > +++ b/linux-headers/linux/vhost.h
> > @@ -89,11 +89,6 @@
> >
> >   /* Set or get vhost backend capability */
> >
> > -/* Use message type V2 */
> > -#define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1
> > -/* IOTLB can accept batching hints */
> > -#define VHOST_BACKEND_F_IOTLB_BATCH  0x2
> > -
> >   #define VHOST_SET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x25, __u64)
> >   #define VHOST_GET_BACKEND_FEATURES _IOR(VHOST_VIRTIO, 0x26, __u64)
> >
> > @@ -154,6 +149,26 @@
> >   /* Get the config size */
> >   #define VHOST_VDPA_GET_CONFIG_SIZE  _IOR(VHOST_VIRTIO, 0x79, __u32)
> >
> > +/* Get the number of virtqueue groups. */
> > +#define VHOST_VDPA_GET_GROUP_NUM     _IOR(VHOST_VIRTIO, 0x7A, unsigned int)
> > +
> > +/* Get the number of address spaces. */
> > +#define VHOST_VDPA_GET_AS_NUM                _IOR(VHOST_VIRTIO, 0x7B, unsigned int)
> > +
> > +/* Get the group for a virtqueue: read index, write group in num,
> > + * The virtqueue index is stored in the index field of
> > + * vhost_vring_state. The group for this specific virtqueue is
> > + * returned via num field of vhost_vring_state.
> > + */
> > +#define VHOST_VDPA_GET_VRING_GROUP   _IOWR(VHOST_VIRTIO, 0x7C,       \
> > +                                           struct vhost_vring_state)
> > +/* Set the ASID for a virtqueue group. The group index is stored in
> > + * the index field of vhost_vring_state, the ASID associated with this
> > + * group is stored at num field of vhost_vring_state.
> > + */
> > +#define VHOST_VDPA_SET_GROUP_ASID    _IOW(VHOST_VIRTIO, 0x7D, \
> > +                                          struct vhost_vring_state)
> > +
> >   /* Get the count of all virtqueues */
> >   #define VHOST_VDPA_GET_VQS_COUNT    _IOR(VHOST_VIRTIO, 0x80, __u32)
> >
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 12/21] vdpa: delay set_vring_ready after DRIVER_OK
  2022-06-08  4:20   ` Jason Wang
@ 2022-06-08 19:06     ` Eugenio Perez Martin
  0 siblings, 0 replies; 51+ messages in thread
From: Eugenio Perez Martin @ 2022-06-08 19:06 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Wed, Jun 8, 2022 at 6:21 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/5/20 03:12, Eugenio Pérez wrote:
> > To restore the device in the destination of a live migration, we send
> > the commands through the control virtqueue. For a device to read the
> > CVQ it must have received the DRIVER_OK status bit.
> >
> > However, this opens a window where the device could start receiving
> > packets in rx queue 0 before it receives the RSS configuration. To
> > avoid that, we will not send vring_enable until all the configuration
> > has been consumed by the device.
> >
> > As a first step, reverse the DRIVER_OK and SET_VRING_ENABLE steps.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>
>
> I may have missed something, but it looks to me like this should be an
> independent patch, or it should depend on the live migration series.
>

With x-svq it's possible to migrate a VM because we don't need to
stop the device: the VMM always knows the vq state to program in the
destination (assuming there is no need for inflight tracking, etc.).

But it will have better context in the next series for sure.

Thanks!

> Thanks
>
>
> > ---
> >   hw/virtio/vhost-vdpa.c | 20 +++++++++++++++-----
> >   1 file changed, 15 insertions(+), 5 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 31b3d4d013..13e5e2a061 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -748,13 +748,18 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
> >       return idx;
> >   }
> >
> > +/**
> > + * Set ready all vring of the device
> > + *
> > + * @dev: Vhost device
> > + */
> >   static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> >   {
> >       int i;
> >       trace_vhost_vdpa_set_vring_ready(dev);
> > -    for (i = 0; i < dev->nvqs; ++i) {
> > +    for (i = 0; i < dev->vq_index_end; ++i) {
> >           struct vhost_vring_state state = {
> > -            .index = dev->vq_index + i,
> > +            .index = i,
> >               .num = 1,
> >           };
> >           vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> > @@ -1117,7 +1122,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >           if (unlikely(!ok)) {
> >               return -1;
> >           }
> > -        vhost_vdpa_set_vring_ready(dev);
> >       } else {
> >           ok = vhost_vdpa_svqs_stop(dev);
> >           if (unlikely(!ok)) {
> > @@ -1131,16 +1135,22 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >       }
> >
> >       if (started) {
> > +        int r;
> >           memory_listener_register(&v->listener, &address_space_memory);
> > -        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > +        r = vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > +        if (unlikely(r)) {
> > +            return r;
> > +        }
> > +        vhost_vdpa_set_vring_ready(dev);
> >       } else {
> >           vhost_vdpa_reset_device(dev);
> >           vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> >                                      VIRTIO_CONFIG_S_DRIVER);
> >           memory_listener_unregister(&v->listener);
> >
> > -        return 0;
> >       }
> > +
> > +    return 0;
> >   }
> >
> >   static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 14/21] vhost: Make possible to check for device exclusive vq group
  2022-06-08  4:25   ` Jason Wang
@ 2022-06-08 19:21     ` Eugenio Perez Martin
  2022-06-09  7:13       ` Jason Wang
  0 siblings, 1 reply; 51+ messages in thread
From: Eugenio Perez Martin @ 2022-06-08 19:21 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Wed, Jun 8, 2022 at 6:25 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/5/20 03:12, Eugenio Pérez wrote:
> > CVQ needs to be in its own group, not shared with any data vq. Enable
> > checking for that here, before introducing address space id concepts.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   include/hw/virtio/vhost.h |  2 +
> >   hw/net/vhost_net.c        |  4 +-
> >   hw/virtio/vhost-vdpa.c    | 79 ++++++++++++++++++++++++++++++++++++++-
> >   hw/virtio/trace-events    |  1 +
> >   4 files changed, 84 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > index b291fe4e24..cebec1d817 100644
> > --- a/include/hw/virtio/vhost.h
> > +++ b/include/hw/virtio/vhost.h
> > @@ -84,6 +84,8 @@ struct vhost_dev {
> >       int vq_index_end;
> >       /* if non-zero, minimum required value for max_queues */
> >       int num_queues;
> > +    /* Must be a vq group different than any other vhost dev */
> > +    bool independent_vq_group;
>
>
> We probably need a better abstraction here.
>
> E.g. having a parent vhost_dev_group structure.
>

I think there is room for improvement too, but to make this work we
don't need the device model to know all the other devices at this
moment. I'm open to implementing it if we decide that solution is more
maintainable, or preferable for any other reason, though.

>
> >       uint64_t features;
> >       uint64_t acked_features;
> >       uint64_t backend_features;
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index ccac5b7a64..1c2386c01c 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -339,14 +339,16 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >       }
> >
> >       for (i = 0; i < nvhosts; i++) {
> > +        bool cvq_idx = i >= data_queue_pairs;
> >
> > -        if (i < data_queue_pairs) {
> > +        if (!cvq_idx) {
> >               peer = qemu_get_peer(ncs, i);
> >           } else { /* Control Virtqueue */
> >               peer = qemu_get_peer(ncs, n->max_queue_pairs);
> >           }
> >
> >           net = get_vhost_net(peer);
> > +        net->dev.independent_vq_group = !!cvq_idx;
> >           vhost_net_set_vq_index(net, i * 2, index_end);
> >
> >           /* Suppress the masking guest notifiers on vhost user
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index eec6d544e9..52dd8baa8d 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -685,7 +685,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
> >   {
> >       uint64_t features;
> >       uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
> > -        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
> > +        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
> > +        0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
> >       int r;
> >
> >       if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
> > @@ -1110,6 +1111,78 @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
> >       return true;
> >   }
> >
> > +static int vhost_vdpa_get_vring_group(struct vhost_dev *dev,
> > +                                      struct vhost_vring_state *state)
> > +{
> > +    int ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_VRING_GROUP, state);
> > +    trace_vhost_vdpa_get_vring_group(dev, state->index, state->num);
> > +    return ret;
> > +}
> > +
> > +static bool vhost_dev_is_independent_group(struct vhost_dev *dev)
> > +{
> > +    struct vhost_vdpa *v = dev->opaque;
> > +    struct vhost_vring_state this_vq_group = {
> > +        .index = dev->vq_index,
> > +    };
> > +    int ret;
> > +
> > +    if (!(dev->backend_cap & VHOST_BACKEND_F_IOTLB_ASID)) {
> > +        return true;
> > +    }
>
>
> This should be false?
>
>
> > +
> > +    if (!v->shadow_vqs_enabled) {
> > +        return true;
> > +    }
>
>
> And here?
>

They return true so the check doesn't get in the way when the device
already knows there is no need to check whether the vhost_dev is in an
independent group.

With recent mq changes, I think I can delete these checks and move
them to net/vhost-vdpa.

>
> > +
> > +    ret = vhost_vdpa_get_vring_group(dev, &this_vq_group);
> > +    if (unlikely(ret)) {
> > +        goto call_err;
> > +    }
> > +
> > +    for (int i = 1; i < dev->nvqs; ++i) {
> > +        struct vhost_vring_state vq_group = {
> > +            .index = dev->vq_index + i,
> > +        };
> > +
> > +        ret = vhost_vdpa_get_vring_group(dev, &vq_group);
> > +        if (unlikely(ret)) {
> > +            goto call_err;
> > +        }
> > +        if (unlikely(vq_group.num != this_vq_group.num)) {
> > +            error_report("VQ %d group is different than VQ %d one",
> > +                         this_vq_group.index, vq_group.index);
>
>
> Not sure this is needed. The group id is not tied to vq index if I
> understand correctly.
>
> E.g. with 1 qp plus cvq, we can have:
>
> group 0: cvq
> group 1: tx/rx
>

This function is severely undocumented, thanks for pointing that out :).

It checks that the virtqueues belonging to this vhost_dev do not share
a vq group with any other virtqueue in the device. We need to check it
at device startup, since the cvq index changes depending on whether
_F_MQ is negotiated. (E.g. with 1 data qp plus cvq and _F_MQ acked, the
cvq vhost_dev has vq_index == 2 and nvqs == 1, so it verifies that vqs
0 and 1 are in a different group than vq 2.)

Since we're going to check all the other virtqueues anyway, we don't
need to know the other vhost_devs individually: the set of vqs to check
is simply all the vqs except this vhost_dev's own ones.

Does that make it clearer?

Thanks!

> Thanks
>
>
> > +            return false;
> > +        }
> > +    }
> > +
> > +    for (int i = 0; i < dev->vq_index_end; ++i) {
> > +        struct vhost_vring_state vq_group = {
> > +            .index = i,
> > +        };
> > +
> > +        if (dev->vq_index <= i && i < dev->vq_index + dev->nvqs) {
> > +            continue;
> > +        }
> > +
> > +        ret = vhost_vdpa_get_vring_group(dev, &vq_group);
> > +        if (unlikely(ret)) {
> > +            goto call_err;
> > +        }
> > +        if (unlikely(vq_group.num == this_vq_group.num)) {
> > +            error_report("VQ %d group is the same as VQ %d one",
> > +                         this_vq_group.index, vq_group.index);
> > +            return false;
> > +        }
> > +    }
> > +
> > +    return true;
> > +
> > +call_err:
> > +    error_report("Can't read vq group, errno=%d (%s)", ret, g_strerror(-ret));
> > +    return false;
> > +}
> > +
> >   static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >   {
> >       struct vhost_vdpa *v = dev->opaque;
> > @@ -1118,6 +1191,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >
> >       if (started) {
> >           vhost_vdpa_host_notifiers_init(dev);
> > +        if (dev->independent_vq_group &&
> > +            !vhost_dev_is_independent_group(dev)) {
> > +            return -1;
> > +        }
> >           ok = vhost_vdpa_svqs_start(dev);
> >           if (unlikely(!ok)) {
> >               return -1;
> > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > index ab8e095b73..ffb8eb26e7 100644
> > --- a/hw/virtio/trace-events
> > +++ b/hw/virtio/trace-events
> > @@ -46,6 +46,7 @@ vhost_vdpa_set_vring_ready(void *dev) "dev: %p"
> >   vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
> >   vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
> >   vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
> > +vhost_vdpa_get_vring_group(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
> >   vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
> >   vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
> >   vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
  2022-06-08  5:51 ` [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Jason Wang
@ 2022-06-08 19:28   ` Eugenio Perez Martin
  2022-06-13 16:31     ` Eugenio Perez Martin
  0 siblings, 1 reply; 51+ messages in thread
From: Eugenio Perez Martin @ 2022-06-08 19:28 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2022/5/20 03:12, Eugenio Pérez 写道:
> > Control virtqueue is used by networking device for accepting various
> > commands from the driver. It's a must to support multiqueue and other
> > configurations.
> >
> > Shadow VirtQueue (SVQ) already makes possible migration of virtqueue
> > states, effectively intercepting them so qemu can track what regions of memory
> > are dirty because device action and needs migration. However, this does not
> > solve networking device state seen by the driver because CVQ messages, like
> > changes on MAC addresses from the driver.
> >
> > To solve that, this series uses SVQ infraestructure proposed to intercept
> > networking control messages used by the device. This way, qemu is able to
> > update VirtIONet device model and to migrate it.
> >
> > However, to intercept all queues would slow device data forwarding. To solve
> > that, only the CVQ must be intercepted all the time. This is achieved using
> > the ASID infraestructure, that allows different translations for different
> > virtqueues. The most updated kernel part of ASID is proposed at [1].
> >
> > You can run qemu in two modes after applying this series: only intercepting
> > cvq with x-cvq-svq=on or intercept all the virtqueues adding cmdline x-svq=on:
> >
> > -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-cvq-svq=on,x-svq=on
> >
> > First three patches enable the update of the virtio-net device model for each
> > CVQ message acknoledged by the device.
> >
> > Patches from 5 to 9 enables individual SVQ to copy the buffers to QEMU's VA.
> > This allows simplyfing the memory mapping, instead of map all the guest's
> > memory like in the data virtqueues.
> >
> > Patch 10 allows to inject control messages to the device. This allows to set
> > state to the device both at QEMU startup and at live migration destination. In
> > the future, this may also be used to emulate _F_ANNOUNCE.
> >
> > Patch 11 updates kernel headers, but it assign random numbers to needed ioctls
> > because they are still not accepted in the kernel.
> >
> > Patches 12-16 enables the set of the features of the net device model to the
> > vdpa device at device start.
> >
> > Last ones enables the sepparated ASID and SVQ.
> >
> > Comments are welcomed.
>
>
> As discussed, I think we need to split this huge series into smaller ones:
>
> 1) shadow CVQ only, this makes rx-filter-event work
> 2) ASID support for CVQ
>
> And for 1) we need consider whether or not it could be simplified.
>
> Or do it in reverse order, since if we do 1) first, we may have security
> issues.
>

I'm OK with both, but I also think 2) before 1) might make more sense.
There is no way to shadow only CVQ otherwise ATM.

Can we do as with the previous base SVQ patches? They were merged
although there is still no way to enable SVQ.

Thanks!

> Thoughts?
>
> Thanks
>
>
> >
> > TODO:
> > * Fallback on regular CVQ if QEMU cannot isolate in its own ASID by any
> >    reason, blocking migration. This is tricky, since it can cause that the VM
> >    cannot be migrated anymore, so some way of block it must be used.
> > * Review failure paths, some are with TODO notes, other don't.
> >
> > Changes from rfc v7:
> > * Don't map all guest space in ASID 1 but copy all the buffers. No need for
> >    more memory listeners.
> > * Move net backend start callback to SVQ.
> > * Wait for device CVQ commands used by the device at SVQ start, avoiding races.
> > * Changed ioctls, but they're provisional anyway.
> > * Reorder commits so refactor and code adding ones are closer to usage.
> > * Usual cleaning: better tracing, doc, patches messages, ...
> >
> > Changes from rfc v6:
> > * Fix bad iotlb updates order when batching was enabled
> > * Add reference counting to iova_tree so cleaning is simpler.
> >
> > Changes from rfc v5:
> > * Fixes bad calculus of cvq end group when MQ is not acked by the guest.
> >
> > Changes from rfc v4:
> > * Add missing tracing
> > * Add multiqueue support
> > * Use already sent version for replacing g_memdup
> > * Care with memory management
> >
> > Changes from rfc v3:
> > * Fix bad returning of descriptors to SVQ list.
> >
> > Changes from rfc v2:
> > * Fix use-after-free.
> >
> > Changes from rfc v1:
> > * Rebase to latest master.
> > * Configure ASID instead of assuming cvq asid != data vqs asid.
> > * Update device model so (MAC) state can be migrated too.
> >
> > [1] https://lkml.kernel.org/kvm/20220224212314.1326-1-gdawar@xilinx.com/
> >
> > Eugenio Pérez (21):
> >    virtio-net: Expose ctrl virtqueue logic
> >    vhost: Add custom used buffer callback
> >    vdpa: control virtqueue support on shadow virtqueue
> >    virtio: Make virtqueue_alloc_element non-static
> >    vhost: Add vhost_iova_tree_find
> >    vdpa: Add map/unmap operation callback to SVQ
> >    vhost: move descriptor translation to vhost_svq_vring_write_descs
> >    vhost: Add SVQElement
> >    vhost: Add svq copy desc mode
> >    vhost: Add vhost_svq_inject
> >    vhost: Update kernel headers
> >    vdpa: delay set_vring_ready after DRIVER_OK
> >    vhost: Add ShadowVirtQueueStart operation
> >    vhost: Make possible to check for device exclusive vq group
> >    vhost: add vhost_svq_poll
> >    vdpa: Add vhost_vdpa_start_control_svq
> >    vdpa: Add asid attribute to vdpa device
> >    vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
> >    vhost: Add reference counting to vhost_iova_tree
> >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> >    vdpa: Add x-cvq-svq
> >
> >   qapi/net.json                                |  13 +-
> >   hw/virtio/vhost-iova-tree.h                  |   7 +-
> >   hw/virtio/vhost-shadow-virtqueue.h           |  61 ++-
> >   include/hw/virtio/vhost-vdpa.h               |   3 +
> >   include/hw/virtio/vhost.h                    |   3 +
> >   include/hw/virtio/virtio-net.h               |   4 +
> >   include/hw/virtio/virtio.h                   |   1 +
> >   include/standard-headers/linux/vhost_types.h |  11 +-
> >   linux-headers/linux/vhost.h                  |  25 +-
> >   hw/net/vhost_net.c                           |   5 +-
> >   hw/net/virtio-net.c                          |  84 +++--
> >   hw/virtio/vhost-iova-tree.c                  |  35 +-
> >   hw/virtio/vhost-shadow-virtqueue.c           | 378 ++++++++++++++++---
> >   hw/virtio/vhost-vdpa.c                       | 206 +++++++++-
> >   hw/virtio/virtio.c                           |   2 +-
> >   net/vhost-vdpa.c                             | 294 ++++++++++++++-
> >   hw/virtio/trace-events                       |  10 +-
> >   17 files changed, 1012 insertions(+), 130 deletions(-)
> >
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 02/21] vhost: Add custom used buffer callback
  2022-06-07  6:12   ` Jason Wang
@ 2022-06-08 19:38     ` Eugenio Perez Martin
  0 siblings, 0 replies; 51+ messages in thread
From: Eugenio Perez Martin @ 2022-06-08 19:38 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Tue, Jun 7, 2022 at 8:12 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2022/5/20 03:12, Eugenio Pérez 写道:
> > The callback allows SVQ users to know the VirtQueue requests and
> > responses. QEMU can use this to synchronize virtio device model state,
> > allowing to migrate it with minimum changes to the migration code.
> >
> > In the case of networking, this will be used to inspect control
> > virtqueue messages.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h | 16 +++++++++++++++-
> >   include/hw/virtio/vhost-vdpa.h     |  2 ++
> >   hw/virtio/vhost-shadow-virtqueue.c |  9 ++++++++-
> >   hw/virtio/vhost-vdpa.c             |  3 ++-
> >   4 files changed, 27 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index c132c994e9..6593f07db3 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -15,6 +15,13 @@
> >   #include "standard-headers/linux/vhost_types.h"
> >   #include "hw/virtio/vhost-iova-tree.h"
> >
> > +typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
> > +                                         const VirtQueueElement *elem);
>
>
> Nit: I wonder if something like "VirtQueueCallback" is sufficient (e.g
> kernel use "callback" directly)
>

I wasn't thinking about the notification sense of "callback", but
about the function callback used to notify the net or vhost-vdpa net
subsystem :). But I think it can be named your way for sure.

If we ever have other callbacks closer to the vq than to vq elements,
renaming it later shouldn't be a big deal.

>
> > +
> > +typedef struct VhostShadowVirtqueueOps {
> > +    VirtQueueElementCallback used_elem_handler;
> > +} VhostShadowVirtqueueOps;
> > +
> >   /* Shadow virtqueue to relay notifications */
> >   typedef struct VhostShadowVirtqueue {
> >       /* Shadow vring */
> > @@ -59,6 +66,12 @@ typedef struct VhostShadowVirtqueue {
> >        */
> >       uint16_t *desc_next;
> >
> > +    /* Optional callbacks */
> > +    const VhostShadowVirtqueueOps *ops;
>
>
> Can we merge map_ops to ops?
>

It can be merged, but they are set by different actors.

map_ops is received by hw/virtio/vhost-vdpa, while these ops depend
on the kind of device. Is it OK to fill the ops members "by chunks"?
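
To make the question concrete, a merged struct could look roughly like
this (field names are illustrative, not a final API), with each actor
filling its own part:

typedef struct VhostShadowVirtqueueOps {
    /* Filled by the device kind (e.g. net) that creates the backend */
    void (*used_elem_handler)(VirtIODevice *vdev,
                              const VirtQueueElement *elem);
    /* Filled by hw/virtio/vhost-vdpa when the SVQ is created */
    int (*map)(hwaddr iova, hwaddr size, void *vaddr, bool ro,
               void *opaque);
    int (*unmap)(hwaddr iova, hwaddr size, void *opaque);
} VhostShadowVirtqueueOps;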

>
> > +
> > +    /* Optional custom used virtqueue element handler */
> > +    VirtQueueElementCallback used_elem_cb;
>
>
> This seems not used in this series.
>

Right, this is a leftover. Thanks for pointing it out!

Thanks!

> Thanks
>
>
> > +
> >       /* Next head to expose to the device */
> >       uint16_t shadow_avail_idx;
> >
> > @@ -85,7 +98,8 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> >                        VirtQueue *vq);
> >   void vhost_svq_stop(VhostShadowVirtqueue *svq);
> >
> > -VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree);
> > +VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
> > +                                    const VhostShadowVirtqueueOps *ops);
> >
> >   void vhost_svq_free(gpointer vq);
> >   G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > index a29dbb3f53..f1ba46a860 100644
> > --- a/include/hw/virtio/vhost-vdpa.h
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -17,6 +17,7 @@
> >   #include "hw/virtio/vhost-iova-tree.h"
> >   #include "hw/virtio/virtio.h"
> >   #include "standard-headers/linux/vhost_types.h"
> > +#include "hw/virtio/vhost-shadow-virtqueue.h"
> >
> >   typedef struct VhostVDPAHostNotifier {
> >       MemoryRegion mr;
> > @@ -35,6 +36,7 @@ typedef struct vhost_vdpa {
> >       /* IOVA mapping used by the Shadow Virtqueue */
> >       VhostIOVATree *iova_tree;
> >       GPtrArray *shadow_vqs;
> > +    const VhostShadowVirtqueueOps *shadow_vq_ops;
> >       struct vhost_dev *dev;
> >       VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> >   } VhostVDPA;
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index 56c96ebd13..167db8be45 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -410,6 +410,10 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> >                   break;
> >               }
> >
> > +            if (svq->ops && svq->ops->used_elem_handler) {
> > +                svq->ops->used_elem_handler(svq->vdev, elem);
> > +            }
> > +
> >               if (unlikely(i >= svq->vring.num)) {
> >                   qemu_log_mask(LOG_GUEST_ERROR,
> >                            "More than %u used buffers obtained in a %u size SVQ",
> > @@ -607,12 +611,14 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >    * shadow methods and file descriptors.
> >    *
> >    * @iova_tree: Tree to perform descriptors translations
> > + * @ops: SVQ operations hooks
> >    *
> >    * Returns the new virtqueue or NULL.
> >    *
> >    * In case of error, reason is reported through error_report.
> >    */
> > -VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
> > +VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
> > +                                    const VhostShadowVirtqueueOps *ops)
> >   {
> >       g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> >       int r;
> > @@ -634,6 +640,7 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
> >       event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND);
> >       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
> >       svq->iova_tree = iova_tree;
> > +    svq->ops = ops;
> >       return g_steal_pointer(&svq);
> >
> >   err_init_hdev_call:
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 66f054a12c..7677b337e6 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -418,7 +418,8 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
> >
> >       shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
> >       for (unsigned n = 0; n < hdev->nvqs; ++n) {
> > -        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree);
> > +        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree,
> > +                                                            v->shadow_vq_ops);
> >
> >           if (unlikely(!svq)) {
> >               error_setg(errp, "Cannot create svq %u", n);
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 09/21] vhost: Add svq copy desc mode
  2022-06-08 19:02     ` Eugenio Perez Martin
@ 2022-06-09  7:00       ` Jason Wang
  0 siblings, 0 replies; 51+ messages in thread
From: Jason Wang @ 2022-06-09  7:00 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Thu, Jun 9, 2022 at 3:03 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Wed, Jun 8, 2022 at 6:14 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2022/5/20 03:12, Eugenio Pérez 写道:
> > > Enable SVQ to not to forward the descriptor translating its address to
> > > qemu's IOVA but copying to a region outside of the guest.
> > >
> > > Virtio-net control VQ will use this mode, so we don't need to send all
> > > the guest's memory every time there is a change, but only on messages.
> > > Reversely, CVQ will only have access to control messages.  This lead to
> > > less messing with memory listeners.
> > >
> > > We could also try to send only the required translation by message, but
> > > this presents a problem when many control messages occupy the same
> > > guest's memory region.
> > >
> > > Lastly, this allows us to inject messages from QEMU to the device in a
> > > simple manner.  CVQ should be used rarely and with small messages, so all
> > > the drawbacks should be assumible.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >   hw/virtio/vhost-shadow-virtqueue.h |  10 ++
> > >   include/hw/virtio/vhost-vdpa.h     |   1 +
> > >   hw/virtio/vhost-shadow-virtqueue.c | 174 +++++++++++++++++++++++++++--
> > >   hw/virtio/vhost-vdpa.c             |   1 +
> > >   net/vhost-vdpa.c                   |   1 +
> > >   5 files changed, 175 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > index e06ac52158..79cb2d301f 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > @@ -17,6 +17,12 @@
> > >
> > >   typedef struct SVQElement {
> > >       VirtQueueElement elem;
> > > +
> > > +    /* SVQ IOVA address of in buffer and out buffer if cloned */
> > > +    hwaddr in_iova, out_iova;
> >
> >
> > It might worth to mention that we'd expect a single buffer here.
> >
>
> I'll do it. There is another comment like that in another place, I'll
> copy it here.
>
> >
> > > +
> > > +    /* Length of in buffer */
> > > +    size_t in_len;
> > >   } SVQElement;
> > >
> > >   typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
> > > @@ -102,6 +108,9 @@ typedef struct VhostShadowVirtqueue {
> > >
> > >       /* Next head to consume from the device */
> > >       uint16_t last_used_idx;
> > > +
> > > +    /* Copy each descriptor to QEMU iova */
> > > +    bool copy_descs;
> > >   } VhostShadowVirtqueue;
> > >
> > >   bool vhost_svq_valid_features(uint64_t features, Error **errp);
> > > @@ -119,6 +128,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq);
> > >
> > >   VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_map,
> > >                                       const VhostShadowVirtqueueOps *ops,
> > > +                                    bool copy_descs,
> > >                                       const VhostShadowVirtqueueMapOps *map_ops,
> > >                                       void *map_ops_opaque);
> > >
> > > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > > index f1ba46a860..dc2884eea4 100644
> > > --- a/include/hw/virtio/vhost-vdpa.h
> > > +++ b/include/hw/virtio/vhost-vdpa.h
> > > @@ -33,6 +33,7 @@ typedef struct vhost_vdpa {
> > >       struct vhost_vdpa_iova_range iova_range;
> > >       uint64_t acked_features;
> > >       bool shadow_vqs_enabled;
> > > +    bool svq_copy_descs;
> > >       /* IOVA mapping used by the Shadow Virtqueue */
> > >       VhostIOVATree *iova_tree;
> > >       GPtrArray *shadow_vqs;
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > index 044005ba89..5a8feb1cbc 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > @@ -16,6 +16,7 @@
> > >   #include "qemu/log.h"
> > >   #include "qemu/memalign.h"
> > >   #include "linux-headers/linux/vhost.h"
> > > +#include "qemu/iov.h"
> > >
> > >   /**
> > >    * Validate the transport device features that both guests can use with the SVQ
> > > @@ -70,6 +71,30 @@ static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
> > >       return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx);
> > >   }
> > >
> > > +static void vhost_svq_alloc_buffer(void **base, size_t *len,
> > > +                                   const struct iovec *iov, size_t num,
> > > +                                   bool write)
> > > +{
> > > +    *len = iov_size(iov, num);
> >
> >
> > Since this behavior is trigger able by the guest, we need an upper limit
> > here.
> >
>
> Good point. What could be a good limit?
>

We probably need to inspect the class/command of the header in this case.

Actually, it's not a vDPA-specific issue; we probably need a limit
even for the QEMU backend.

The only annoying command is VIRTIO_NET_CTRL_MAC_TABLE_SET, which
accepts a variable-length array of MACs.
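
For instance, something along these lines could bound the out buffer
(a sketch; MAC_TABLE_ENTRIES_MAX is a made-up cap, the real value
would be a device model decision):

#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6
#define MAC_TABLE_ENTRIES_MAX 64 /* assumed cap, not from the spec */

/* Upper bound of a single CVQ command, dominated by
 * VIRTIO_NET_CTRL_MAC_TABLE_SET: a two-byte ctrl header plus two
 * variable-length MAC tables (unicast and multicast), each a 32-bit
 * entries counter followed by the MAC array. */
static size_t cvq_cmd_len_max(void)
{
    size_t hdr = 2 * sizeof(uint8_t);
    size_t mac_table = sizeof(uint32_t) + MAC_TABLE_ENTRIES_MAX * ETH_ALEN;
    return hdr + 2 * mac_table;
}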

> As you propose later, maybe I can redesign SVQ so it either forwards
> the buffer to the device or calls an available element callback. It
> can inject the right copied buffer by itself. This way we know the
> right buffer size beforehand.

That could be one way.

>
> >
> > > +    size_t buf_size = ROUND_UP(*len, 4096);
> >
> >
> > I see a kind of duplicated round up which is done in
> > vhost_svq_write_descs().
> >
>
> Yes, it's better to return this size somehow.
>
> > Btw, should we use TARGET_PAGE_SIZE instead of the magic 4096 here?
> >
>
> Yes. But since we're going to expose pages to the device, it should be
> host_page_size, right?

Right.
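
I.e. something like this, rounding to the host page size instead of
the hardcoded 4096 (plain POSIX for illustration; QEMU has its own
page-size helpers):

#include <stddef.h>
#include <unistd.h>

/* Size of the bounce buffer exposed to the device, rounded up to the
 * host page size. */
static size_t svq_buf_size(size_t len)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    return (len + pg - 1) / pg * pg;
}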

>
> >
> > > +
> > > +    if (!num) {
> > > +        return;
> > > +    }
> > > +
> > > +    /*
> > > +     * Linearize element. If guest had a descriptor chain, we expose the device
> > > +     * a single buffer.
> > > +     */
> > > +    *base = qemu_memalign(4096, buf_size);
> > > +    if (!write) {
> > > +        iov_to_buf(iov, num, 0, *base, *len);
> > > +        memset(*base + *len, 0, buf_size - *len);
> > > +    } else {
> > > +        memset(*base, 0, *len);
> > > +    }
> > > +}
> > > +
> > >   /**
> > >    * Translate addresses between the qemu's virtual address and the SVQ IOVA
> > >    *
> > > @@ -126,7 +151,9 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
> > >    * Write descriptors to SVQ vring
> > >    *
> > >    * @svq: The shadow virtqueue
> > > + * @svq_elem: The shadow virtqueue element
> > >    * @sg: Cache for hwaddr
> > > + * @descs_len: Total written buffer if svq->copy_descs.
> > >    * @iovec: The iovec from the guest
> > >    * @num: iovec length
> > >    * @more_descs: True if more descriptors come in the chain
> > > @@ -134,7 +161,9 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
> > >    *
> > >    * Return true if success, false otherwise and print error.
> > >    */
> > > -static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> > > +static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq,
> > > +                                        SVQElement *svq_elem, hwaddr *sg,
> > > +                                        size_t *descs_len,
> > >                                           const struct iovec *iovec, size_t num,
> > >                                           bool more_descs, bool write)
> > >   {
> > > @@ -142,18 +171,41 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> > >       unsigned n;
> > >       uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
> > >       vring_desc_t *descs = svq->vring.desc;
> > > -    bool ok;
> > > -
> > >       if (num == 0) {
> > >           return true;
> > >       }
> > >
> > > -    ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> > > -    if (unlikely(!ok)) {
> > > -        return false;
> > > +    if (svq->copy_descs) {
> > > +        void *buf;
> > > +        DMAMap map = {};
> > > +        int r;
> > > +
> > > +        vhost_svq_alloc_buffer(&buf, descs_len, iovec, num, write);
> > > +        map.translated_addr = (hwaddr)(uintptr_t)buf;
> > > +        map.size = ROUND_UP(*descs_len, 4096) - 1;
> > > +        map.perm = write ? IOMMU_RW : IOMMU_RO,
> > > +        r = vhost_iova_tree_map_alloc(svq->iova_tree, &map);
> > > +        if (unlikely(r != IOVA_OK)) {
> > > +            error_report("Cannot map injected element");
> > > +            return false;
> > > +        }
> > > +
> > > +        r = svq->map_ops->map(map.iova, map.size + 1,
> > > +                              (void *)map.translated_addr, !write,
> > > +                              svq->map_ops_opaque);
> > > +        /* TODO: Handle error */
> > > +        assert(r == 0);
> > > +        num = 1;
> > > +        sg[0] = map.iova;
> >
> >
> > I think it would be simple if stick a simple logic of
> > vhost_svq_vring_write_descs() here.
> >
> > E.g we can move the above logic to the caller and it can simply prepare
> > a dedicated in/out sg for the copied buffer.
> >
>
> Yes, it can be done that way.
>
> >
> > > +    } else {
> > > +        bool ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> > > +        if (unlikely(!ok)) {
> > > +            return false;
> > > +        }
> > >       }
> > >
> > >       for (n = 0; n < num; n++) {
> > > +        uint32_t len = svq->copy_descs ? *descs_len : iovec[n].iov_len;
> > >           if (more_descs || (n + 1 < num)) {
> > >               descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> > >               descs[i].next = cpu_to_le16(svq->desc_next[i]);
> > > @@ -161,7 +213,7 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> > >               descs[i].flags = flags;
> > >           }
> > >           descs[i].addr = cpu_to_le64(sg[n]);
> > > -        descs[i].len = cpu_to_le32(iovec[n].iov_len);
> > > +        descs[i].len = cpu_to_le32(len);
> > >
> > >           last = i;
> > >           i = cpu_to_le16(svq->desc_next[i]);
> > > @@ -178,7 +230,8 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement *svq_elem,
> > >       unsigned avail_idx;
> > >       vring_avail_t *avail = svq->vring.avail;
> > >       bool ok;
> > > -    g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
> > > +    g_autofree hwaddr *sgs = NULL;
> > > +    hwaddr *in_sgs, *out_sgs;
> > >
> > >       *head = svq->free_head;
> > >
> > > @@ -189,15 +242,24 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement *svq_elem,
> > >           return false;
> > >       }
> > >
> > > -    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
> > > +    if (!svq->copy_descs) {
> > > +        sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
> > > +        in_sgs = out_sgs = sgs;
> > > +    } else {
> > > +        in_sgs = &svq_elem->in_iova;
> > > +        out_sgs = &svq_elem->out_iova;
> > > +    }
> > > +    ok = vhost_svq_vring_write_descs(svq, svq_elem, out_sgs, (size_t[]){},
> > > +                                     elem->out_sg, elem->out_num,
> > >                                        elem->in_num > 0, false);
> > >       if (unlikely(!ok)) {
> > >           return false;
> > >       }
> > >
> > > -    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
> > > -                                     true);
> > > +    ok = vhost_svq_vring_write_descs(svq, svq_elem, in_sgs, &svq_elem->in_len,
> > > +                                     elem->in_sg, elem->in_num, false, true);
> > >       if (unlikely(!ok)) {
> > > +        /* TODO unwind out_sg */
> > >           return false;
> > >       }
> > >
> > > @@ -276,6 +338,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
> > >               SVQElement *svq_elem;
> > >               VirtQueueElement *elem;
> > >               bool ok;
> > > +            uint32_t needed_slots;
> > >
> > >               if (svq->next_guest_avail_elem) {
> > >                   svq_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > > @@ -288,7 +351,8 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
> > >               }
> > >
> > >               elem = &svq_elem->elem;
> > > -            if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
> > > +            needed_slots = svq->copy_descs ? 1 : elem->out_num + elem->in_num;
> > > +            if (needed_slots > vhost_svq_available_slots(svq)) {
> > >                   /*
> > >                    * This condition is possible since a contiguous buffer in GPA
> > >                    * does not imply a contiguous buffer in qemu's VA
> > > @@ -411,6 +475,76 @@ static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
> > >       return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> > >   }
> > >
> > > +/**
> > > + * Unmap a descriptor chain of a SVQ element, optionally copying its in buffers
> > > + *
> > > + * @svq: Shadow VirtQueue
> > > + * @iova: SVQ IO Virtual address of descriptor
> > > + * @iov: Optional iovec to store device writable buffer
> > > + * @iov_cnt: iov length
> > > + * @buf_len: Length written by the device
> > > + *
> > > + * Print error message in case of error
> > > + */
> > > +static bool vhost_svq_unmap_iov(VhostShadowVirtqueue *svq, hwaddr iova,
> > > +                                const struct iovec *iov, size_t iov_cnt,
> > > +                                size_t buf_len)
> > > +{
> > > +    DMAMap needle = {
> > > +        /*
> > > +         * No need to specify size since contiguous iova chunk was allocated
> > > +         * by SVQ.
> > > +         */
> > > +        .iova = iova,
> > > +    };
> > > +    const DMAMap *map = vhost_iova_tree_find(svq->iova_tree, &needle);
> > > +    int r;
> > > +
> > > +    if (!map) {
> > > +        error_report("Cannot locate expected map");
> > > +        return false;
> > > +    }
> > > +
> > > +    r = svq->map_ops->unmap(map->iova, map->size + 1, svq->map_ops_opaque);
> > > +    if (unlikely(r != 0)) {
> > > +        error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
> > > +        return false;
> > > +    }
> > > +
> > > +    if (iov) {
> > > +        iov_from_buf(iov, iov_cnt, 0, (const void *)map->translated_addr, buf_len);
> > > +    }
> > > +    qemu_vfree((void *)map->translated_addr);
> > > +    vhost_iova_tree_remove(svq->iova_tree, &needle);
> > > +    return true;
> > > +}
> > > +
> > > +/**
> > > + * Unmap shadow virtqueue element
> > > + *
> > > + * @svq_elem: Shadow VirtQueue Element
> > > + * @copy_in: Copy in buffer to the element at unmapping
> > > + */
> > > +static bool vhost_svq_unmap_elem(VhostShadowVirtqueue *svq, SVQElement *svq_elem, uint32_t len, bool copy_in)
> > > +{
> > > +    VirtQueueElement *elem = &svq_elem->elem;
> > > +    const struct iovec *in_iov = copy_in ? elem->in_sg : NULL;
> > > +    size_t in_count = copy_in ? elem->in_num : 0;
> > > +    if (elem->out_num) {
> > > +        bool ok = vhost_svq_unmap_iov(svq, svq_elem->out_iova, NULL, 0, 0);
> > > +        if (unlikely(!ok)) {
> > > +            return false;
> > > +        }
> > > +    }
> > > +
> > > +    if (elem->in_num) {
> > > +        return vhost_svq_unmap_iov(svq, svq_elem->in_iova, in_iov, in_count,
> > > +                                   len);
> > > +    }
> > > +
> > > +    return true;
> > > +}
> > > +
> > >   static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> > >                               bool check_for_avail_queue)
> > >   {
> > > @@ -429,6 +563,13 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> > >                   break;
> > >               }
> > >
> > > +            if (svq->copy_descs) {
> > > +                bool ok = vhost_svq_unmap_elem(svq, svq_elem, len, true);
> > > +                if (unlikely(!ok)) {
> > > +                    return;
> > > +                }
> > > +            }
> > > +
> > >               elem = &svq_elem->elem;
> > >               if (svq->ops && svq->ops->used_elem_handler) {
> > >                   svq->ops->used_elem_handler(svq->vdev, elem);
> > > @@ -611,12 +752,18 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > >           g_autofree SVQElement *svq_elem = NULL;
> > >           svq_elem = g_steal_pointer(&svq->ring_id_maps[i]);
> > >           if (svq_elem) {
> > > +            if (svq->copy_descs) {
> > > +                vhost_svq_unmap_elem(svq, svq_elem, 0, false);
> > > +            }
> > >               virtqueue_detach_element(svq->vq, &svq_elem->elem, 0);
> > >           }
> > >       }
> > >
> > >       next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > >       if (next_avail_elem) {
> > > +        if (svq->copy_descs) {
> > > +            vhost_svq_unmap_elem(svq, next_avail_elem, 0, false);
> > > +        }
> > >           virtqueue_detach_element(svq->vq, &next_avail_elem->elem, 0);
> > >       }
> > >       svq->vq = NULL;
> > > @@ -632,6 +779,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > >    *
> > >    * @iova_tree: Tree to perform descriptors translations
> > >    * @ops: SVQ operations hooks
> > > + * @copy_descs: Copy each descriptor to QEMU iova
> > >    * @map_ops: SVQ mapping operation hooks
> > >    * @map_ops_opaque: Opaque data to pass to mapping operations
> > >    *
> > > @@ -641,6 +789,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > >    */
> > >   VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
> > >                                       const VhostShadowVirtqueueOps *ops,
> > > +                                    bool copy_descs,
> > >                                       const VhostShadowVirtqueueMapOps *map_ops,
> > >                                       void *map_ops_opaque)
> > >   {
> > > @@ -665,6 +814,7 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
> > >       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
> > >       svq->iova_tree = iova_tree;
> > >       svq->ops = ops;
> > > +    svq->copy_descs = copy_descs;
> > >       svq->map_ops = map_ops;
> > >       svq->map_ops_opaque = map_ops_opaque;
> > >       return g_steal_pointer(&svq);
> > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > index e6ef944e23..31b3d4d013 100644
> > > --- a/hw/virtio/vhost-vdpa.c
> > > +++ b/hw/virtio/vhost-vdpa.c
> > > @@ -436,6 +436,7 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
> > >       for (unsigned n = 0; n < hdev->nvqs; ++n) {
> > >           g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree,
> > >                                                          v->shadow_vq_ops,
> > > +                                                       v->svq_copy_descs,
> > >                                                          &vhost_vdpa_svq_map_ops,
> > >                                                          v);
> > >
> > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > index ef12fc284c..174fec5e77 100644
> > > --- a/net/vhost-vdpa.c
> > > +++ b/net/vhost-vdpa.c
> > > @@ -254,6 +254,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > >       s->vhost_vdpa.index = queue_pair_index;
> > >       if (!is_datapath) {
> > >           s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
> > > +        s->vhost_vdpa.svq_copy_descs = true;
> > >       }
> > >       ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> > >       if (ret) {
> >
> >
> > So all these logic seems rather complicated, it might be better to think
> > of a way to simplify the stuffs. The cause of the complexity is that we
> > couple too many stuffs with SVQ.
> >
> > I wonder if we can simply let control virtqueue end in userspace code
> > where it has a full understanding of the semantic, then let it talks to
> > the vhost-vdpa directly:
> >
> > E.g in the case of mq setting, we will start form the
> > virtio_net_handle_mq(). Where we can prepare cvq commands there and send
> > them to vhost-vDPA networking backend where the cvq commands were mapped
> > and submitted to the device?
> >
>
> If I understood you correctly, it's doable.
>
> I'll try to come up with that for the next version.

We need to think of a way to keep SVQ simple, otherwise the code is
hard to debug. I'm not sure my proposal will work, but I'm fine if you
have another idea.

Thanks

>
> Thanks!
>
> > Thanks
> >
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 14/21] vhost: Make possible to check for device exclusive vq group
  2022-06-08 19:21     ` Eugenio Perez Martin
@ 2022-06-09  7:13       ` Jason Wang
  2022-06-09  7:51         ` Eugenio Perez Martin
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2022-06-09  7:13 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Thu, Jun 9, 2022 at 3:22 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Wed, Jun 8, 2022 at 6:25 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2022/5/20 03:12, Eugenio Pérez 写道:
> > > CVQ needs to be in its own group, not shared with any data vq. Enable
> > > the checking of it here, before introducing address space id concepts.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >   include/hw/virtio/vhost.h |  2 +
> > >   hw/net/vhost_net.c        |  4 +-
> > >   hw/virtio/vhost-vdpa.c    | 79 ++++++++++++++++++++++++++++++++++++++-
> > >   hw/virtio/trace-events    |  1 +
> > >   4 files changed, 84 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > > index b291fe4e24..cebec1d817 100644
> > > --- a/include/hw/virtio/vhost.h
> > > +++ b/include/hw/virtio/vhost.h
> > > @@ -84,6 +84,8 @@ struct vhost_dev {
> > >       int vq_index_end;
> > >       /* if non-zero, minimum required value for max_queues */
> > >       int num_queues;
> > > +    /* Must be a vq group different than any other vhost dev */
> > > +    bool independent_vq_group;
> >
> >
> > We probably need a better abstraction here.
> >
> > E.g having a parent vhost_dev_group structure.
> >
>
> I think there is room for improvement too, but to make this work we
> don't need the device model to know all the other devices at this
> moment. I'm open to implementing it if we decide that solution is more
> maintainable or whatever other reason though.

I see, so let's keep it as is and do the enhancement in the future.
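
For reference, such a parent structure could look roughly like this
(all names are hypothetical):

#include <stdint.h>

struct vhost_dev;

/* A group that owns the vq group id / ASID and is shared by all of
 * its member vhost_devs, instead of per-device flags. */
typedef struct VhostDevGroup {
    uint32_t vq_group;       /* group id reported by the device */
    uint32_t asid;           /* address space bound to the group */
    struct vhost_dev **devs; /* member devices */
    unsigned int ndevs;
} VhostDevGroup;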

>
> >
> > >       uint64_t features;
> > >       uint64_t acked_features;
> > >       uint64_t backend_features;
> > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > > index ccac5b7a64..1c2386c01c 100644
> > > --- a/hw/net/vhost_net.c
> > > +++ b/hw/net/vhost_net.c
> > > @@ -339,14 +339,16 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> > >       }
> > >
> > >       for (i = 0; i < nvhosts; i++) {
> > > +        bool cvq_idx = i >= data_queue_pairs;
> > >
> > > -        if (i < data_queue_pairs) {
> > > +        if (!cvq_idx) {
> > >               peer = qemu_get_peer(ncs, i);
> > >           } else { /* Control Virtqueue */
> > >               peer = qemu_get_peer(ncs, n->max_queue_pairs);
> > >           }
> > >
> > >           net = get_vhost_net(peer);
> > > +        net->dev.independent_vq_group = !!cvq_idx;
> > >           vhost_net_set_vq_index(net, i * 2, index_end);
> > >
> > >           /* Suppress the masking guest notifiers on vhost user
> > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > index eec6d544e9..52dd8baa8d 100644
> > > --- a/hw/virtio/vhost-vdpa.c
> > > +++ b/hw/virtio/vhost-vdpa.c
> > > @@ -685,7 +685,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
> > >   {
> > >       uint64_t features;
> > >       uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
> > > -        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
> > > +        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
> > > +        0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
> > >       int r;
> > >
> > >       if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
> > > @@ -1110,6 +1111,78 @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
> > >       return true;
> > >   }
> > >
> > > +static int vhost_vdpa_get_vring_group(struct vhost_dev *dev,
> > > +                                      struct vhost_vring_state *state)
> > > +{
> > > +    int ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_VRING_GROUP, state);
> > > +    trace_vhost_vdpa_get_vring_group(dev, state->index, state->num);
> > > +    return ret;
> > > +}
> > > +
> > > +static bool vhost_dev_is_independent_group(struct vhost_dev *dev)
> > > +{
> > > +    struct vhost_vdpa *v = dev->opaque;
> > > +    struct vhost_vring_state this_vq_group = {
> > > +        .index = dev->vq_index,
> > > +    };
> > > +    int ret;
> > > +
> > > +    if (!(dev->backend_cap & VHOST_BACKEND_F_IOTLB_ASID)) {
> > > +        return true;
> > > +    }
> >
> >
> > This should be false?
> >
> >
> > > +
> > > +    if (!v->shadow_vqs_enabled) {
> > > +        return true;
> > > +    }
> >
> >
> > And here?
> >
>
> They're true so it doesn't get in the middle if the device already
> knows there is no need to check vhost_dev for an independent group.

I'm not sure I understand this.

Without ASID but with MQ, we know all vhost_devs are not independent.

>
> With recent mq changes, I think I can delete these checks and move
> them to net/vhost-vdpa.
>
> >
> > > +
> > > +    ret = vhost_vdpa_get_vring_group(dev, &this_vq_group);
> > > +    if (unlikely(ret)) {
> > > +        goto call_err;
> > > +    }
> > > +
> > > +    for (int i = 1; i < dev->nvqs; ++i) {
> > > +        struct vhost_vring_state vq_group = {
> > > +            .index = dev->vq_index + i,
> > > +        };
> > > +
> > > +        ret = vhost_vdpa_get_vring_group(dev, &vq_group);
> > > +        if (unlikely(ret)) {
> > > +            goto call_err;
> > > +        }
> > > +        if (unlikely(vq_group.num != this_vq_group.num)) {
> > > +            error_report("VQ %d group is different than VQ %d one",
> > > +                         this_vq_group.index, vq_group.index);
> >
> >
> > Not sure this is needed. The group id is not tied to vq index if I
> > understand correctly.
> >
> > E.g we have 1 qp with cvq, we can have
> >
> > group 0 cvq
> >
> > group 1 tx/rx
> >
>
> This function is severly undocumented, thanks for pointing out :).
>
> It checks if the virtqueues that belong to this vhost_dev does not
> share vq group with any other virtqueue in the device. We need to
> check it at device startup, since cvq index changes depending on _F_MQ
> negotiated.
>
> Since we're going to check all other virtqueues, we don't need to know
> other vhost_dev individually: We know the set of vqs to check is all
> vqs but our vhost_dev one.
>
> Does it make it more clear?

Kind of, but

We make independent_vq_group true for the cvq unconditionally, and
check the other vhost_devs during start. This seems less than optimal.
Any reason not to do all of this logic during net_init_vhost_vdpa()?

E.g. we can get and set up the group ASID stuff once there instead of
on each stop/start.
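
Something like this at net client init time, say (a sketch with a
made-up helper name; note the ioctl numbers were still provisional at
this point):

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Probe the vq group of the would-be CVQ once, at init, instead of
 * re-checking every vq group on each device start. */
static int vhost_vdpa_probe_cvq_group(int device_fd, unsigned int cvq_index,
                                      uint32_t *group)
{
    struct vhost_vring_state state = { .index = cvq_index };

    if (ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP, &state) < 0) {
        return -errno;
    }
    *group = state.num;
    return 0;
}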

Thanks

>
> Thanks!
>
> > Thanks
> >
> >
> > > +            return false;
> > > +        }
> > > +    }
> > > +
> > > +    for (int i = 0; i < dev->vq_index_end; ++i) {
> > > +        struct vhost_vring_state vq_group = {
> > > +            .index = i,
> > > +        };
> > > +
> > > +        if (dev->vq_index <= i && i < dev->vq_index + dev->nvqs) {
> > > +            continue;
> > > +        }
> > > +
> > > +        ret = vhost_vdpa_get_vring_group(dev, &vq_group);
> > > +        if (unlikely(ret)) {
> > > +            goto call_err;
> > > +        }
> > > +        if (unlikely(vq_group.num == this_vq_group.num)) {
> > > +            error_report("VQ %d group is the same as VQ %d one",
> > > +                         this_vq_group.index, vq_group.index);
> > > +            return false;
> > > +        }
> > > +    }
> > > +
> > > +    return true;
> > > +
> > > +call_err:
> > > +    error_report("Can't read vq group, errno=%d (%s)", ret, g_strerror(-ret));
> > > +    return false;
> > > +}
> > > +
> > >   static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > >   {
> > >       struct vhost_vdpa *v = dev->opaque;
> > > @@ -1118,6 +1191,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > >
> > >       if (started) {
> > >           vhost_vdpa_host_notifiers_init(dev);
> > > +        if (dev->independent_vq_group &&
> > > +            !vhost_dev_is_independent_group(dev)) {
> > > +            return -1;
> > > +        }
> > >           ok = vhost_vdpa_svqs_start(dev);
> > >           if (unlikely(!ok)) {
> > >               return -1;
> > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > index ab8e095b73..ffb8eb26e7 100644
> > > --- a/hw/virtio/trace-events
> > > +++ b/hw/virtio/trace-events
> > > @@ -46,6 +46,7 @@ vhost_vdpa_set_vring_ready(void *dev) "dev: %p"
> > >   vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
> > >   vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
> > >   vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
> > > +vhost_vdpa_get_vring_group(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
> > >   vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
> > >   vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
> > >   vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
> >
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 14/21] vhost: Make possible to check for device exclusive vq group
  2022-06-09  7:13       ` Jason Wang
@ 2022-06-09  7:51         ` Eugenio Perez Martin
  0 siblings, 0 replies; 51+ messages in thread
From: Eugenio Perez Martin @ 2022-06-09  7:51 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Thu, Jun 9, 2022 at 9:13 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Jun 9, 2022 at 3:22 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Wed, Jun 8, 2022 at 6:25 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > 在 2022/5/20 03:12, Eugenio Pérez 写道:
> > > > CVQ needs to be in its own group, not shared with any data vq. Enable
> > > > the checking of it here, before introducing address space id concepts.
> > > >
> > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > ---
> > > >   include/hw/virtio/vhost.h |  2 +
> > > >   hw/net/vhost_net.c        |  4 +-
> > > >   hw/virtio/vhost-vdpa.c    | 79 ++++++++++++++++++++++++++++++++++++++-
> > > >   hw/virtio/trace-events    |  1 +
> > > >   4 files changed, 84 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > > > index b291fe4e24..cebec1d817 100644
> > > > --- a/include/hw/virtio/vhost.h
> > > > +++ b/include/hw/virtio/vhost.h
> > > > @@ -84,6 +84,8 @@ struct vhost_dev {
> > > >       int vq_index_end;
> > > >       /* if non-zero, minimum required value for max_queues */
> > > >       int num_queues;
> > > > +    /* Must be a vq group different than any other vhost dev */
> > > > +    bool independent_vq_group;
> > >
> > >
> > > We probably need a better abstraction here.
> > >
> > > E.g having a parent vhost_dev_group structure.
> > >
> >
> > I think there is room for improvement too, but to make this work we
> > don't need the device model to know all the other devices at this
> > moment. I'm open to implementing it if we decide that solution is more
> > maintainable or whatever other reason though.
>
> I see, so let's keep it as is and do the enhancement in the future.
>
> >
> > >
> > > >       uint64_t features;
> > > >       uint64_t acked_features;
> > > >       uint64_t backend_features;
> > > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > > > index ccac5b7a64..1c2386c01c 100644
> > > > --- a/hw/net/vhost_net.c
> > > > +++ b/hw/net/vhost_net.c
> > > > @@ -339,14 +339,16 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> > > >       }
> > > >
> > > >       for (i = 0; i < nvhosts; i++) {
> > > > +        bool cvq_idx = i >= data_queue_pairs;
> > > >
> > > > -        if (i < data_queue_pairs) {
> > > > +        if (!cvq_idx) {
> > > >               peer = qemu_get_peer(ncs, i);
> > > >           } else { /* Control Virtqueue */
> > > >               peer = qemu_get_peer(ncs, n->max_queue_pairs);
> > > >           }
> > > >
> > > >           net = get_vhost_net(peer);
> > > > +        net->dev.independent_vq_group = !!cvq_idx;
> > > >           vhost_net_set_vq_index(net, i * 2, index_end);
> > > >
> > > >           /* Suppress the masking guest notifiers on vhost user
> > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > index eec6d544e9..52dd8baa8d 100644
> > > > --- a/hw/virtio/vhost-vdpa.c
> > > > +++ b/hw/virtio/vhost-vdpa.c
> > > > @@ -685,7 +685,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
> > > >   {
> > > >       uint64_t features;
> > > >       uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
> > > > -        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
> > > > +        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
> > > > +        0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
> > > >       int r;
> > > >
> > > >       if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
> > > > @@ -1110,6 +1111,78 @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
> > > >       return true;
> > > >   }
> > > >
> > > > +static int vhost_vdpa_get_vring_group(struct vhost_dev *dev,
> > > > +                                      struct vhost_vring_state *state)
> > > > +{
> > > > +    int ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_VRING_GROUP, state);
> > > > +    trace_vhost_vdpa_get_vring_group(dev, state->index, state->num);
> > > > +    return ret;
> > > > +}
> > > > +
> > > > +static bool vhost_dev_is_independent_group(struct vhost_dev *dev)
> > > > +{
> > > > +    struct vhost_vdpa *v = dev->opaque;
> > > > +    struct vhost_vring_state this_vq_group = {
> > > > +        .index = dev->vq_index,
> > > > +    };
> > > > +    int ret;
> > > > +
> > > > +    if (!(dev->backend_cap & VHOST_BACKEND_F_IOTLB_ASID)) {
> > > > +        return true;
> > > > +    }
> > >
> > >
> > > This should be false?
> > >
> > >
> > > > +
> > > > +    if (!v->shadow_vqs_enabled) {
> > > > +        return true;
> > > > +    }
> > >
> > >
> > > And here?
> > >
> >
> > They're true so it doesn't get in the middle if the device already
> > knows there is no need to check vhost_dev for an independent group.
>
> I'm not sure I understand this.
>
> Without ASID but with MQ, we know all vhost_devs are not independent.
>

I think we can move this discussion to another level: What is the
right action if the device exposes MQ but cannot set a different asid
for cvq for whatever reason?

a. To forbid migration (migration_blocker). This maintains backward
compatibility, but we could migrate to a VMM that does not support it,
preventing the VM from ever being migrated again. This is the method
I'm implementing for the next version (see the sketch below), but we
can decide otherwise for sure.
b. To fail device initialization: this makes new versions of QEMU
fail with old devices, but prevents the migration locking problem of
a.

Note that I think we should treat all these cases the same: the vdpa
device not offering _F_ASID, the device not having more than one asid,
the cvq not belonging to an independent group, the ioctl to set the vq
group asid not succeeding... I think that treating them differently
forces us to fall into the drawbacks of both a. and b. at the same
time. More on this below.
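
A sketch of option a., assuming a cvq_isolated flag computed elsewhere
(the helper name is made up; migrate_add_blocker() is the existing
QEMU API):

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "migration/blocker.h"

static Error *svq_cvq_blocker;

/* Option a.: keep the device running but block migration when CVQ
 * cannot be isolated in its own ASID. */
static void vhost_vdpa_net_block_migration(bool cvq_isolated, Error **errp)
{
    if (cvq_isolated) {
        return;
    }
    error_setg(&svq_cvq_blocker,
               "vdpa: cannot isolate CVQ in its own ASID");
    migrate_add_blocker(svq_cvq_blocker, errp);
}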

> >
> > With recent mq changes, I think I can delete these checks and move
> > them to net/vhost-vdpa.
> >
> > >
> > > > +
> > > > +    ret = vhost_vdpa_get_vring_group(dev, &this_vq_group);
> > > > +    if (unlikely(ret)) {
> > > > +        goto call_err;
> > > > +    }
> > > > +
> > > > +    for (int i = 1; i < dev->nvqs; ++i) {
> > > > +        struct vhost_vring_state vq_group = {
> > > > +            .index = dev->vq_index + i,
> > > > +        };
> > > > +
> > > > +        ret = vhost_vdpa_get_vring_group(dev, &vq_group);
> > > > +        if (unlikely(ret)) {
> > > > +            goto call_err;
> > > > +        }
> > > > +        if (unlikely(vq_group.num != this_vq_group.num)) {
> > > > +            error_report("VQ %d group is different than VQ %d one",
> > > > +                         this_vq_group.index, vq_group.index);
> > >
> > >
> > > Not sure this is needed. The group id is not tied to vq index if I
> > > understand correctly.
> > >
> > > E.g we have 1 qp with cvq, we can have
> > >
> > > group 0 cvq
> > >
> > > group 1 tx/rx
> > >
> >
> > This function is severly undocumented, thanks for pointing out :).
> >
> > It checks if the virtqueues that belong to this vhost_dev does not
> > share vq group with any other virtqueue in the device. We need to
> > check it at device startup, since cvq index changes depending on _F_MQ
> > negotiated.
> >
> > Since we're going to check all other virtqueues, we don't need to know
> > other vhost_dev individually: We know the set of vqs to check is all
> > vqs but our vhost_dev one.
> >
> > Does it make it more clear?
>
> Kind of, but
>
> We make independent_vq_group to be true for cvq unconditionally, and
> check other vhost_dev during start. This seems less optimal. Any
> reason we do all of the logic during net_inti_vhost_vdpa()?
>
> E.g we can get and set up the group asid stuff once there instead of
> on each stop/start.
>

There are a few reasons:
* The vq group ASIDs are reset every time the device is reset.
* If the guest does not acknowledge features, we don't know the cvq
index, so we cannot check the vq group of the cvq. I think the device
should even return -EINVAL in this case.
* Assuming we can circumvent the previous two points, it does not make
sense to forbid migration of a guest because the VMM cannot inspect
CVQ as long as the guest does not negotiate CVQ.

Thanks!

> Thanks
>
> >
> > Thanks!
> >
> > > Thanks
> > >
> > >
> > > > +            return false;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    for (int i = 0; i < dev->vq_index_end; ++i) {
> > > > +        struct vhost_vring_state vq_group = {
> > > > +            .index = i,
> > > > +        };
> > > > +
> > > > +        if (dev->vq_index <= i && i < dev->vq_index + dev->nvqs) {
> > > > +            continue;
> > > > +        }
> > > > +
> > > > +        ret = vhost_vdpa_get_vring_group(dev, &vq_group);
> > > > +        if (unlikely(ret)) {
> > > > +            goto call_err;
> > > > +        }
> > > > +        if (unlikely(vq_group.num == this_vq_group.num)) {
> > > > +            error_report("VQ %d group is the same as VQ %d one",
> > > > +                         this_vq_group.index, vq_group.index);
> > > > +            return false;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    return true;
> > > > +
> > > > +call_err:
> > > > +    error_report("Can't read vq group, errno=%d (%s)", ret, g_strerror(-ret));
> > > > +    return false;
> > > > +}
> > > > +
> > > >   static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > > >   {
> > > >       struct vhost_vdpa *v = dev->opaque;
> > > > @@ -1118,6 +1191,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > > >
> > > >       if (started) {
> > > >           vhost_vdpa_host_notifiers_init(dev);
> > > > +        if (dev->independent_vq_group &&
> > > > +            !vhost_dev_is_independent_group(dev)) {
> > > > +            return -1;
> > > > +        }
> > > >           ok = vhost_vdpa_svqs_start(dev);
> > > >           if (unlikely(!ok)) {
> > > >               return -1;
> > > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > > index ab8e095b73..ffb8eb26e7 100644
> > > > --- a/hw/virtio/trace-events
> > > > +++ b/hw/virtio/trace-events
> > > > @@ -46,6 +46,7 @@ vhost_vdpa_set_vring_ready(void *dev) "dev: %p"
> > > >   vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
> > > >   vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
> > > >   vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
> > > > +vhost_vdpa_get_vring_group(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
> > > >   vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
> > > >   vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
> > > >   vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
> > >
> >
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
  2022-06-08 19:28   ` Eugenio Perez Martin
@ 2022-06-13 16:31     ` Eugenio Perez Martin
  2022-06-14  8:01       ` Jason Wang
  0 siblings, 1 reply; 51+ messages in thread
From: Eugenio Perez Martin @ 2022-06-13 16:31 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2022/5/20 03:12, Eugenio Pérez 写道:
> > > [...]
> >
> >
> > As discussed, I think we need to split this huge series into smaller ones:
> >
> > 1) shadow CVQ only, this makes rx-filter-event work
> > 2) ASID support for CVQ
> >
> > And for 1) we need consider whether or not it could be simplified.
> >
> > Or do it in reverse order, since if we do 1) first, we may have security
> > issues.
> >
>
> I'm ok with both, but I also think 2) before 1) might make more sense.
> There is no way to only shadow CVQ otherwise ATM.
>

On second thought, that order is kind of harder.

If we only map CVQ buffers, we need to either:
a. Copy them to controlled buffers
b. Track properly when to unmap them

Alternative a. has the same problems exposed in this RFC: it's hard
(and unneeded in the final version) to know the size to copy.
Alternative b. also requires things not needed in the final version,
like counting the number of times each page is mapped and unmapped, as
sketched below.
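
A sketch of why b. is heavier: every map/unmap needs a per-page
reference count, something like this (hypothetical helper, not part of
the series; page_refs would be created with g_direct_hash):

    #include <stdint.h>
    #include <stdbool.h>
    #include <glib.h>

    /* First mapping of a page sends VHOST_IOTLB_UPDATE; the last unmap
     * sends VHOST_IOTLB_INVALIDATE. */
    static void page_ref(GHashTable *page_refs, uint64_t page, bool map)
    {
        gpointer key = (gpointer)(uintptr_t)page;
        guint n = GPOINTER_TO_UINT(g_hash_table_lookup(page_refs, key));

        if (map) {
            if (n == 0) {
                /* first user: map the page in the device ASID */
            }
            g_hash_table_insert(page_refs, key, GUINT_TO_POINTER(n + 1));
        } else if (n == 1) {
            /* last user: unmap the page from the device ASID */
            g_hash_table_remove(page_refs, key);
        } else {
            g_hash_table_insert(page_refs, key, GUINT_TO_POINTER(n - 1));
        }
    }

None of this machinery survives in the final version, where the buffers
are simply copied.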

So I'll go with the first alternative, which is also the order
proposed in this RFC. What security issues do you expect beyond the
comments in this series?

Thanks!

> Can we do as with previous base SVQ patches? they were merged although
> there is still no way to enable SVQ.
>
> Thanks!
>
> > Thoughts?
> >
> > Thanks
> >
> >
> > >
> > > [...]
> > >
> >



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
  2022-06-13 16:31     ` Eugenio Perez Martin
@ 2022-06-14  8:01       ` Jason Wang
  2022-06-14  8:13         ` Eugenio Perez Martin
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2022-06-14  8:01 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > 在 2022/5/20 03:12, Eugenio Pérez 写道:
> > > > [...]
> > >
> > >
> > > As discussed, I think we need to split this huge series into smaller ones:
> > >
> > > 1) shadow CVQ only, this makes rx-filter-event work
> > > 2) ASID support for CVQ
> > >
> > > And for 1) we need consider whether or not it could be simplified.
> > >
> > > Or do it in reverse order, since if we do 1) first, we may have security
> > > issues.
> > >
> >
> > I'm ok with both, but I also think 2) before 1) might make more sense.
> > There is no way to only shadow CVQ otherwise ATM.
> >
>
> On second thought, that order is kind of harder.
>
> If we only map CVQ buffers, we need to either:
> a. Copy them to controlled buffers
> b. Track properly when to unmap them

Just to make sure we're at the same page:

I meant we can start with e.g having a dedicated ASID for CVQ but
still using CVQ passthrough.

Then do other stuff on top.

>
> Alternative a. have the same problems exposed in this RFC: It's hard
> (and unneeded in the final version) to know the size to copy.
> Alternative b. also requires things not needed in the final version,
> like to count the number of times each page is mapped and unmapped.
>
> So I'll go to the first alternative, that is also the proposed order
> of the RFC. What security issues do you expect beyond the comments in
> this series?

If we shadow CVQ without ASID, the guest may guess the IOVA of the CVQ
buffers and try to peek at or modify them?

Thanks

>
> Thanks!
>
> > Can we do as with previous base SVQ patches? they were merged although
> > there is still no way to enable SVQ.
> >
> > Thanks!
> >
> > > Thoughts?
> > >
> > > Thanks
> > >
> > >
> > > >
> > > > [...]
> > > >
> > >
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
  2022-06-14  8:01       ` Jason Wang
@ 2022-06-14  8:13         ` Eugenio Perez Martin
  2022-06-14  8:20           ` Jason Wang
  0 siblings, 1 reply; 51+ messages in thread
From: Eugenio Perez Martin @ 2022-06-14  8:13 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > >
> > > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > >
> > > > 在 2022/5/20 03:12, Eugenio Pérez 写道:
> > > > > [...]
> > > >
> > > >
> > > > As discussed, I think we need to split this huge series into smaller ones:
> > > >
> > > > 1) shadow CVQ only, this makes rx-filter-event work
> > > > 2) ASID support for CVQ
> > > >
> > > > And for 1) we need consider whether or not it could be simplified.
> > > >
> > > > Or do it in reverse order, since if we do 1) first, we may have security
> > > > issues.
> > > >
> > >
> > > I'm ok with both, but I also think 2) before 1) might make more sense.
> > > There is no way to only shadow CVQ otherwise ATM.
> > >
> >
> > On second thought, that order is kind of harder.
> >
> > If we only map CVQ buffers, we need to either:
> > a. Copy them to controlled buffers
> > b. Track properly when to unmap them
>
> Just to make sure we're at the same page:
>
> I meant we can start with e.g having a dedicated ASID for CVQ but
> still using CVQ passthrough.
>

That would imply duplicating all the memory listener updates into both
ASIDs, and that part of the code would need to be reverted later. I'm
ok with that, but I'm not sure it's worth doing it that way.

> Then do other stuff on top.
>
> >
> > Alternative a. have the same problems exposed in this RFC: It's hard
> > (and unneeded in the final version) to know the size to copy.
> > Alternative b. also requires things not needed in the final version,
> > like to count the number of times each page is mapped and unmapped.
> >
> > So I'll go to the first alternative, that is also the proposed order
> > of the RFC. What security issues do you expect beyond the comments in
> > this series?
>
> If we shadow CVQ without ASID. The guest may guess the IOVA of CVQ and
> try to peek/modify it?
>

It works the same way as the data vqs; we're just updating the device
model in the middle. It should imply exactly the same risk as updating
an emulated NIC's control plane (including vhost-kernel / vhost-user).

Roughly speaking, it's just to propose patches 01 to 03, with your
comments. That already meets use cases like rx filter notifications
for devices with only one ASID.

Thanks!

> Thanks
>
> >
> > Thanks!
> >
> > > Can we do as with previous base SVQ patches? they were merged although
> > > there is still no way to enable SVQ.
> > >
> > > Thanks!
> > >
> > > > Thoughts?
> > > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > > [...]
> > > > >
> > > >
> >
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
  2022-06-14  8:13         ` Eugenio Perez Martin
@ 2022-06-14  8:20           ` Jason Wang
  2022-06-14  9:31             ` Eugenio Perez Martin
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2022-06-14  8:20 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Tue, Jun 14, 2022 at 4:14 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > >
> > > > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > >
> > > > > 在 2022/5/20 03:12, Eugenio Pérez 写道:
> > > > > > [...]
> > > > >
> > > > >
> > > > > As discussed, I think we need to split this huge series into smaller ones:
> > > > >
> > > > > 1) shadow CVQ only, this makes rx-filter-event work
> > > > > 2) ASID support for CVQ
> > > > >
> > > > > And for 1) we need consider whether or not it could be simplified.
> > > > >
> > > > > Or do it in reverse order, since if we do 1) first, we may have security
> > > > > issues.
> > > > >
> > > >
> > > > I'm ok with both, but I also think 2) before 1) might make more sense.
> > > > There is no way to only shadow CVQ otherwise ATM.
> > > >
> > >
> > > On second thought, that order is kind of harder.
> > >
> > > If we only map CVQ buffers, we need to either:
> > > a. Copy them to controlled buffers
> > > b. Track properly when to unmap them
> >
> > Just to make sure we're at the same page:
> >
> > I meant we can start with e.g having a dedicated ASID for CVQ but
> > still using CVQ passthrough.
> >
>
> That would imply duplicating all the memory listener updates to both
> ASIDs. That part of the code needs to be reverted. I'm ok with that,
> but I'm not sure if it's worth it to do it that way.

I don't get why it is related to memory listeners. The only change is

1) read the groups
2) set cvq to be an independent asid
3) update CVQ's IOTLB with its own ASID

?
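
A minimal sketch of those three steps, assuming the provisional uapi
names from patch 11 (vdpa_fd and cvq_idx are illustrative):

    #include <sys/ioctl.h>
    #include <linux/vhost.h>

    /* 1) read the group the control vq belongs to */
    struct vhost_vring_state group = { .index = cvq_idx };
    ioctl(vdpa_fd, VHOST_VDPA_GET_VRING_GROUP, &group);

    /* 2) move that group to its own ASID (ASID 1, as in the series) */
    struct vhost_vring_state asid = { .index = group.num, .num = 1 };
    ioctl(vdpa_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);

    /* 3) subsequent IOTLB update/invalidate messages for the CVQ are
     *    then sent with asid == 1 instead of the data vqs' ASID 0 */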

>
> > Then do other stuff on top.
> >
> > >
> > > Alternative a. have the same problems exposed in this RFC: It's hard
> > > (and unneeded in the final version) to know the size to copy.
> > > Alternative b. also requires things not needed in the final version,
> > > like to count the number of times each page is mapped and unmapped.
> > >
> > > So I'll go to the first alternative, that is also the proposed order
> > > of the RFC. What security issues do you expect beyond the comments in
> > > this series?
> >
> > If we shadow CVQ without ASID. The guest may guess the IOVA of CVQ and
> > try to peek/modify it?
> >
>
> It works the same way as data vqs, we're just updating the device
> model in the middle. It should imply the exact same risk as updating
> an emulated NIC control plane (including vhost-kernel / vhost-user).

Not sure I got you here. For vhost-kernel and vhost-user, the CVQ
buffers are owned by the guest.

But if we shadow CVQ without ASID, the CVQ buffer is owned by QEMU, and
there's no way to prevent the guest from accessing it?

If, in the case of vhost-kernel/vhost-user, there were a way for the
guest to exploit buffers owned by QEMU, it would be a bug.

Thanks

>
> Roughly speaking, it's just to propose patches 01 to 03, with your
> comments. That already meets use cases like rx filter notifications
> for devices with only one ASID.
>
> Thanks!
>
> > Thanks
> >
> > >
> > > Thanks!
> > >
> > > > Can we do as with previous base SVQ patches? they were merged although
> > > > there is still no way to enable SVQ.
> > > >
> > > > Thanks!
> > > >
> > > > > Thoughts?
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > >
> > > > > > [...]
> > > > > >
> > > > >
> > >
> >
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
  2022-06-14  8:20           ` Jason Wang
@ 2022-06-14  9:31             ` Eugenio Perez Martin
  2022-06-15  3:04               ` Jason Wang
  0 siblings, 1 reply; 51+ messages in thread
From: Eugenio Perez Martin @ 2022-06-14  9:31 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Tue, Jun 14, 2022 at 10:20 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Jun 14, 2022 at 4:14 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > >
> > > > > > 在 2022/5/20 03:12, Eugenio Pérez 写道:
> > > > > > > [...]
> > > > > >
> > > > > >
> > > > > > As discussed, I think we need to split this huge series into smaller ones:
> > > > > >
> > > > > > 1) shadow CVQ only, this makes rx-filter-event work
> > > > > > 2) ASID support for CVQ
> > > > > >
> > > > > > And for 1) we need consider whether or not it could be simplified.
> > > > > >
> > > > > > Or do it in reverse order, since if we do 1) first, we may have security
> > > > > > issues.
> > > > > >
> > > > >
> > > > > I'm ok with both, but I also think 2) before 1) might make more sense.
> > > > > There is no way to only shadow CVQ otherwise ATM.
> > > > >
> > > >
> > > > On second thought, that order is kind of harder.
> > > >
> > > > If we only map CVQ buffers, we need to either:
> > > > a. Copy them to controlled buffers
> > > > b. Track properly when to unmap them
> > >
> > > Just to make sure we're at the same page:
> > >
> > > I meant we can start with e.g having a dedicated ASID for CVQ but
> > > still using CVQ passthrough.
> > >
> >
> > That would imply duplicating all the memory listener updates to both
> > ASIDs. That part of the code needs to be reverted. I'm ok with that,
> > but I'm not sure if it's worth it to do it that way.
>
> I don't get why it is related to memory listeners. The only change is
>
> 1) read the groups
> 2) set cvq to be an independent asid
> 3) update CVQ's IOTLB with its own ASID
>

How do we track the mappings of step 3) without a copy?

If we don't copy the buffers to qemu's IOVA, we need to track when to
unmap the CVQ buffers' memory. Many CVQ buffers could be in the same
page, so we need to refcount them (or a similar solution).

This series copies the buffers to an independent buffer in qemu memory
to avoid that. Once you copy them, we have the problem you point at in
a later patch: the guest controls the buffers, so qemu must understand
CVQ so the guest cannot trick it. All of this is orthogonal to ASID.
At that point, we have this series except for the asid part and the
injection of CVQ buffers at the LM destination, isn't it?

CVQ buffers can be copied into the qemu IOVA space and offered to the
device. Much like the SVQ vrings, the copied buffers will not be
accessible from the guest. The hw device (as "non emulated cvq") will
receive a lot of dma updates, but that's temporary. We can add ASID on
top of that as a means to:
- Not shadow the data plane (fundamental to the intended use case of vdpa).
- Not pollute the data plane's DMA mappings.
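
A rough sketch of that copy (names are illustrative; the series does
this inside SVQ):

    #include "qemu/osdep.h"
    #include "qemu/iov.h"

    /* QEMU-owned bounce buffer, mapped once in the CVQ ASID at start.
     * The guest cannot reach it, so the device never DMAs into or from
     * guest-visible CVQ memory. */
    static uint8_t cvq_bounce[4096];

    static size_t cvq_copy_cmd(const struct iovec *out_sg, unsigned out_num)
    {
        /* copy the guest's control command into the shadow buffer */
        return iov_to_buf(out_sg, out_num, 0, cvq_bounce, sizeof(cvq_bounce));
    }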

> ?
>
> >
> > > Then do other stuff on top.
> > >
> > > >
> > > > Alternative a. have the same problems exposed in this RFC: It's hard
> > > > (and unneeded in the final version) to know the size to copy.
> > > > Alternative b. also requires things not needed in the final version,
> > > > like to count the number of times each page is mapped and unmapped.
> > > >
> > > > So I'll go to the first alternative, that is also the proposed order
> > > > of the RFC. What security issues do you expect beyond the comments in
> > > > this series?
> > >
> > > If we shadow CVQ without ASID. The guest may guess the IOVA of CVQ and
> > > try to peek/modify it?
> > >
> >
> > It works the same way as data vqs, we're just updating the device
> > model in the middle. It should imply the exact same risk as updating
> > an emulated NIC control plane (including vhost-kernel / vhost-user).
>
> Not sure I got you here. For vhost-kernel and vhost-user, CVQ's buffer
> is owned by guests.
>

The same way they control the data plane when all data virtqueues are
shadowed for dirty page tracking (more on the risk of qemu updating
the device model below).

> But if we shadow CVQ without ASID, the CVQ buffer is owned by QEMU and
> there's no way to prevent guests from accessing it?
>

With SVQ the memory exposed to the device is already shadowed. The
guest cannot access the CVQ buffers' memory, just as it cannot access
the SVQ vrings.

> If in the case of vhost-kernel/vhost-user, there's a way for the guest
> to exploit buffers owned by Qemu, it should be a bug.
>

The only extra step is the call to virtio_net_handle_ctrl_iov
(extracted from virtio_net_handle_ctrl). If a guest can exploit that
in SVQ mode, it can exploit it with other vhost backends too, as far
as I can see.
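
For reference, a sketch of how SVQ would hand the copied buffers to the
device model (the signature follows patch 01 and may differ; variable
names are illustrative):

    /* Validation happens in the emulated control-plane logic, exactly
     * as it does for a fully emulated NIC. */
    size_t status_len = virtio_net_handle_ctrl_iov(vdev, in_sg, in_num,
                                                   out_sg, out_num);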

> Thanks
>
> >
> > Roughly speaking, it's just to propose patches 01 to 03, with your
> > comments. That already meets use cases like rx filter notifications
> > for devices with only one ASID.
> >

This part of my mail is not correct, we need to add a few patches of
this series on top :). If not, it would be exploitable.

Thanks!

> > Thanks!
> >
> > > Thanks
> > >
> > > >
> > > > Thanks!
> > > >
> > > > > Can we do as with previous base SVQ patches? they were merged although
> > > > > there is still no way to enable SVQ.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > >
> > > >
> > >
> >
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
  2022-06-14  9:31             ` Eugenio Perez Martin
@ 2022-06-15  3:04               ` Jason Wang
  2022-06-15 10:02                 ` Eugenio Perez Martin
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2022-06-15  3:04 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Tue, Jun 14, 2022 at 5:32 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Jun 14, 2022 at 10:20 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Jun 14, 2022 at 4:14 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
> > > > <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > 在 2022/5/20 03:12, Eugenio Pérez 写道:
> > > > > > > > [...]
> > > > > > >
> > > > > > >
> > > > > > > As discussed, I think we need to split this huge series into smaller ones:
> > > > > > >
> > > > > > > 1) shadow CVQ only, this makes rx-filter-event work
> > > > > > > 2) ASID support for CVQ
> > > > > > >
> > > > > > > And for 1) we need consider whether or not it could be simplified.
> > > > > > >
> > > > > > > Or do it in reverse order, since if we do 1) first, we may have security
> > > > > > > issues.
> > > > > > >
> > > > > >
> > > > > > I'm ok with both, but I also think 2) before 1) might make more sense.
> > > > > > There is no way to only shadow CVQ otherwise ATM.
> > > > > >
> > > > >
> > > > > On second thought, that order is kind of harder.
> > > > >
> > > > > If we only map CVQ buffers, we need to either:
> > > > > a. Copy them to controlled buffers
> > > > > b. Track properly when to unmap them
> > > >
> > > > Just to make sure we're at the same page:
> > > >
> > > > I meant we can start with e.g having a dedicated ASID for CVQ but
> > > > still using CVQ passthrough.
> > > >
> > >
> > > That would imply duplicating all the memory listener updates to both
> > > ASIDs. That part of the code needs to be reverted. I'm ok with that,
> > > but I'm not sure if it's worth it to do it that way.
> >
> > I don't get why it is related to memory listeners. The only change is
> >
> > 1) read the groups
> > 2) set cvq to be an independent asid
> > 3) update CVQ's IOTLB with its own ASID
> >
>
> How to track the mappings of step 3) without a copy?

So let me try to explain: what I propose is to split the patches, so
the above could be the first part. Since we know:

1) CVQ is passthrough to the guest right now
2) CVQ will use an independent ASID

It doesn't hurt to implement those first. It's unrelated to the policy
(e.g. how to shadow CVQ).
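
Concretely, something like the minimal sketch below. The ioctl names
and numbers are just the provisional ones from patch 11 ("vhost:
Update kernel headers"), so they may change before they are merged:

#include <errno.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Move the control virtqueue into its own address space id (ASID). */
static int vhost_vdpa_isolate_cvq(int device_fd, unsigned cvq_index,
                                  unsigned cvq_asid)
{
    struct vhost_vring_state state = { .index = cvq_index };

    /* 1) Read which group the CVQ belongs to. */
    if (ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP, &state) < 0) {
        return -errno;
    }

    /* 2) Bind that whole group to an independent ASID. */
    state.index = state.num; /* the group id the device returned */
    state.num = cvq_asid;
    if (ioctl(device_fd, VHOST_VDPA_SET_GROUP_ASID, &state) < 0) {
        return -errno;
    }

    /* 3) From here on, IOTLB updates for the CVQ must carry cvq_asid. */
    return 0;
}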

>
> If we don't copy the buffers to qemu's IOVA, we need to track when to
> unmap CVQ buffers memory. Many CVQ buffers could be in the same page,
> so we need to refcount them (or similar solution).

Can we use fixed mappings instead of the dynamic ones?

>
> This series copies the buffers to an independent buffer in qemu memory
> to avoid that. Once you copy them, we have the problem you point at
> some patch later: The guest control buffers, so qemu must understand
> CVQ so the guest cannot trick it. All of this is orthogonal to ASID.
> At that point, we have this series except for the asid part and the
> injection of CVQ buffers at the LM destination, isn't it?

So we have several things:

1) ASID support
2) Shadow CVQ only
3) State restoring

I hope we can split them into independent series. If we want to shadow
CVQ first, we need to prove that it is safe without ASID.

>
> CVQ buffers can be copied in the qemu IOVA space and be offered to the
> device. Much like SVQ vrings, the copied buffers will not be
> accessible from the guest. The hw device (as "non emulated cvq") will
> receive a lot of dma updates, but it's temporary. We can add ASID on
> top of that as a mean to:
> - Not to SVQ data plane (fundamental to the intended use case of vdpa).
> - Not to pollute data plane DMA mappings.
>
> > ?
> >
> > >
> > > > Then do other stuff on top.
> > > >
> > > > >
> > > > > Alternative a. have the same problems exposed in this RFC: It's hard
> > > > > (and unneeded in the final version) to know the size to copy.
> > > > > Alternative b. also requires things not needed in the final version,
> > > > > like to count the number of times each page is mapped and unmapped.
> > > > >
> > > > > So I'll go to the first alternative, that is also the proposed order
> > > > > of the RFC. What security issues do you expect beyond the comments in
> > > > > this series?
> > > >
> > > > If we shadow CVQ without ASID. The guest may guess the IOVA of CVQ and
> > > > try to peek/modify it?
> > > >
> > >
> > > It works the same way as data vqs, we're just updating the device
> > > model in the middle. It should imply the exact same risk as updating
> > > an emulated NIC control plane (including vhost-kernel / vhost-user).
> >
> > Not sure I got you here. For vhost-kernel and vhost-user, CVQ's buffer
> > is owned by guests.
> >
>
> The same way they control the data plane when all data virtqueues are
> shadowed for dirty page tracking (more on the risk of qemu updating
> the device model below).

Ok.

>
> > But if we shadow CVQ without ASID, the CVQ buffer is owned by QEMU and
> > there's no way to prevent guests from accessing it?
> >
>
> With SVQ the memory exposed to the device is already shadowed. They
> cannot access the CVQ buffers memory the same way they cannot access
> the SVQ vrings.

Ok, I think I kind of get you; it looks like we have different
assumptions here. If we only shadow CVQ, it will have security
issues, since RX/TX is not shadowed. If we shadow CVQ as well as
TX/RX, there's no security issue, since each IOVA is validated and the
descriptors are prepared by QEMU.
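
To be more concrete: with SVQ, QEMU translates and validates every
buffer address through the IOVA tree before it writes the shadow
descriptor, and an address that was never mapped for the device simply
has no entry there. A sketch (DMAMap and vhost_iova_tree_find_iova()
come from the SVQ infrastructure already merged; the wrapper itself is
illustrative only and assumes the tree's inclusive-size convention):

#include "qemu/osdep.h"
#include "qemu/iova-tree.h"
#include "hw/virtio/vhost-iova-tree.h"

static bool svq_translate_one(const VhostIOVATree *tree, hwaddr addr,
                              size_t len, hwaddr *iova)
{
    const DMAMap needle = {
        .translated_addr = addr,
        .size = len - 1, /* the IOVA tree stores inclusive sizes */
    };
    const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle);

    if (!map) {
        return false; /* never mapped for the device: reject it */
    }
    *iova = map->iova + (addr - map->translated_addr);
    return true;
}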

This goes back to another question: what's the order of the series?

Thanks


>
> > If in the case of vhost-kernel/vhost-user, there's a way for the guest
> > to exploit buffers owned by Qemu, it should be a bug.
> >
>
> The only extra step is the call to virtio_net_handle_ctrl_iov
> (extracted from virtio_net_handle_ctrl). If a guest can exploit that
> in SVQ mode, it can exploit it too with other vhost backends as far as
> I see.
>
> > Thanks
> >
> > >
> > > Roughly speaking, it's just to propose patches 01 to 03, with your
> > > comments. That already meets use cases like rx filter notifications
> > > for devices with only one ASID.
> > >
>
> This part of my mail is not correct, we need to add a few patches of
> this series on top :). If not, it would be exploitable.
>
> Thanks!
>
> > > Thanks!
> > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks!
> > > > >
> > > > > > Can we do as with previous base SVQ patches? they were merged although
> > > > > > there is still no way to enable SVQ.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > > Thoughts?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > TODO:
> > > > > > > > * Fallback on regular CVQ if QEMU cannot isolate in its own ASID by any
> > > > > > > >    reason, blocking migration. This is tricky, since it can cause that the VM
> > > > > > > >    cannot be migrated anymore, so some way of block it must be used.
> > > > > > > > * Review failure paths, some are with TODO notes, other don't.
> > > > > > > >
> > > > > > > > Changes from rfc v7:
> > > > > > > > * Don't map all guest space in ASID 1 but copy all the buffers. No need for
> > > > > > > >    more memory listeners.
> > > > > > > > * Move net backend start callback to SVQ.
> > > > > > > > * Wait for device CVQ commands used by the device at SVQ start, avoiding races.
> > > > > > > > * Changed ioctls, but they're provisional anyway.
> > > > > > > > * Reorder commits so refactor and code adding ones are closer to usage.
> > > > > > > > * Usual cleaning: better tracing, doc, patches messages, ...
> > > > > > > >
> > > > > > > > Changes from rfc v6:
> > > > > > > > * Fix bad iotlb updates order when batching was enabled
> > > > > > > > * Add reference counting to iova_tree so cleaning is simpler.
> > > > > > > >
> > > > > > > > Changes from rfc v5:
> > > > > > > > * Fixes bad calculus of cvq end group when MQ is not acked by the guest.
> > > > > > > >
> > > > > > > > Changes from rfc v4:
> > > > > > > > * Add missing tracing
> > > > > > > > * Add multiqueue support
> > > > > > > > * Use already sent version for replacing g_memdup
> > > > > > > > * Care with memory management
> > > > > > > >
> > > > > > > > Changes from rfc v3:
> > > > > > > > * Fix bad returning of descriptors to SVQ list.
> > > > > > > >
> > > > > > > > Changes from rfc v2:
> > > > > > > > * Fix use-after-free.
> > > > > > > >
> > > > > > > > Changes from rfc v1:
> > > > > > > > * Rebase to latest master.
> > > > > > > > * Configure ASID instead of assuming cvq asid != data vqs asid.
> > > > > > > > * Update device model so (MAC) state can be migrated too.
> > > > > > > >
> > > > > > > > [1] https://lkml.kernel.org/kvm/20220224212314.1326-1-gdawar@xilinx.com/
> > > > > > > >
> > > > > > > > Eugenio Pérez (21):
> > > > > > > >    virtio-net: Expose ctrl virtqueue logic
> > > > > > > >    vhost: Add custom used buffer callback
> > > > > > > >    vdpa: control virtqueue support on shadow virtqueue
> > > > > > > >    virtio: Make virtqueue_alloc_element non-static
> > > > > > > >    vhost: Add vhost_iova_tree_find
> > > > > > > >    vdpa: Add map/unmap operation callback to SVQ
> > > > > > > >    vhost: move descriptor translation to vhost_svq_vring_write_descs
> > > > > > > >    vhost: Add SVQElement
> > > > > > > >    vhost: Add svq copy desc mode
> > > > > > > >    vhost: Add vhost_svq_inject
> > > > > > > >    vhost: Update kernel headers
> > > > > > > >    vdpa: delay set_vring_ready after DRIVER_OK
> > > > > > > >    vhost: Add ShadowVirtQueueStart operation
> > > > > > > >    vhost: Make possible to check for device exclusive vq group
> > > > > > > >    vhost: add vhost_svq_poll
> > > > > > > >    vdpa: Add vhost_vdpa_start_control_svq
> > > > > > > >    vdpa: Add asid attribute to vdpa device
> > > > > > > >    vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
> > > > > > > >    vhost: Add reference counting to vhost_iova_tree
> > > > > > > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > > > > > >    vdpa: Add x-cvq-svq
> > > > > > > >
> > > > > > > >   qapi/net.json                                |  13 +-
> > > > > > > >   hw/virtio/vhost-iova-tree.h                  |   7 +-
> > > > > > > >   hw/virtio/vhost-shadow-virtqueue.h           |  61 ++-
> > > > > > > >   include/hw/virtio/vhost-vdpa.h               |   3 +
> > > > > > > >   include/hw/virtio/vhost.h                    |   3 +
> > > > > > > >   include/hw/virtio/virtio-net.h               |   4 +
> > > > > > > >   include/hw/virtio/virtio.h                   |   1 +
> > > > > > > >   include/standard-headers/linux/vhost_types.h |  11 +-
> > > > > > > >   linux-headers/linux/vhost.h                  |  25 +-
> > > > > > > >   hw/net/vhost_net.c                           |   5 +-
> > > > > > > >   hw/net/virtio-net.c                          |  84 +++--
> > > > > > > >   hw/virtio/vhost-iova-tree.c                  |  35 +-
> > > > > > > >   hw/virtio/vhost-shadow-virtqueue.c           | 378 ++++++++++++++++---
> > > > > > > >   hw/virtio/vhost-vdpa.c                       | 206 +++++++++-
> > > > > > > >   hw/virtio/virtio.c                           |   2 +-
> > > > > > > >   net/vhost-vdpa.c                             | 294 ++++++++++++++-
> > > > > > > >   hw/virtio/trace-events                       |  10 +-
> > > > > > > >   17 files changed, 1012 insertions(+), 130 deletions(-)
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
  2022-06-15  3:04               ` Jason Wang
@ 2022-06-15 10:02                 ` Eugenio Perez Martin
  2022-06-17  1:29                   ` Jason Wang
  0 siblings, 1 reply; 51+ messages in thread
From: Eugenio Perez Martin @ 2022-06-15 10:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Wed, Jun 15, 2022 at 5:04 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Jun 14, 2022 at 5:32 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Tue, Jun 14, 2022 at 10:20 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Tue, Jun 14, 2022 at 4:14 PM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
> > > > > <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > On 2022/5/20 03:12, Eugenio Pérez wrote:
> > > > > > > > > [...]
> > > > > > > >
> > > > > > > >
> > > > > > > > As discussed, I think we need to split this huge series into smaller ones:
> > > > > > > >
> > > > > > > > 1) shadow CVQ only, this makes rx-filter-event work
> > > > > > > > 2) ASID support for CVQ
> > > > > > > >
> > > > > > > > And for 1) we need consider whether or not it could be simplified.
> > > > > > > >
> > > > > > > > Or do it in reverse order, since if we do 1) first, we may have security
> > > > > > > > issues.
> > > > > > > >
> > > > > > >
> > > > > > > I'm ok with both, but I also think 2) before 1) might make more sense.
> > > > > > > There is no way to only shadow CVQ otherwise ATM.
> > > > > > >
> > > > > >
> > > > > > On second thought, that order is kind of harder.
> > > > > >
> > > > > > If we only map CVQ buffers, we need to either:
> > > > > > a. Copy them to controlled buffers
> > > > > > b. Track properly when to unmap them
> > > > >
> > > > > Just to make sure we're at the same page:
> > > > >
> > > > > I meant we can start with e.g having a dedicated ASID for CVQ but
> > > > > still using CVQ passthrough.
> > > > >
> > > >
> > > > That would imply duplicating all the memory listener updates to both
> > > > ASIDs. That part of the code needs to be reverted. I'm ok with that,
> > > > but I'm not sure if it's worth it to do it that way.
> > >
> > > I don't get why it is related to memory listeners. The only change is
> > >
> > > 1) read the groups
> > > 2) set cvq to be an independent asid
> > > 3) update CVQ's IOTLB with its own ASID
> > >
> >
> > How to track the mappings of step 3) without a copy?
>
> So let me try to explain, what I propose is to split the patches. So
> the above could be the first part. Since we know:
>
> 1) CVQ is passthrough to guest right now
> 2) We know CVQ will use an independent ASID
>
> It doesn't harm to implement those first. It's unrelated to the policy
> (e.g how to shadow CVQ).
>
> >
> > If we don't copy the buffers to qemu's IOVA, we need to track when to
> > unmap CVQ buffers memory. Many CVQ buffers could be in the same page,
> > so we need to refcount them (or similar solution).
>
> Can we use fixed mapping instead of the dynamic ones?
>

That implies either implementing something like a memory ring (of what
size?) or effectively duplicating the memory listener mappings.

I'm not against that, but it's something we would need to remove in
the final solution. Using the order presented here avoids that.

> >
> > This series copies the buffers to an independent buffer in qemu memory
> > to avoid that. Once you copy them, we have the problem you point at
> > some patch later: The guest control buffers, so qemu must understand
> > CVQ so the guest cannot trick it. All of this is orthogonal to ASID.
> > At that point, we have this series except for the asid part and the
> > injection of CVQ buffers at the LM destination, isn't it?
>
> So we have several stuffs:
>
> 1) ASID support
> 2) Shadow CVQ only
> 3) State restoring
>
> I hope we can split them into independent series. If we want to shadow
> CVQ first, we need to prove that it is safe without ASID.
>
> >
> > CVQ buffers can be copied in the qemu IOVA space and be offered to the
> > device. Much like SVQ vrings, the copied buffers will not be
> > accessible from the guest. The hw device (as "non emulated cvq") will
> > receive a lot of dma updates, but it's temporary. We can add ASID on
> > top of that as a mean to:
> > - Not to SVQ data plane (fundamental to the intended use case of vdpa).
> > - Not to pollute data plane DMA mappings.
> >
> > > ?
> > >
> > > >
> > > > > Then do other stuff on top.
> > > > >
> > > > > >
> > > > > > Alternative a. have the same problems exposed in this RFC: It's hard
> > > > > > (and unneeded in the final version) to know the size to copy.
> > > > > > Alternative b. also requires things not needed in the final version,
> > > > > > like to count the number of times each page is mapped and unmapped.
> > > > > >
> > > > > > So I'll go to the first alternative, that is also the proposed order
> > > > > > of the RFC. What security issues do you expect beyond the comments in
> > > > > > this series?
> > > > >
> > > > > If we shadow CVQ without ASID. The guest may guess the IOVA of CVQ and
> > > > > try to peek/modify it?
> > > > >
> > > >
> > > > It works the same way as data vqs, we're just updating the device
> > > > model in the middle. It should imply the exact same risk as updating
> > > > an emulated NIC control plane (including vhost-kernel / vhost-user).
> > >
> > > Not sure I got you here. For vhost-kernel and vhost-user, CVQ's buffer
> > > is owned by guests.
> > >
> >
> > The same way they control the data plane when all data virtqueues are
> > shadowed for dirty page tracking (more on the risk of qemu updating
> > the device model below).
>
> Ok.
>
> >
> > > But if we shadow CVQ without ASID, the CVQ buffer is owned by QEMU and
> > > there's no way to prevent guests from accessing it?
> > >
> >
> > With SVQ the memory exposed to the device is already shadowed. They
> > cannot access the CVQ buffers memory the same way they cannot access
> > the SVQ vrings.
>
> Ok, I think I kind of get you, it looks like we have different
> assumptions here: So if we only shadow CVQ, it will have security
> issues, since RX/TX is not shadowed. If we shadow CVQ as well as
> TX/RX, there's no security issue, since each IOVA is validated and the
> descriptors are prepared by Qemu.
>

Right. I expected to maintain the all-shadowed-or-nothing behavior,
sorry if I was not clear.

> This goes back to another question, what's the order of the series.
>

I think the shortest path is to follow the order of this series.
I tried to reorder it your way, but the ASID patches have to come
with a lot of CVQ patches if we want proper validation.

We can take the long route if we implement a fixed ring buffer,
memory listener cloning, or another use case (sub-slicing?). But I
expect more issues to arise there.

I have another question, actually: is it ok to implement the CVQ use
case but not to merge the x-svq parameter? The more I think about the
parameter, the more I see it's better to leave it as a separate patch
for testing until we shape the complete series, since it's unneeded
there.

Thanks!

> [...]



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
  2022-06-15 10:02                 ` Eugenio Perez Martin
@ 2022-06-17  1:29                   ` Jason Wang
  2022-06-17  8:17                     ` Eugenio Perez Martin
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Wang @ 2022-06-17  1:29 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Wed, Jun 15, 2022 at 6:03 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Jun 15, 2022 at 5:04 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Jun 14, 2022 at 5:32 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Tue, Jun 14, 2022 at 10:20 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Tue, Jun 14, 2022 at 4:14 PM Eugenio Perez Martin
> > > > <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
> > > > > > <eperezma@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 2022/5/20 03:12, Eugenio Pérez wrote:
> > > > > > > > > > [...]
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > As discussed, I think we need to split this huge series into smaller ones:
> > > > > > > > >
> > > > > > > > > 1) shadow CVQ only, this makes rx-filter-event work
> > > > > > > > > 2) ASID support for CVQ
> > > > > > > > >
> > > > > > > > > And for 1) we need consider whether or not it could be simplified.
> > > > > > > > >
> > > > > > > > > Or do it in reverse order, since if we do 1) first, we may have security
> > > > > > > > > issues.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I'm ok with both, but I also think 2) before 1) might make more sense.
> > > > > > > > There is no way to only shadow CVQ otherwise ATM.
> > > > > > > >
> > > > > > >
> > > > > > > On second thought, that order is kind of harder.
> > > > > > >
> > > > > > > If we only map CVQ buffers, we need to either:
> > > > > > > a. Copy them to controlled buffers
> > > > > > > b. Track properly when to unmap them
> > > > > >
> > > > > > Just to make sure we're at the same page:
> > > > > >
> > > > > > I meant we can start with e.g having a dedicated ASID for CVQ but
> > > > > > still using CVQ passthrough.
> > > > > >
> > > > >
> > > > > That would imply duplicating all the memory listener updates to both
> > > > > ASIDs. That part of the code needs to be reverted. I'm ok with that,
> > > > > but I'm not sure if it's worth it to do it that way.
> > > >
> > > > I don't get why it is related to memory listeners. The only change is
> > > >
> > > > 1) read the groups
> > > > 2) set cvq to be an independent asid
> > > > 3) update CVQ's IOTLB with its own ASID
> > > >
> > >
> > > How to track the mappings of step 3) without a copy?
> >
> > So let me try to explain, what I propose is to split the patches. So
> > the above could be the first part. Since we know:
> >
> > 1) CVQ is passthrough to guest right now
> > 2) We know CVQ will use an independent ASID
> >
> > It doesn't harm to implement those first. It's unrelated to the policy
> > (e.g how to shadow CVQ).
> >
> > >
> > > If we don't copy the buffers to qemu's IOVA, we need to track when to
> > > unmap CVQ buffers memory. Many CVQ buffers could be in the same page,
> > > so we need to refcount them (or similar solution).
> >
> > Can we use fixed mapping instead of the dynamic ones?
> >
>
> That implies either to implement something like a memory ring (size?),
> or to effectively duplicate memory listener mappings.

I'm not sure I get this.

But it's mainly the CVQ buffer + the CVQ virtqueue.

It should be possible if we:

1) allocate something like a buffer of several megabytes
2) only process one CVQ command from the guest at once

?
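
Roughly, I mean something like the sketch below. The buffer size and
the helper names are made up for illustration; in your series,
vhost_svq_inject() and vhost_svq_poll() would play the cvq_submit()
and cvq_poll() roles:

#include <errno.h>
#include <stdint.h>
#include <string.h>

enum { CVQ_BOUNCE_SIZE = 2 * 1024 * 1024 }; /* "several megabytes" */

/* Allocated once and kept DMA-mapped into the CVQ ASID for the whole
 * device lifetime, so there is no dynamic map/unmap per command. */
static uint8_t cvq_bounce[CVQ_BOUNCE_SIZE];

/* Hypothetical helpers: expose a buffer to the device's CVQ and
 * busy-wait until the device has used it. */
static void cvq_submit(void *buf, size_t cmd_len);
static void cvq_poll(void);

static int cvq_handle_one(const void *guest_cmd, size_t len,
                          uint8_t *status)
{
    if (len + 1 > CVQ_BOUNCE_SIZE) {
        return -ENOBUFS; /* command + status byte do not fit */
    }
    memcpy(cvq_bounce, guest_cmd, len); /* copy the command in */
    cvq_submit(cvq_bounce, len);        /* expose it to the device */
    cvq_poll();                         /* wait: one command at once */
    *status = cvq_bounce[len];          /* copy the device's ack back */
    return 0;
}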

>
> I'm not against that, but it's something we need to remove on the
> final solution. To use the order presented here will avoid that.
>
> > >
> > > This series copies the buffers to an independent buffer in qemu memory
> > > to avoid that. Once you copy them, we have the problem you point at
> > > some patch later: The guest control buffers, so qemu must understand
> > > CVQ so the guest cannot trick it. All of this is orthogonal to ASID.
> > > At that point, we have this series except for the asid part and the
> > > injection of CVQ buffers at the LM destination, isn't it?
> >
> > So we have several stuffs:
> >
> > 1) ASID support
> > 2) Shadow CVQ only
> > 3) State restoring
> >
> > I hope we can split them into independent series. If we want to shadow
> > CVQ first, we need to prove that it is safe without ASID.
> >
> > >
> > > CVQ buffers can be copied in the qemu IOVA space and be offered to the
> > > device. Much like SVQ vrings, the copied buffers will not be
> > > accessible from the guest. The hw device (as "non emulated cvq") will
> > > receive a lot of dma updates, but it's temporary. We can add ASID on
> > > top of that as a mean to:
> > > - Not to SVQ data plane (fundamental to the intended use case of vdpa).
> > > - Not to pollute data plane DMA mappings.
> > >
> > > > ?
> > > >
> > > > >
> > > > > > Then do other stuff on top.
> > > > > >
> > > > > > >
> > > > > > > Alternative a. have the same problems exposed in this RFC: It's hard
> > > > > > > (and unneeded in the final version) to know the size to copy.
> > > > > > > Alternative b. also requires things not needed in the final version,
> > > > > > > like to count the number of times each page is mapped and unmapped.
> > > > > > >
> > > > > > > So I'll go to the first alternative, that is also the proposed order
> > > > > > > of the RFC. What security issues do you expect beyond the comments in
> > > > > > > this series?
> > > > > >
> > > > > > If we shadow CVQ without ASID. The guest may guess the IOVA of CVQ and
> > > > > > try to peek/modify it?
> > > > > >
> > > > >
> > > > > It works the same way as data vqs, we're just updating the device
> > > > > model in the middle. It should imply the exact same risk as updating
> > > > > an emulated NIC control plane (including vhost-kernel / vhost-user).
> > > >
> > > > Not sure I got you here. For vhost-kernel and vhost-user, CVQ's buffer
> > > > is owned by guests.
> > > >
> > >
> > > The same way they control the data plane when all data virtqueues are
> > > shadowed for dirty page tracking (more on the risk of qemu updating
> > > the device model below).
> >
> > Ok.
> >
> > >
> > > > But if we shadow CVQ without ASID, the CVQ buffer is owned by QEMU and
> > > > there's no way to prevent guests from accessing it?
> > > >
> > >
> > > With SVQ the memory exposed to the device is already shadowed. They
> > > cannot access the CVQ buffers memory the same way they cannot access
> > > the SVQ vrings.
> >
> > Ok, I think I kind of get you, it looks like we have different
> > assumptions here: So if we only shadow CVQ, it will have security
> > issues, since RX/TX is not shadowed. If we shadow CVQ as well as
> > TX/RX, there's no security issue, since each IOVA is validated and the
> > descriptors are prepared by Qemu.
> >
>
> Right. I expected to maintain the all-shadowed-or-nothing behavior,
> sorry if I was not clear.
>
> > This goes back to another question, what's the order of the series.
> >
>
> I think that the shortest path is to follow the order of this series.
> I tried to reorder your way, but ASID patches have to come with a lot
> of CVQ patches if we want proper validation.

Ok, so if this is the case, let's just split this series and keep the order.

>
> We can take the long route if we either implement a fixed ring buffer,
> memory listener cloning, or another use case (sub-slicing?). But I
> expect more issues to arise there.
>
> I have another question actually, is it ok to implement the cvq use
> case but not to merge the x-svq parameter? The more I think on the
> parameter the more I see it's better to leave it as a separated patch
> for testing until we shape the complete series and it's unneeded.

That's fine.

Thanks

> [...]



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
  2022-06-17  1:29                   ` Jason Wang
@ 2022-06-17  8:17                     ` Eugenio Perez Martin
  2022-06-20  5:07                       ` Jason Wang
  0 siblings, 1 reply; 51+ messages in thread
From: Eugenio Perez Martin @ 2022-06-17  8:17 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Fri, Jun 17, 2022 at 3:29 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Jun 15, 2022 at 6:03 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Wed, Jun 15, 2022 at 5:04 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Tue, Jun 14, 2022 at 5:32 PM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > On Tue, Jun 14, 2022 at 10:20 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Tue, Jun 14, 2022 at 4:14 PM Eugenio Perez Martin
> > > > > <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
> > > > > > > <eperezma@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On 2022/5/20 03:12, Eugenio Pérez wrote:
> > > > > > > > > > > [...]
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > As discussed, I think we need to split this huge series into smaller ones:
> > > > > > > > > >
> > > > > > > > > > 1) shadow CVQ only, this makes rx-filter-event work
> > > > > > > > > > 2) ASID support for CVQ
> > > > > > > > > >
> > > > > > > > > > And for 1) we need consider whether or not it could be simplified.
> > > > > > > > > >
> > > > > > > > > > Or do it in reverse order, since if we do 1) first, we may have security
> > > > > > > > > > issues.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I'm ok with both, but I also think 2) before 1) might make more sense.
> > > > > > > > > There is no way to only shadow CVQ otherwise ATM.
> > > > > > > > >
> > > > > > > >
> > > > > > > > On second thought, that order is kind of harder.
> > > > > > > >
> > > > > > > > If we only map CVQ buffers, we need to either:
> > > > > > > > a. Copy them to controlled buffers
> > > > > > > > b. Track properly when to unmap them
> > > > > > >
> > > > > > > Just to make sure we're at the same page:
> > > > > > >
> > > > > > > I meant we can start with e.g having a dedicated ASID for CVQ but
> > > > > > > still using CVQ passthrough.
> > > > > > >
> > > > > >
> > > > > > That would imply duplicating all the memory listener updates to both
> > > > > > ASIDs. That part of the code needs to be reverted. I'm ok with that,
> > > > > > but I'm not sure if it's worth it to do it that way.
> > > > >
> > > > > I don't get why it is related to memory listeners. The only change is
> > > > >
> > > > > 1) read the groups
> > > > > 2) set cvq to be an independent asid
> > > > > 3) update CVQ's IOTLB with its own ASID
> > > > >
> > > >
> > > > How to track the mappings of step 3) without a copy?
> > >
> > > So let me try to explain, what I propose is to split the patches. So
> > > the above could be the first part. Since we know:
> > >
> > > 1) CVQ is passthrough to guest right now
> > > 2) We know CVQ will use an independent ASID
> > >
> > > It doesn't harm to implement those first. It's unrelated to the policy
> > > (e.g how to shadow CVQ).
> > >
> > > >
> > > > If we don't copy the buffers to qemu's IOVA, we need to track when to
> > > > unmap CVQ buffers memory. Many CVQ buffers could be in the same page,
> > > > so we need to refcount them (or similar solution).
> > >
> > > Can we use fixed mapping instead of the dynamic ones?
> > >
> >
> > That implies either to implement something like a memory ring (size?),
> > or to effectively duplicate memory listener mappings.
>
> I'm not sure I get this.
>
> But it's mainly the CVQ buffer + CVQ virtqueue.
>
> It should be possible if:
>
> 1) allocate something like a buffer of several megabytes

It's technically possible, but we would need to deal with situations
that do not happen in the final version, once we teach qemu how to
handle CVQ. For example, what do we do if a command does not fit?

The current workflow deals with that automatically, as we teach qemu
about CVQ before splitting it into a separate ASID. The big buffer
looks like a good *transversal* optimization to me: for example, when
indirect descriptors are supported, we will need something like that
to avoid abusing map/unmap ops. CVQ can use it too. But it would be
better to provide it with a good default plus a tunable, IMO.

> 2) only process one CVQ command from guest at once
>

I don't get why that's needed; is it to make sure CVQ never fills that
buffer? It should be easy to copy as many of the guest's CVQ buffers
as possible there and then stop when it's full.
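
Something along these lines (just a sketch; cvq_bounce and
CVQ_BOUNCE_SIZE stand for the hypothetical fixed buffer from the
sketch in your previous mail):

#include <string.h>
#include <sys/uio.h>

/* Copy as many guest CVQ commands as fit into the fixed buffer;
 * whatever does not fit is simply handled on a later pass. */
static size_t cvq_batch_copy(const struct iovec *cmds, size_t ncmds)
{
    size_t used = 0;
    size_t i;

    for (i = 0; i < ncmds; i++) {
        if (used + cmds[i].iov_len > CVQ_BOUNCE_SIZE) {
            break; /* buffer full: stop here, resume on the next pass */
        }
        memcpy(cvq_bounce + used, cmds[i].iov_base, cmds[i].iov_len);
        used += cmds[i].iov_len;
    }
    return i; /* number of commands copied this round */
}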


* Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
  2022-06-17  8:17                     ` Eugenio Perez Martin
@ 2022-06-20  5:07                       ` Jason Wang
  0 siblings, 0 replies; 51+ messages in thread
From: Jason Wang @ 2022-06-20  5:07 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Gautam Dawar, Michael S. Tsirkin, Markus Armbruster,
	Gonglei (Arei),
	Harpreet Singh Anand, Cornelia Huck, Zhu Lingshan,
	Laurent Vivier, Eli Cohen, Paolo Bonzini, Liuxiangdong,
	Eric Blake, Cindy Lu, Parav Pandit

On Fri, Jun 17, 2022 at 4:17 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Fri, Jun 17, 2022 at 3:29 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Jun 15, 2022 at 6:03 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Wed, Jun 15, 2022 at 5:04 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Tue, Jun 14, 2022 at 5:32 PM Eugenio Perez Martin
> > > > <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Tue, Jun 14, 2022 at 10:20 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Jun 14, 2022 at 4:14 PM Eugenio Perez Martin
> > > > > > <eperezma@redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
> > > > > > > > <eperezma@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On 2022/5/20 03:12, Eugenio Pérez wrote:
> > > > > > > > > > > > Control virtqueue is used by networking device for accepting various
> > > > > > > > > > > > commands from the driver. It's a must to support multiqueue and other
> > > > > > > > > > > > configurations.
> > > > > > > > > > > >
> > > > > > > > > > > > Shadow VirtQueue (SVQ) already makes possible migration of virtqueue
> > > > > > > > > > > > states, effectively intercepting them so qemu can track what regions of memory
> > > > > > > > > > > > are dirty because device action and needs migration. However, this does not
> > > > > > > > > > > > solve networking device state seen by the driver because CVQ messages, like
> > > > > > > > > > > > changes on MAC addresses from the driver.
> > > > > > > > > > > >
> > > > > > > > > > > > To solve that, this series uses SVQ infraestructure proposed to intercept
> > > > > > > > > > > > networking control messages used by the device. This way, qemu is able to
> > > > > > > > > > > > update VirtIONet device model and to migrate it.
> > > > > > > > > > > >
> > > > > > > > > > > > However, to intercept all queues would slow device data forwarding. To solve
> > > > > > > > > > > > that, only the CVQ must be intercepted all the time. This is achieved using
> > > > > > > > > > > > the ASID infraestructure, that allows different translations for different
> > > > > > > > > > > > virtqueues. The most updated kernel part of ASID is proposed at [1].
> > > > > > > > > > > >
> > > > > > > > > > > > You can run qemu in two modes after applying this series: only intercepting
> > > > > > > > > > > > cvq with x-cvq-svq=on or intercept all the virtqueues adding cmdline x-svq=on:
> > > > > > > > > > > >
> > > > > > > > > > > > -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-cvq-svq=on,x-svq=on
> > > > > > > > > > > >
> > > > > > > > > > > > First three patches enable the update of the virtio-net device model for each
> > > > > > > > > > > > CVQ message acknoledged by the device.
> > > > > > > > > > > >
> > > > > > > > > > > > Patches from 5 to 9 enables individual SVQ to copy the buffers to QEMU's VA.
> > > > > > > > > > > > This allows simplyfing the memory mapping, instead of map all the guest's
> > > > > > > > > > > > memory like in the data virtqueues.
> > > > > > > > > > > >
> > > > > > > > > > > > Patch 10 allows to inject control messages to the device. This allows to set
> > > > > > > > > > > > state to the device both at QEMU startup and at live migration destination. In
> > > > > > > > > > > > the future, this may also be used to emulate _F_ANNOUNCE.
> > > > > > > > > > > >
> > > > > > > > > > > > Patch 11 updates kernel headers, but it assign random numbers to needed ioctls
> > > > > > > > > > > > because they are still not accepted in the kernel.
> > > > > > > > > > > >
> > > > > > > > > > > > Patches 12-16 enables the set of the features of the net device model to the
> > > > > > > > > > > > vdpa device at device start.
> > > > > > > > > > > >
> > > > > > > > > > > > Last ones enables the sepparated ASID and SVQ.
> > > > > > > > > > > >
> > > > > > > > > > > > Comments are welcomed.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > As discussed, I think we need to split this huge series into smaller ones:
> > > > > > > > > > >
> > > > > > > > > > > 1) shadow CVQ only, this makes rx-filter-event work
> > > > > > > > > > > 2) ASID support for CVQ
> > > > > > > > > > >
> > > > > > > > > > > And for 1) we need consider whether or not it could be simplified.
> > > > > > > > > > >
> > > > > > > > > > > Or do it in reverse order, since if we do 1) first, we may have security
> > > > > > > > > > > issues.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I'm ok with both, but I also think 2) before 1) might make more sense.
> > > > > > > > > > There is no way to only shadow CVQ otherwise ATM.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > On second thought, that order is kind of harder.
> > > > > > > > >
> > > > > > > > > If we only map CVQ buffers, we need to either:
> > > > > > > > > a. Copy them to controlled buffers
> > > > > > > > > b. Track properly when to unmap them
> > > > > > > >
> > > > > > > > Just to make sure we're at the same page:
> > > > > > > >
> > > > > > > > I meant we can start with e.g having a dedicated ASID for CVQ but
> > > > > > > > still using CVQ passthrough.
> > > > > > > >
> > > > > > >
> > > > > > > That would imply duplicating all the memory listener updates to both
> > > > > > > ASIDs. That part of the code needs to be reverted. I'm ok with that,
> > > > > > > but I'm not sure if it's worth it to do it that way.
> > > > > >
> > > > > > I don't get why it is related to memory listeners. The only change is
> > > > > >
> > > > > > 1) read the groups
> > > > > > 2) set cvq to be an independent asid
> > > > > > 3) update CVQ's IOTLB with its own ASID
> > > > > >
> > > > >
> > > > > How to track the mappings of step 3) without a copy?
> > > >
> > > > So let me try to explain, what I propose is to split the patches. So
> > > > the above could be the first part. Since we know:
> > > >
> > > > 1) CVQ is passthrough to guest right now
> > > > 2) We know CVQ will use an independent ASID
> > > >
> > > > It doesn't harm to implement those first. It's unrelated to the policy
> > > > (e.g how to shadow CVQ).
> > > >
> > > > >
> > > > > If we don't copy the buffers to qemu's IOVA, we need to track when to
> > > > > unmap CVQ buffers memory. Many CVQ buffers could be in the same page,
> > > > > so we need to refcount them (or similar solution).
> > > >
> > > > Can we use fixed mapping instead of the dynamic ones?
> > > >
> > >
> > > That implies either to implement something like a memory ring (size?),
> > > or to effectively duplicate memory listener mappings.
> >
> > I'm not sure I get this.
> >
> > But it's mainly the CVQ buffer + CVQ virtqueue.
> >
> > It should be possible if:
> >
> > 1) allocate something like a buffer of several megabytes
>
> It's technically possible, but we would need to handle situations that
> will no longer arise in the final version, once we teach qemu how to
> deal with CVQ. For example, what do we do if a command does not fit?

Then double the size of the area? For CVQ, Qemu should know the maximum
size of a request; otherwise, that would be another blocker for live
migration.
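
For instance, a rough bound could be computed as in this hypothetical
sketch (not actual QEMU code; it assumes the MAC filter table is capped
at 64 entries, the same cap the emulated virtio-net model uses, which
makes VIRTIO_NET_CTRL_MAC_TABLE_SET the largest command):

/*
 * Hypothetical sketch: a conservative upper bound for one virtio-net
 * CVQ command, so the fixed area can be sized up front instead of
 * depending on guest behaviour. Struct layouts mirror the virtio-net
 * control structures; field names are abbreviated.
 */
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN          6   /* bytes per MAC address */
#define MAC_TABLE_ENTRIES 64  /* assumed cap, as in hw/net/virtio-net.c */

struct virtio_net_ctrl_hdr { uint8_t cls; uint8_t cmd; };
struct virtio_net_ctrl_mac { uint32_t entries; /* MACs follow */ };

static size_t cvq_cmd_max_len(void)
{
    /*
     * MAC_TABLE_SET carries the header, a unicast and a multicast
     * table of up to MAC_TABLE_ENTRIES addresses each, plus the
     * one-byte ack the device writes back.
     */
    return sizeof(struct virtio_net_ctrl_hdr)
           + 2 * (sizeof(struct virtio_net_ctrl_mac)
                  + MAC_TABLE_ENTRIES * ETH_ALEN)
           + sizeof(uint8_t); /* virtio_net_ctrl_ack */
}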

>
> The current workflow deals with that automatically, since we teach
> qemu about CVQ before splitting it into a separate ASID. The big
> buffer looks like a good *cross-cutting* optimization to me. For
> example, when indirect descriptors are supported, we will need
> something like that to avoid abusing map/unmap ops. CVQ can use it
> too. But it would be better to provide it with a good default + a
> tunable, IMO.

That's fine.

>
> > 2) only process one CVQ command from guest at once
> >
>
> I don't get why that's needed. Is it to make sure the guest never
> fills that buffer? It should be easy to copy as many of the guest's
> CVQ buffers as possible there and then stop when it's full.

It's not a must, just a proposal to start from something that is simpler ...

Thanks

>
> > ?
> >
> > >
> > > I'm not against that, but it's something we need to remove on the
> > > final solution. To use the order presented here will avoid that.
> > >
> > > > >
> > > > > This series copies the buffers to an independent buffer in qemu memory
> > > > > to avoid that. Once you copy them, we have the problem you point at
> > > > > some patch later: The guest control buffers, so qemu must understand
> > > > > CVQ so the guest cannot trick it. All of this is orthogonal to ASID.
> > > > > At that point, we have this series except for the asid part and the
> > > > > injection of CVQ buffers at the LM destination, isn't it?
> > > >
> > > > So we have several stuffs:
> > > >
> > > > 1) ASID support
> > > > 2) Shadow CVQ only
> > > > 3) State restoring
> > > >
> > > > I hope we can split them into independent series. If we want to shadow
> > > > CVQ first, we need to prove that it is safe without ASID.
> > > >
> > > > >
> > > > > CVQ buffers can be copied in the qemu IOVA space and be offered to the
> > > > > device. Much like SVQ vrings, the copied buffers will not be
> > > > > accessible from the guest. The hw device (as "non emulated cvq") will
> > > > > receive a lot of dma updates, but it's temporary. We can add ASID on
> > > > > top of that as a mean to:
> > > > > - Not to SVQ data plane (fundamental to the intended use case of vdpa).
> > > > > - Not to pollute data plane DMA mappings.
> > > > >
> > > > > > ?
> > > > > >
> > > > > > >
> > > > > > > > Then do other stuff on top.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Alternative a. have the same problems exposed in this RFC: It's hard
> > > > > > > > > (and unneeded in the final version) to know the size to copy.
> > > > > > > > > Alternative b. also requires things not needed in the final version,
> > > > > > > > > like to count the number of times each page is mapped and unmapped.
> > > > > > > > >
> > > > > > > > > So I'll go to the first alternative, that is also the proposed order
> > > > > > > > > of the RFC. What security issues do you expect beyond the comments in
> > > > > > > > > this series?
> > > > > > > >
> > > > > > > > If we shadow CVQ without ASID. The guest may guess the IOVA of CVQ and
> > > > > > > > try to peek/modify it?
> > > > > > > >
> > > > > > >
> > > > > > > It works the same way as data vqs, we're just updating the device
> > > > > > > model in the middle. It should imply the exact same risk as updating
> > > > > > > an emulated NIC control plane (including vhost-kernel / vhost-user).
> > > > > >
> > > > > > Not sure I got you here. For vhost-kernel and vhost-user, CVQ's buffer
> > > > > > is owned by guests.
> > > > > >
> > > > >
> > > > > The same way they control the data plane when all data virtqueues are
> > > > > shadowed for dirty page tracking (more on the risk of qemu updating
> > > > > the device model below).
> > > >
> > > > Ok.
> > > >
> > > > >
> > > > > > But if we shadow CVQ without ASID, the CVQ buffer is owned by QEMU and
> > > > > > there's no way to prevent guests from accessing it?
> > > > > >
> > > > >
> > > > > With SVQ the memory exposed to the device is already shadowed. They
> > > > > cannot access the CVQ buffers memory the same way they cannot access
> > > > > the SVQ vrings.
> > > >
> > > > Ok, I think I kind of get you, it looks like we have different
> > > > assumptions here: So if we only shadow CVQ, it will have security
> > > > issues, since RX/TX is not shadowed. If we shadow CVQ as well as
> > > > TX/RX, there's no security issue, since each IOVA is validated and the
> > > > descriptors are prepared by Qemu.
> > > >
> > >
> > > Right. I expected to maintain the all-shadowed-or-nothing behavior,
> > > sorry if I was not clear.
> > >
> > > > This goes back to another question, what's the order of the series.
> > > >
> > >
> > > I think that the shortest path is to follow the order of this series.
> > > I tried to reorder your way, but ASID patches have to come with a lot
> > > of CVQ patches if we want proper validation.
> >
> > Ok, so if this is the case, let's just split this series and keep the order.
> >
> > >
> > > We can take the long route if we either implement a fixed ring buffer,
> > > memory listener cloning, or another use case (sub-slicing?). But I
> > > expect more issues to arise there.
> > >
> > > I have another question actually, is it ok to implement the cvq use
> > > case but not to merge the x-svq parameter? The more I think on the
> > > parameter the more I see it's better to leave it as a separated patch
> > > for testing until we shape the complete series and it's unneeded.
> >
> > That's fine.
> >
> > Thanks
> >
> > >
> > > Thanks!
> > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > > > If in the case of vhost-kernel/vhost-user, there's a way for the guest
> > > > > > to exploit buffers owned by Qemu, it should be a bug.
> > > > > >
> > > > >
> > > > > The only extra step is the call to virtio_net_handle_ctrl_iov
> > > > > (extracted from virtio_net_handle_ctrl). If a guest can exploit that
> > > > > in SVQ mode, it can exploit it too with other vhost backends as far as
> > > > > I see.
> > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Roughly speaking, it's just to propose patches 01 to 03, with your
> > > > > > > comments. That already meets use cases like rx filter notifications
> > > > > > > for devices with only one ASID.
> > > > > > >
> > > > >
> > > > > This part of my mail is not correct, we need to add a few patches of
> > > > > this series on top :). If not, it would be exploitable.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > >
> > > > > > > > > > Can we do as with previous base SVQ patches? they were merged although
> > > > > > > > > > there is still no way to enable SVQ.
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > >
> > > > > > > > > > > Thoughts?
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > TODO:
> > > > > > > > > > > > * Fallback on regular CVQ if QEMU cannot isolate in its own ASID by any
> > > > > > > > > > > >    reason, blocking migration. This is tricky, since it can cause that the VM
> > > > > > > > > > > >    cannot be migrated anymore, so some way of block it must be used.
> > > > > > > > > > > > * Review failure paths, some are with TODO notes, other don't.
> > > > > > > > > > > >
> > > > > > > > > > > > Changes from rfc v7:
> > > > > > > > > > > > * Don't map all guest space in ASID 1 but copy all the buffers. No need for
> > > > > > > > > > > >    more memory listeners.
> > > > > > > > > > > > * Move net backend start callback to SVQ.
> > > > > > > > > > > > * Wait for device CVQ commands used by the device at SVQ start, avoiding races.
> > > > > > > > > > > > * Changed ioctls, but they're provisional anyway.
> > > > > > > > > > > > * Reorder commits so refactor and code adding ones are closer to usage.
> > > > > > > > > > > > * Usual cleaning: better tracing, doc, patches messages, ...
> > > > > > > > > > > >
> > > > > > > > > > > > Changes from rfc v6:
> > > > > > > > > > > > * Fix bad iotlb updates order when batching was enabled
> > > > > > > > > > > > * Add reference counting to iova_tree so cleaning is simpler.
> > > > > > > > > > > >
> > > > > > > > > > > > Changes from rfc v5:
> > > > > > > > > > > > * Fixes bad calculus of cvq end group when MQ is not acked by the guest.
> > > > > > > > > > > >
> > > > > > > > > > > > Changes from rfc v4:
> > > > > > > > > > > > * Add missing tracing
> > > > > > > > > > > > * Add multiqueue support
> > > > > > > > > > > > * Use already sent version for replacing g_memdup
> > > > > > > > > > > > * Care with memory management
> > > > > > > > > > > >
> > > > > > > > > > > > Changes from rfc v3:
> > > > > > > > > > > > * Fix bad returning of descriptors to SVQ list.
> > > > > > > > > > > >
> > > > > > > > > > > > Changes from rfc v2:
> > > > > > > > > > > > * Fix use-after-free.
> > > > > > > > > > > >
> > > > > > > > > > > > Changes from rfc v1:
> > > > > > > > > > > > * Rebase to latest master.
> > > > > > > > > > > > * Configure ASID instead of assuming cvq asid != data vqs asid.
> > > > > > > > > > > > * Update device model so (MAC) state can be migrated too.
> > > > > > > > > > > >
> > > > > > > > > > > > [1] https://lkml.kernel.org/kvm/20220224212314.1326-1-gdawar@xilinx.com/
> > > > > > > > > > > >
> > > > > > > > > > > > Eugenio Pérez (21):
> > > > > > > > > > > >    virtio-net: Expose ctrl virtqueue logic
> > > > > > > > > > > >    vhost: Add custom used buffer callback
> > > > > > > > > > > >    vdpa: control virtqueue support on shadow virtqueue
> > > > > > > > > > > >    virtio: Make virtqueue_alloc_element non-static
> > > > > > > > > > > >    vhost: Add vhost_iova_tree_find
> > > > > > > > > > > >    vdpa: Add map/unmap operation callback to SVQ
> > > > > > > > > > > >    vhost: move descriptor translation to vhost_svq_vring_write_descs
> > > > > > > > > > > >    vhost: Add SVQElement
> > > > > > > > > > > >    vhost: Add svq copy desc mode
> > > > > > > > > > > >    vhost: Add vhost_svq_inject
> > > > > > > > > > > >    vhost: Update kernel headers
> > > > > > > > > > > >    vdpa: delay set_vring_ready after DRIVER_OK
> > > > > > > > > > > >    vhost: Add ShadowVirtQueueStart operation
> > > > > > > > > > > >    vhost: Make possible to check for device exclusive vq group
> > > > > > > > > > > >    vhost: add vhost_svq_poll
> > > > > > > > > > > >    vdpa: Add vhost_vdpa_start_control_svq
> > > > > > > > > > > >    vdpa: Add asid attribute to vdpa device
> > > > > > > > > > > >    vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
> > > > > > > > > > > >    vhost: Add reference counting to vhost_iova_tree
> > > > > > > > > > > >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > > > > > > > > > >    vdpa: Add x-cvq-svq
> > > > > > > > > > > >
> > > > > > > > > > > >   qapi/net.json                                |  13 +-
> > > > > > > > > > > >   hw/virtio/vhost-iova-tree.h                  |   7 +-
> > > > > > > > > > > >   hw/virtio/vhost-shadow-virtqueue.h           |  61 ++-
> > > > > > > > > > > >   include/hw/virtio/vhost-vdpa.h               |   3 +
> > > > > > > > > > > >   include/hw/virtio/vhost.h                    |   3 +
> > > > > > > > > > > >   include/hw/virtio/virtio-net.h               |   4 +
> > > > > > > > > > > >   include/hw/virtio/virtio.h                   |   1 +
> > > > > > > > > > > >   include/standard-headers/linux/vhost_types.h |  11 +-
> > > > > > > > > > > >   linux-headers/linux/vhost.h                  |  25 +-
> > > > > > > > > > > >   hw/net/vhost_net.c                           |   5 +-
> > > > > > > > > > > >   hw/net/virtio-net.c                          |  84 +++--
> > > > > > > > > > > >   hw/virtio/vhost-iova-tree.c                  |  35 +-
> > > > > > > > > > > >   hw/virtio/vhost-shadow-virtqueue.c           | 378 ++++++++++++++++---
> > > > > > > > > > > >   hw/virtio/vhost-vdpa.c                       | 206 +++++++++-
> > > > > > > > > > > >   hw/virtio/virtio.c                           |   2 +-
> > > > > > > > > > > >   net/vhost-vdpa.c                             | 294 ++++++++++++++-
> > > > > > > > > > > >   hw/virtio/trace-events                       |  10 +-
> > > > > > > > > > > >   17 files changed, 1012 insertions(+), 130 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>




end of thread, other threads:[~2022-06-20  5:09 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-19 19:12 [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
2022-05-19 19:12 ` [RFC PATCH v8 01/21] virtio-net: Expose ctrl virtqueue logic Eugenio Pérez
2022-06-07  6:13   ` Jason Wang
2022-06-08 16:30     ` Eugenio Perez Martin
2022-05-19 19:12 ` [RFC PATCH v8 02/21] vhost: Add custom used buffer callback Eugenio Pérez
2022-06-07  6:12   ` Jason Wang
2022-06-08 19:38     ` Eugenio Perez Martin
2022-05-19 19:12 ` [RFC PATCH v8 03/21] vdpa: control virtqueue support on shadow virtqueue Eugenio Pérez
2022-06-07  6:05   ` Jason Wang
2022-06-08 16:38     ` Eugenio Perez Martin
2022-05-19 19:12 ` [RFC PATCH v8 04/21] virtio: Make virtqueue_alloc_element non-static Eugenio Pérez
2022-05-19 19:12 ` [RFC PATCH v8 05/21] vhost: Add vhost_iova_tree_find Eugenio Pérez
2022-05-19 19:12 ` [RFC PATCH v8 06/21] vdpa: Add map/unmap operation callback to SVQ Eugenio Pérez
2022-05-19 19:12 ` [RFC PATCH v8 07/21] vhost: move descriptor translation to vhost_svq_vring_write_descs Eugenio Pérez
2022-05-19 19:12 ` [RFC PATCH v8 08/21] vhost: Add SVQElement Eugenio Pérez
2022-05-19 19:12 ` [RFC PATCH v8 09/21] vhost: Add svq copy desc mode Eugenio Pérez
2022-06-08  4:14   ` Jason Wang
2022-06-08 19:02     ` Eugenio Perez Martin
2022-06-09  7:00       ` Jason Wang
2022-05-19 19:12 ` [RFC PATCH v8 10/21] vhost: Add vhost_svq_inject Eugenio Pérez
2022-05-19 19:12 ` [RFC PATCH v8 11/21] vhost: Update kernel headers Eugenio Pérez
2022-06-08  4:18   ` Jason Wang
2022-06-08 19:04     ` Eugenio Perez Martin
2022-05-19 19:12 ` [RFC PATCH v8 12/21] vdpa: delay set_vring_ready after DRIVER_OK Eugenio Pérez
2022-06-08  4:20   ` Jason Wang
2022-06-08 19:06     ` Eugenio Perez Martin
2022-05-19 19:12 ` [RFC PATCH v8 13/21] vhost: Add ShadowVirtQueueStart operation Eugenio Pérez
2022-05-19 19:12 ` [RFC PATCH v8 14/21] vhost: Make possible to check for device exclusive vq group Eugenio Pérez
2022-06-08  4:25   ` Jason Wang
2022-06-08 19:21     ` Eugenio Perez Martin
2022-06-09  7:13       ` Jason Wang
2022-06-09  7:51         ` Eugenio Perez Martin
2022-05-19 19:13 ` [RFC PATCH v8 15/21] vhost: add vhost_svq_poll Eugenio Pérez
2022-05-19 19:13 ` [RFC PATCH v8 16/21] vdpa: Add vhost_vdpa_start_control_svq Eugenio Pérez
2022-05-19 19:13 ` [RFC PATCH v8 17/21] vdpa: Add asid attribute to vdpa device Eugenio Pérez
2022-05-19 19:13 ` [RFC PATCH v8 18/21] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs Eugenio Pérez
2022-05-19 19:13 ` [RFC PATCH v8 19/21] vhost: Add reference counting to vhost_iova_tree Eugenio Pérez
2022-05-19 19:13 ` [RFC PATCH v8 20/21] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
2022-05-19 19:13 ` [RFC PATCH v8 21/21] vdpa: Add x-cvq-svq Eugenio Pérez
2022-06-08  5:51 ` [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ Jason Wang
2022-06-08 19:28   ` Eugenio Perez Martin
2022-06-13 16:31     ` Eugenio Perez Martin
2022-06-14  8:01       ` Jason Wang
2022-06-14  8:13         ` Eugenio Perez Martin
2022-06-14  8:20           ` Jason Wang
2022-06-14  9:31             ` Eugenio Perez Martin
2022-06-15  3:04               ` Jason Wang
2022-06-15 10:02                 ` Eugenio Perez Martin
2022-06-17  1:29                   ` Jason Wang
2022-06-17  8:17                     ` Eugenio Perez Martin
2022-06-20  5:07                       ` Jason Wang
