* [RFC PATCH v9 00/23] Net Control VQ support in SVQ
@ 2022-07-06 18:39 Eugenio Pérez
  2022-07-06 18:39 ` [RFC PATCH v9 01/23] vhost: Return earlier if used buffers overrun Eugenio Pérez
                   ` (22 more replies)
  0 siblings, 23 replies; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

The control virtqueue is used by the networking device to accept various
commands from the driver. It is a requirement for multiqueue and other
configurations.

Shadow VirtQueue (SVQ) already makes migration of virtqueue state possible,
effectively intercepting the virtqueues so qemu can track which regions of
memory are dirtied by device action and need migration. However, this does not
cover the networking device state set by the driver through CVQ messages, such
as MAC address changes.

This series uses the SVQ infrastructure to intercept the networking control
messages used by the device. This way, qemu is able to update the VirtIONet
device model and to migrate it.

Intercepting all the queues slows down device data forwarding, so this is not
the final solution. To solve that, only the CVQ should be intercepted all the
time. This will be achieved in future revisions using the ASID infrastructure,
which allows different translations for different virtqueues.

Another pending item is to move the data virtqueues from passthrough mode to
SVQ mode. To achieve that, a reliable way to obtain the vq state is needed. A
STOP ioctl will be added for that.

To intercept all the virtqueues and update the qemu nic device model as the
guest changes the device state, add the cmdline option x-svq=on:

-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-svq=on

Lastly, the device state is sent each time qemu starts the device, using SVQ
to inject commands through CVQ. This allows the guest to transparently see the
same guest-visible state at resume.

The first two patches reorder code so it's easier to apply later patches on
top of the code base.

The third patch reorders the DRIVER_OK status and the set_vring_enable ioctl
sending. This is done so CVQ commands reach the device before it has the
chance to use the rx queues with incorrect data.

The fourth patch replaces the way of getting the vq state. Since qemu will be
able to inject buffers, the device's used_idx is not valid anymore and we must
use the guest-visible one.

The fifth patch creates an API in SVQ to be called at device start. This
allows vhost-vdpa net to inject control commands before the rest of the
queues start.

The sixth patch enables SVQ to return buffers externally. While that is not
possible at this point in the series, CVQ will need to return the available
buffers.

Patches 8-12 enable SVQ to communicate context data of the used buffer to the
SVQ caller.

Patch 13 enables vhost-vdpa net to inject buffers to the device. This will be
used both to inject the state at the beginning and to decouple the guest's CVQ
buffers from the ones sent to the device. This brings protection against
TOCTOU, preventing the device and qemu from seeing different messages. In the
future, this may also be used to emulate _F_ANNOUNCE.

The previous patch and patches 14 to 17 make SVQ capable of being inspected.

Patches 18 to 20 enable the update of the virtio-net device model for each
CVQ message acknowledged by the device.

Patches 21-22 enable the update of the device configuration right at start.

Finally, the last commit enables the x-svq parameter.

Comments are welcome.

TODO:
* Review failure paths; some have TODO notes, others don't.

Changes from rfc v8:
* Remove ASID part. Delete x-svq-cvq mode too.
* Move all DMA memory management to net/vhost-vdpa, instead of svq.
* Use of qemu_real_host_page_size.
* Improved doc, general fixes.

Changes from rfc v7:
* Don't map all guest space in ASID 1 but copy all the buffers. No need for
  more memory listeners.
* Move net backend start callback to SVQ.
* Wait for device CVQ commands used by the device at SVQ start, avoiding races.
* Changed ioctls, but they're provisional anyway.
* Reorder commits so refactor and code adding ones are closer to usage.
* Usual cleaning: better tracing, doc, patches messages, ...

Changes from rfc v6:
* Fix bad iotlb updates order when batching was enabled
* Add reference counting to iova_tree so cleaning is simpler.

Changes from rfc v5:
* Fix bad calculation of the cvq end group when MQ is not acked by the guest.

Changes from rfc v4:
* Add missing tracing
* Add multiqueue support
* Use already sent version for replacing g_memdup
* Care with memory management

Changes from rfc v3:
* Fix bad returning of descriptors to SVQ list.

Changes from rfc v2:
* Fix use-after-free.

Changes from rfc v1:
* Rebase to latest master.
* Configure ASID instead of assuming cvq asid != data vqs asid.
* Update device model so (MAC) state can be migrated too.

Eugenio Pérez (23):
  vhost: Return earlier if used buffers overrun
  vhost: move descriptor translation to vhost_svq_vring_write_descs
  vdpa: delay set_vring_ready after DRIVER_OK
  vhost: Get vring base from vq, not svq
  vhost: Add ShadowVirtQueueStart operation
  virtio-net: Expose ctrl virtqueue logic
  vhost: add vhost_svq_push_elem
  vhost: Decouple vhost_svq_add_split from VirtQueueElement
  vhost: Add SVQElement
  vhost: Reorder vhost_svq_last_desc_of_chain
  vhost: Move last chain id to SVQ element
  vhost: Add opaque member to SVQElement
  vhost: Add vhost_svq_inject
  vhost: add vhost_svq_poll
  vhost: Add custom used buffer callback
  vhost: Add svq avail_handler callback
  vhost: add detach SVQ operation
  vdpa: Export vhost_vdpa_dma_map and unmap calls
  vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
  vdpa: Buffer CVQ support on shadow virtqueue
  vdpa: Add vhost_vdpa_start_control_svq
  vdpa: Inject virtio-net mac address via CVQ at start
  vdpa: Add x-svq to NetdevVhostVDPAOptions

 qapi/net.json                      |   9 +-
 hw/virtio/vhost-shadow-virtqueue.h |  67 +++-
 include/hw/virtio/vhost-vdpa.h     |   7 +
 include/hw/virtio/virtio-net.h     |   4 +
 include/hw/virtio/virtio.h         |   1 +
 hw/net/virtio-net.c                |  84 +++--
 hw/virtio/vhost-shadow-virtqueue.c | 284 ++++++++++++---
 hw/virtio/vhost-vdpa.c             |  54 ++-
 hw/virtio/virtio.c                 |   5 +
 net/vhost-vdpa.c                   | 541 ++++++++++++++++++++++++++++-
 10 files changed, 935 insertions(+), 121 deletions(-)

-- 
2.31.1





* [RFC PATCH v9 01/23] vhost: Return earlier if used buffers overrun
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
@ 2022-07-06 18:39 ` Eugenio Pérez
  2022-07-08  8:52   ` Jason Wang
  2022-07-06 18:39 ` [RFC PATCH v9 02/23] vhost: move descriptor translation to vhost_svq_vring_write_descs Eugenio Pérez
                   ` (21 subsequent siblings)
  22 siblings, 1 reply; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

The previous code misses the avail buffer just picked from the queue. The
used queue is kept blocked forever either way, but it is cleaner to check for
overrun before calling vhost_svq_get_buf.

Fixes: 100890f7cad50 ("vhost: Shadow virtqueue buffers forwarding")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 56c96ebd13..9280285435 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -405,19 +405,21 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
         vhost_svq_disable_notification(svq);
         while (true) {
             uint32_t len;
-            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq, &len);
-            if (!elem) {
-                break;
-            }
+            g_autofree VirtQueueElement *elem = NULL;
 
             if (unlikely(i >= svq->vring.num)) {
                 qemu_log_mask(LOG_GUEST_ERROR,
                          "More than %u used buffers obtained in a %u size SVQ",
                          i, svq->vring.num);
-                virtqueue_fill(vq, elem, len, i);
-                virtqueue_flush(vq, i);
+                virtqueue_flush(vq, svq->vring.num);
                 return;
             }
+
+            elem = vhost_svq_get_buf(svq, &len);
+            if (!elem) {
+                break;
+            }
+
             virtqueue_fill(vq, elem, len, i++);
         }
 
-- 
2.31.1




* [RFC PATCH v9 02/23] vhost: move descriptor translation to vhost_svq_vring_write_descs
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
  2022-07-06 18:39 ` [RFC PATCH v9 01/23] vhost: Return earlier if used buffers overrun Eugenio Pérez
@ 2022-07-06 18:39 ` Eugenio Pérez
  2022-07-06 18:39 ` [RFC PATCH v9 03/23] vdpa: delay set_vring_ready after DRIVER_OK Eugenio Pérez
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

The translation is done for both in and out descriptors, so it is better
placed here.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 39 +++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 9280285435..2939f4a243 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -122,17 +122,35 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
     return true;
 }
 
-static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
-                                    const struct iovec *iovec, size_t num,
-                                    bool more_descs, bool write)
+/**
+ * Write descriptors to SVQ vring
+ *
+ * @svq: The shadow virtqueue
+ * @sg: Cache for hwaddr
+ * @iovec: The iovec from the guest
+ * @num: iovec length
+ * @more_descs: True if more descriptors come in the chain
+ * @write: True if they are writeable descriptors
+ *
+ * Return true if success, false otherwise and print error.
+ */
+static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
+                                        const struct iovec *iovec, size_t num,
+                                        bool more_descs, bool write)
 {
     uint16_t i = svq->free_head, last = svq->free_head;
     unsigned n;
     uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
     vring_desc_t *descs = svq->vring.desc;
+    bool ok;
 
     if (num == 0) {
-        return;
+        return true;
+    }
+
+    ok = vhost_svq_translate_addr(svq, sg, iovec, num);
+    if (unlikely(!ok)) {
+        return false;
     }
 
     for (n = 0; n < num; n++) {
@@ -150,6 +168,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
     }
 
     svq->free_head = le16_to_cpu(svq->desc_next[last]);
+    return true;
 }
 
 static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
@@ -169,21 +188,19 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
         return false;
     }
 
-    ok = vhost_svq_translate_addr(svq, sgs, elem->out_sg, elem->out_num);
+    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
+                                     elem->in_num > 0, false);
     if (unlikely(!ok)) {
         return false;
     }
-    vhost_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
-                            elem->in_num > 0, false);
-
 
-    ok = vhost_svq_translate_addr(svq, sgs, elem->in_sg, elem->in_num);
+    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
+                                     true);
     if (unlikely(!ok)) {
+        /* TODO unwind out_sg */
         return false;
     }
 
-    vhost_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, true);
-
     /*
      * Put the entry in the available array (but don't update avail->idx until
      * they do sync).
-- 
2.31.1




* [RFC PATCH v9 03/23] vdpa: delay set_vring_ready after DRIVER_OK
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
  2022-07-06 18:39 ` [RFC PATCH v9 01/23] vhost: Return earlier if used buffers overrun Eugenio Pérez
  2022-07-06 18:39 ` [RFC PATCH v9 02/23] vhost: move descriptor translation to vhost_svq_vring_write_descs Eugenio Pérez
@ 2022-07-06 18:39 ` Eugenio Pérez
  2022-07-08  9:06   ` Jason Wang
  2022-07-13  5:51   ` Michael S. Tsirkin
  2022-07-06 18:39 ` [RFC PATCH v9 04/23] vhost: Get vring base from vq, not svq Eugenio Pérez
                   ` (19 subsequent siblings)
  22 siblings, 2 replies; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

To restore the device in the destination of a live migration we send the
commands through the control virtqueue. For a device to read CVQ it must have
received the DRIVER_OK status bit.

However, this opens a window where the device could start receiving packets
in rx queue 0 before it receives the RSS configuration. To avoid that, we do
not send vring_enable until all the configuration has been consumed by the
device.

As a first step, reverse the DRIVER_OK and SET_VRING_ENABLE steps.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 66f054a12c..2ee8009594 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -728,13 +728,18 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
     return idx;
 }
 
+/**
+ * Set ready all vring of the device
+ *
+ * @dev: Vhost device
+ */
 static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
 {
     int i;
     trace_vhost_vdpa_set_vring_ready(dev);
-    for (i = 0; i < dev->nvqs; ++i) {
+    for (i = 0; i < dev->vq_index_end; ++i) {
         struct vhost_vring_state state = {
-            .index = dev->vq_index + i,
+            .index = i,
             .num = 1,
         };
         vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
@@ -1097,7 +1102,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         if (unlikely(!ok)) {
             return -1;
         }
-        vhost_vdpa_set_vring_ready(dev);
     } else {
         ok = vhost_vdpa_svqs_stop(dev);
         if (unlikely(!ok)) {
@@ -1111,16 +1115,22 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
     }
 
     if (started) {
+        int r;
+
         memory_listener_register(&v->listener, &address_space_memory);
-        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+        r = vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+        if (unlikely(r)) {
+            return r;
+        }
+        vhost_vdpa_set_vring_ready(dev);
     } else {
         vhost_vdpa_reset_device(dev);
         vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
                                    VIRTIO_CONFIG_S_DRIVER);
         memory_listener_unregister(&v->listener);
-
-        return 0;
     }
+
+    return 0;
 }
 
 static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
-- 
2.31.1




* [RFC PATCH v9 04/23] vhost: Get vring base from vq, not svq
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (2 preceding siblings ...)
  2022-07-06 18:39 ` [RFC PATCH v9 03/23] vdpa: delay set_vring_ready after DRIVER_OK Eugenio Pérez
@ 2022-07-06 18:39 ` Eugenio Pérez
  2022-07-08  9:12   ` Jason Wang
  2022-07-06 18:39 ` [RFC PATCH v9 05/23] vhost: Add ShadowVirtQueueStart operation Eugenio Pérez
                   ` (18 subsequent siblings)
  22 siblings, 1 reply; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

The used idx used to match this, but it will no longer match from the moment
we introduce svq_inject. Rewind all the descriptors not used by the vdpa
device and get the vq state properly.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/virtio.h | 1 +
 hw/virtio/vhost-vdpa.c     | 7 +++----
 hw/virtio/virtio.c         | 5 +++++
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index db1c0ddf6b..4b51ab9d06 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -302,6 +302,7 @@ hwaddr virtio_queue_get_desc_size(VirtIODevice *vdev, int n);
 hwaddr virtio_queue_get_avail_size(VirtIODevice *vdev, int n);
 hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n);
 unsigned int virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n);
+unsigned int virtio_queue_get_in_use(const VirtQueue *vq);
 void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n,
                                      unsigned int idx);
 void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n);
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 2ee8009594..de76128030 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1189,12 +1189,10 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
                                        struct vhost_vring_state *ring)
 {
     struct vhost_vdpa *v = dev->opaque;
-    int vdpa_idx = ring->index - dev->vq_index;
     int ret;
 
     if (v->shadow_vqs_enabled) {
-        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
-
+        const VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
         /*
          * Setting base as last used idx, so destination will see as available
          * all the entries that the device did not use, including the in-flight
@@ -1203,7 +1201,8 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
          * TODO: This is ok for networking, but other kinds of devices might
          * have problems with these retransmissions.
          */
-        ring->num = svq->last_used_idx;
+        ring->num = virtio_queue_get_last_avail_idx(dev->vdev, ring->index) -
+                    virtio_queue_get_in_use(vq);
         return 0;
     }
 
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 5d607aeaa0..e02656f7a2 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3420,6 +3420,11 @@ unsigned int virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
     }
 }
 
+unsigned int virtio_queue_get_in_use(const VirtQueue *vq)
+{
+    return vq->inuse;
+}
+
 static void virtio_queue_packed_set_last_avail_idx(VirtIODevice *vdev,
                                                    int n, unsigned int idx)
 {
-- 
2.31.1




* [RFC PATCH v9 05/23] vhost: Add ShadowVirtQueueStart operation
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (3 preceding siblings ...)
  2022-07-06 18:39 ` [RFC PATCH v9 04/23] vhost: Get vring base from vq, not svq Eugenio Pérez
@ 2022-07-06 18:39 ` Eugenio Pérez
  2022-07-06 18:39 ` [RFC PATCH v9 06/23] virtio-net: Expose ctrl virtqueue logic Eugenio Pérez
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

It allows running commands at SVQ start.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h | 18 +++++++++++++++++-
 hw/virtio/vhost-shadow-virtqueue.c |  8 +++++++-
 hw/virtio/vhost-vdpa.c             | 17 ++++++++++++++++-
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index c132c994e9..91c31715d9 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -15,6 +15,14 @@
 #include "standard-headers/linux/vhost_types.h"
 #include "hw/virtio/vhost-iova-tree.h"
 
+typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
+typedef int (*ShadowVirtQueueStart)(VhostShadowVirtqueue *svq,
+                                    void *opaque);
+
+typedef struct VhostShadowVirtqueueOps {
+    ShadowVirtQueueStart start;
+} VhostShadowVirtqueueOps;
+
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
     /* Shadow vring */
@@ -59,6 +67,12 @@ typedef struct VhostShadowVirtqueue {
      */
     uint16_t *desc_next;
 
+    /* Caller callbacks */
+    const VhostShadowVirtqueueOps *ops;
+
+    /* Caller callbacks opaque */
+    void *ops_opaque;
+
     /* Next head to expose to the device */
     uint16_t shadow_avail_idx;
 
@@ -85,7 +99,9 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
                      VirtQueue *vq);
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree);
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
+                                    const VhostShadowVirtqueueOps *ops,
+                                    void *ops_opaque);
 
 void vhost_svq_free(gpointer vq);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 2939f4a243..be64e0b85c 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -626,12 +626,16 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
  * shadow methods and file descriptors.
  *
  * @iova_tree: Tree to perform descriptors translations
+ * @ops: SVQ owner callbacks
+ * @ops_opaque: ops opaque pointer
  *
  * Returns the new virtqueue or NULL.
  *
  * In case of error, reason is reported through error_report.
  */
-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
+                                    const VhostShadowVirtqueueOps *ops,
+                                    void *ops_opaque)
 {
     g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
     int r;
@@ -653,6 +657,8 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
     event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND);
     event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     svq->iova_tree = iova_tree;
+    svq->ops = ops;
+    svq->ops_opaque = ops_opaque;
     return g_steal_pointer(&svq);
 
 err_init_hdev_call:
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index de76128030..69cfaf05d6 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -418,7 +418,8 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
 
     shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
     for (unsigned n = 0; n < hdev->nvqs; ++n) {
-        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree);
+        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree, NULL,
+                                                            NULL);
 
         if (unlikely(!svq)) {
             error_setg(errp, "Cannot create svq %u", n);
@@ -1122,6 +1123,20 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         if (unlikely(r)) {
             return r;
         }
+
+        if (v->shadow_vqs_enabled) {
+            for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
+                VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
+                                                              i);
+                if (svq->ops) {
+                    r = svq->ops->start(svq, svq->ops_opaque);
+                    if (unlikely(r)) {
+                        return r;
+                    }
+                }
+            }
+        }
+
         vhost_vdpa_set_vring_ready(dev);
     } else {
         vhost_vdpa_reset_device(dev);
-- 
2.31.1




* [RFC PATCH v9 06/23] virtio-net: Expose ctrl virtqueue logic
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (4 preceding siblings ...)
  2022-07-06 18:39 ` [RFC PATCH v9 05/23] vhost: Add ShadowVirtQueueStart operation Eugenio Pérez
@ 2022-07-06 18:39 ` Eugenio Pérez
  2022-07-06 18:39 ` [RFC PATCH v9 07/23] vhost: add vhost_svq_push_elem Eugenio Pérez
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

This allows external vhost-net devices to modify the state of the VirtIO
device model once the vhost-vdpa device has acknowledged the control
commands.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/virtio-net.h |  4 ++
 hw/net/virtio-net.c            | 84 ++++++++++++++++++++--------------
 2 files changed, 53 insertions(+), 35 deletions(-)

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index eb87032627..42caea0d1d 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -218,6 +218,10 @@ struct VirtIONet {
     struct EBPFRSSContext ebpf_rss;
 };
 
+size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
+                                  const struct iovec *in_sg, unsigned in_num,
+                                  const struct iovec *out_sg,
+                                  unsigned out_num);
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
                                    const char *type);
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7ad948ee7c..53bb92c9f1 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1434,57 +1434,71 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
     return VIRTIO_NET_OK;
 }
 
-static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
+size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
+                                  const struct iovec *in_sg, unsigned in_num,
+                                  const struct iovec *out_sg,
+                                  unsigned out_num)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
     struct virtio_net_ctrl_hdr ctrl;
     virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
-    VirtQueueElement *elem;
     size_t s;
     struct iovec *iov, *iov2;
-    unsigned int iov_cnt;
+
+    if (iov_size(in_sg, in_num) < sizeof(status) ||
+        iov_size(out_sg, out_num) < sizeof(ctrl)) {
+        virtio_error(vdev, "virtio-net ctrl missing headers");
+        return 0;
+    }
+
+    iov2 = iov = g_memdup2(out_sg, sizeof(struct iovec) * out_num);
+    s = iov_to_buf(iov, out_num, 0, &ctrl, sizeof(ctrl));
+    iov_discard_front(&iov, &out_num, sizeof(ctrl));
+    if (s != sizeof(ctrl)) {
+        status = VIRTIO_NET_ERR;
+    } else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
+        status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
+        status = virtio_net_handle_mac(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
+        status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
+        status = virtio_net_handle_announce(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
+        status = virtio_net_handle_mq(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
+        status = virtio_net_handle_offloads(n, ctrl.cmd, iov, out_num);
+    }
+
+    s = iov_from_buf(in_sg, in_num, 0, &status, sizeof(status));
+    assert(s == sizeof(status));
+
+    g_free(iov2);
+    return sizeof(status);
+}
+
+static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtQueueElement *elem;
 
     for (;;) {
+        size_t written;
         elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
         if (!elem) {
             break;
         }
-        if (iov_size(elem->in_sg, elem->in_num) < sizeof(status) ||
-            iov_size(elem->out_sg, elem->out_num) < sizeof(ctrl)) {
-            virtio_error(vdev, "virtio-net ctrl missing headers");
+
+        written = virtio_net_handle_ctrl_iov(vdev, elem->in_sg, elem->in_num,
+                                             elem->out_sg, elem->out_num);
+        if (written > 0) {
+            virtqueue_push(vq, elem, written);
+            virtio_notify(vdev, vq);
+            g_free(elem);
+        } else {
             virtqueue_detach_element(vq, elem, 0);
             g_free(elem);
             break;
         }
-
-        iov_cnt = elem->out_num;
-        iov2 = iov = g_memdup2(elem->out_sg,
-                               sizeof(struct iovec) * elem->out_num);
-        s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl));
-        iov_discard_front(&iov, &iov_cnt, sizeof(ctrl));
-        if (s != sizeof(ctrl)) {
-            status = VIRTIO_NET_ERR;
-        } else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
-            status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
-            status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
-            status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
-            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
-            status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
-            status = virtio_net_handle_offloads(n, ctrl.cmd, iov, iov_cnt);
-        }
-
-        s = iov_from_buf(elem->in_sg, elem->in_num, 0, &status, sizeof(status));
-        assert(s == sizeof(status));
-
-        virtqueue_push(vq, elem, sizeof(status));
-        virtio_notify(vdev, vq);
-        g_free(iov2);
-        g_free(elem);
     }
 }
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [RFC PATCH v9 07/23] vhost: add vhost_svq_push_elem
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (5 preceding siblings ...)
  2022-07-06 18:39 ` [RFC PATCH v9 06/23] virtio-net: Expose ctrl virtqueue logic Eugenio Pérez
@ 2022-07-06 18:39 ` Eugenio Pérez
  2022-07-06 18:39 ` [RFC PATCH v9 08/23] vhost: Decouple vhost_svq_add_split from VirtQueueElement Eugenio Pérez
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

This function allows external SVQ users to return the guest's available
buffers.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  2 ++
 hw/virtio/vhost-shadow-virtqueue.c | 16 ++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 91c31715d9..0fbdd69153 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -88,6 +88,8 @@ typedef struct VhostShadowVirtqueue {
 
 bool vhost_svq_valid_features(uint64_t features, Error **errp);
 
+void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
+                         const VirtQueueElement *elem, uint32_t len);
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
 void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index be64e0b85c..2fc5789b73 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -410,6 +410,22 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
     return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
 }
 
+/**
+ * Push an element to SVQ, returning it to the guest.
+ */
+void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
+                         const VirtQueueElement *elem, uint32_t len)
+{
+    virtqueue_push(svq->vq, elem, len);
+    if (svq->next_guest_avail_elem) {
+        /*
+         * Avail ring was full when vhost_svq_flush was called, so it's a
+         * good moment to make more descriptors available if possible.
+         */
+        vhost_handle_guest_kick(svq);
+    }
+}
+
 static void vhost_svq_flush(VhostShadowVirtqueue *svq,
                             bool check_for_avail_queue)
 {
-- 
2.31.1




* [RFC PATCH v9 08/23] vhost: Decouple vhost_svq_add_split from VirtQueueElement
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (6 preceding siblings ...)
  2022-07-06 18:39 ` [RFC PATCH v9 07/23] vhost: add vhost_svq_push_elem Eugenio Pérez
@ 2022-07-06 18:39 ` Eugenio Pérez
  2022-07-11  8:00   ` Jason Wang
  2022-07-06 18:39 ` [RFC PATCH v9 09/23] vhost: Add SVQElement Eugenio Pérez
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

VirtQueueElement comes from the guest, but we are moving SVQ towards
being able to inject elements without the guest's knowledge.

To do so, make this function accept sg buffers directly, instead of
using VirtQueueElement.

Add vhost_svq_add_element to keep the convenience of adding whole
elements.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 38 +++++++++++++++++++++---------
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 2fc5789b73..46d3c1d74f 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -172,30 +172,32 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
 }
 
 static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
-                                VirtQueueElement *elem, unsigned *head)
+                                const struct iovec *out_sg, size_t out_num,
+                                const struct iovec *in_sg, size_t in_num,
+                                unsigned *head)
 {
     unsigned avail_idx;
     vring_avail_t *avail = svq->vring.avail;
     bool ok;
-    g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
+    g_autofree hwaddr *sgs = NULL;
 
     *head = svq->free_head;
 
     /* We need some descriptors here */
-    if (unlikely(!elem->out_num && !elem->in_num)) {
+    if (unlikely(!out_num && !in_num)) {
         qemu_log_mask(LOG_GUEST_ERROR,
                       "Guest provided element with no descriptors");
         return false;
     }
 
-    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
-                                     elem->in_num > 0, false);
+    sgs = g_new(hwaddr, MAX(out_num, in_num));
+    ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_num, in_num > 0,
+                                     false);
     if (unlikely(!ok)) {
         return false;
     }
 
-    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
-                                     true);
+    ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_num, false, true);
     if (unlikely(!ok)) {
         /* TODO unwind out_sg */
         return false;
@@ -223,10 +225,13 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
  * takes ownership of the element: In case of failure, it is free and the SVQ
  * is considered broken.
  */
-static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
+static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
+                          size_t out_num, const struct iovec *in_sg,
+                          size_t in_num, VirtQueueElement *elem)
 {
     unsigned qemu_head;
-    bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
+    bool ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num,
+                                  &qemu_head);
     if (unlikely(!ok)) {
         g_free(elem);
         return false;
@@ -250,6 +255,18 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
     event_notifier_set(&svq->hdev_kick);
 }
 
+static bool vhost_svq_add_element(VhostShadowVirtqueue *svq,
+                                  VirtQueueElement *elem)
+{
+    bool ok = vhost_svq_add(svq, elem->out_sg, elem->out_num, elem->in_sg,
+                            elem->in_num, elem);
+    if (ok) {
+        vhost_svq_kick(svq);
+    }
+
+    return ok;
+}
+
 /**
  * Forward available buffers.
  *
@@ -302,12 +319,11 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
                 return;
             }
 
-            ok = vhost_svq_add(svq, elem);
+            ok = vhost_svq_add_element(svq, g_steal_pointer(&elem));
             if (unlikely(!ok)) {
                 /* VQ is broken, just return and ignore any other kicks */
                 return;
             }
-            vhost_svq_kick(svq);
         }
 
         virtio_queue_set_notification(svq->vq, true);
-- 
2.31.1




* [RFC PATCH v9 09/23] vhost: Add SVQElement
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (7 preceding siblings ...)
  2022-07-06 18:39 ` [RFC PATCH v9 08/23] vhost: Decouple vhost_svq_add_split from VirtQueueElement Eugenio Pérez
@ 2022-07-06 18:39 ` Eugenio Pérez
  2022-07-11  9:00   ` Jason Wang
  2022-07-06 18:39 ` [RFC PATCH v9 10/23] vhost: Reorder vhost_svq_last_desc_of_chain Eugenio Pérez
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

This will allow SVQ to add metadata to the different queue elements. To
keep the changes simple, only the actual element is stored in this
patch.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  8 ++++--
 hw/virtio/vhost-shadow-virtqueue.c | 41 ++++++++++++++++++++----------
 2 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 0fbdd69153..e434dc63b0 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -15,6 +15,10 @@
 #include "standard-headers/linux/vhost_types.h"
 #include "hw/virtio/vhost-iova-tree.h"
 
+typedef struct SVQElement {
+    VirtQueueElement *elem;
+} SVQElement;
+
 typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 typedef int (*ShadowVirtQueueStart)(VhostShadowVirtqueue *svq,
                                     void *opaque);
@@ -55,8 +59,8 @@ typedef struct VhostShadowVirtqueue {
     /* IOVA mapping */
     VhostIOVATree *iova_tree;
 
-    /* Map for use the guest's descriptors */
-    VirtQueueElement **ring_id_maps;
+    /* Each element context */
+    SVQElement *ring_id_maps;
 
     /* Next VirtQueue element that guest made available */
     VirtQueueElement *next_guest_avail_elem;
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 46d3c1d74f..913bca8769 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -237,7 +237,7 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
         return false;
     }
 
-    svq->ring_id_maps[qemu_head] = elem;
+    svq->ring_id_maps[qemu_head].elem = elem;
     return true;
 }
 
@@ -385,15 +385,25 @@ static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
     return i;
 }
 
-static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
-                                           uint32_t *len)
+static bool vhost_svq_is_empty_elem(SVQElement elem)
+{
+    return elem.elem == NULL;
+}
+
+static SVQElement vhost_svq_empty_elem(void)
+{
+    return (SVQElement){};
+}
+
+static SVQElement vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
 {
     const vring_used_t *used = svq->vring.used;
     vring_used_elem_t used_elem;
+    SVQElement svq_elem = vhost_svq_empty_elem();
     uint16_t last_used, last_used_chain, num;
 
     if (!vhost_svq_more_used(svq)) {
-        return NULL;
+        return svq_elem;
     }
 
     /* Only get used array entries after they have been exposed by dev */
@@ -406,24 +416,25 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
     if (unlikely(used_elem.id >= svq->vring.num)) {
         qemu_log_mask(LOG_GUEST_ERROR, "Device %s says index %u is used",
                       svq->vdev->name, used_elem.id);
-        return NULL;
+        return svq_elem;
     }
 
-    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
+    svq_elem = svq->ring_id_maps[used_elem.id];
+    svq->ring_id_maps[used_elem.id] = vhost_svq_empty_elem();
+    if (unlikely(vhost_svq_is_empty_elem(svq_elem))) {
         qemu_log_mask(LOG_GUEST_ERROR,
             "Device %s says index %u is used, but it was not available",
             svq->vdev->name, used_elem.id);
-        return NULL;
+        return svq_elem;
     }
 
-    num = svq->ring_id_maps[used_elem.id]->in_num +
-          svq->ring_id_maps[used_elem.id]->out_num;
+    num = svq_elem.elem->in_num + svq_elem.elem->out_num;
     last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
     svq->desc_next[last_used_chain] = svq->free_head;
     svq->free_head = used_elem.id;
 
     *len = used_elem.len;
-    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
+    return svq_elem;
 }
 
 /**
@@ -454,6 +465,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
         vhost_svq_disable_notification(svq);
         while (true) {
             uint32_t len;
+            SVQElement svq_elem;
             g_autofree VirtQueueElement *elem = NULL;
 
             if (unlikely(i >= svq->vring.num)) {
@@ -464,11 +476,12 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
                 return;
             }
 
-            elem = vhost_svq_get_buf(svq, &len);
-            if (!elem) {
+            svq_elem = vhost_svq_get_buf(svq, &len);
+            if (vhost_svq_is_empty_elem(svq_elem)) {
                 break;
             }
 
+            elem = g_steal_pointer(&svq_elem.elem);
             virtqueue_fill(vq, elem, len, i++);
         }
 
@@ -611,7 +624,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
     memset(svq->vring.desc, 0, driver_size);
     svq->vring.used = qemu_memalign(qemu_real_host_page_size(), device_size);
     memset(svq->vring.used, 0, device_size);
-    svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num);
+    svq->ring_id_maps = g_new0(SVQElement, svq->vring.num);
     svq->desc_next = g_new0(uint16_t, svq->vring.num);
     for (unsigned i = 0; i < svq->vring.num - 1; i++) {
         svq->desc_next[i] = cpu_to_le16(i + 1);
@@ -636,7 +649,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
 
     for (unsigned i = 0; i < svq->vring.num; ++i) {
         g_autofree VirtQueueElement *elem = NULL;
-        elem = g_steal_pointer(&svq->ring_id_maps[i]);
+        elem = g_steal_pointer(&svq->ring_id_maps[i].elem);
         if (elem) {
             virtqueue_detach_element(svq->vq, elem, 0);
         }
-- 
2.31.1




* [RFC PATCH v9 10/23] vhost: Reorder vhost_svq_last_desc_of_chain
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (8 preceding siblings ...)
  2022-07-06 18:39 ` [RFC PATCH v9 09/23] vhost: Add SVQElement Eugenio Pérez
@ 2022-07-06 18:39 ` Eugenio Pérez
  2022-07-06 18:39 ` [RFC PATCH v9 11/23] vhost: Move last chain id to SVQ element Eugenio Pérez
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

SVQ is going to store it in SVQElement, so it needs to be defined before
the add functions.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 913bca8769..cf1745fd4d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -218,6 +218,16 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
     return true;
 }
 
+static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
+                                             uint16_t num, uint16_t i)
+{
+    for (uint16_t j = 0; j < (num - 1); ++j) {
+        i = le16_to_cpu(svq->desc_next[i]);
+    }
+
+    return i;
+}
+
 /**
  * Add an element to a SVQ.
  *
@@ -375,16 +385,6 @@ static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
     svq->vring.avail->flags |= cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
 }
 
-static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
-                                             uint16_t num, uint16_t i)
-{
-    for (uint16_t j = 0; j < (num - 1); ++j) {
-        i = le16_to_cpu(svq->desc_next[i]);
-    }
-
-    return i;
-}
-
 static bool vhost_svq_is_empty_elem(SVQElement elem)
 {
     return elem.elem == NULL;
-- 
2.31.1




* [RFC PATCH v9 11/23] vhost: Move last chain id to SVQ element
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (9 preceding siblings ...)
  2022-07-06 18:39 ` [RFC PATCH v9 10/23] vhost: Reorder vhost_svq_last_desc_of_chain Eugenio Pérez
@ 2022-07-06 18:39 ` Eugenio Pérez
  2022-07-11  9:02   ` Jason Wang
  2022-07-06 18:39 ` [RFC PATCH v9 12/23] vhost: Add opaque member to SVQElement Eugenio Pérez
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

We will allow the SVQ user to store opaque data for each element, so it
is easier if we store this kind of information at avail time.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  3 +++
 hw/virtio/vhost-shadow-virtqueue.c | 14 ++++++++------
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index e434dc63b0..0e434e9fd0 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -17,6 +17,9 @@
 
 typedef struct SVQElement {
     VirtQueueElement *elem;
+
+    /* Last descriptor of the chain */
+    uint32_t last_chain_id;
 } SVQElement;
 
 typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index cf1745fd4d..c5e49e51c5 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -239,7 +239,9 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
                           size_t out_num, const struct iovec *in_sg,
                           size_t in_num, VirtQueueElement *elem)
 {
+    SVQElement *svq_elem;
     unsigned qemu_head;
+    size_t n;
     bool ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num,
                                   &qemu_head);
     if (unlikely(!ok)) {
@@ -247,7 +249,10 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
         return false;
     }
 
-    svq->ring_id_maps[qemu_head].elem = elem;
+    n = out_num + in_num;
+    svq_elem = &svq->ring_id_maps[qemu_head];
+    svq_elem->elem = elem;
+    svq_elem->last_chain_id = vhost_svq_last_desc_of_chain(svq, n, qemu_head);
     return true;
 }
 
@@ -400,7 +405,7 @@ static SVQElement vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
     const vring_used_t *used = svq->vring.used;
     vring_used_elem_t used_elem;
     SVQElement svq_elem = vhost_svq_empty_elem();
-    uint16_t last_used, last_used_chain, num;
+    uint16_t last_used;
 
     if (!vhost_svq_more_used(svq)) {
         return svq_elem;
@@ -428,11 +433,8 @@ static SVQElement vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
         return svq_elem;
     }
 
-    num = svq_elem.elem->in_num + svq_elem.elem->out_num;
-    last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
-    svq->desc_next[last_used_chain] = svq->free_head;
+    svq->desc_next[svq_elem.last_chain_id] = svq->free_head;
     svq->free_head = used_elem.id;
-
     *len = used_elem.len;
     return svq_elem;
 }
-- 
2.31.1




* [RFC PATCH v9 12/23] vhost: Add opaque member to SVQElement
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (10 preceding siblings ...)
  2022-07-06 18:39 ` [RFC PATCH v9 11/23] vhost: Move last chain id to SVQ element Eugenio Pérez
@ 2022-07-06 18:39 ` Eugenio Pérez
  2022-07-11  9:05   ` Jason Wang
  2022-07-06 18:39 ` [RFC PATCH v9 13/23] vhost: Add vhost_svq_inject Eugenio Pérez
                   ` (10 subsequent siblings)
  22 siblings, 1 reply; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

When qemu injects buffers into the vdpa device, this member will be used
to maintain contextual data. If SVQ has no custom operation, it will be
used to maintain the VirtQueueElement pointer.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  3 ++-
 hw/virtio/vhost-shadow-virtqueue.c | 13 +++++++------
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 0e434e9fd0..a811f90e01 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -16,7 +16,8 @@
 #include "hw/virtio/vhost-iova-tree.h"
 
 typedef struct SVQElement {
-    VirtQueueElement *elem;
+    /* Opaque data */
+    void *opaque;
 
     /* Last descriptor of the chain */
     uint32_t last_chain_id;
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index c5e49e51c5..492bb12b5f 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -237,7 +237,7 @@ static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
  */
 static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
                           size_t out_num, const struct iovec *in_sg,
-                          size_t in_num, VirtQueueElement *elem)
+                          size_t in_num, void *opaque)
 {
     SVQElement *svq_elem;
     unsigned qemu_head;
@@ -245,13 +245,12 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
     bool ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num,
                                   &qemu_head);
     if (unlikely(!ok)) {
-        g_free(elem);
         return false;
     }
 
     n = out_num + in_num;
     svq_elem = &svq->ring_id_maps[qemu_head];
-    svq_elem->elem = elem;
+    svq_elem->opaque = opaque;
     svq_elem->last_chain_id = vhost_svq_last_desc_of_chain(svq, n, qemu_head);
     return true;
 }
@@ -277,6 +276,8 @@ static bool vhost_svq_add_element(VhostShadowVirtqueue *svq,
                             elem->in_num, elem);
     if (ok) {
         vhost_svq_kick(svq);
+    } else {
+        g_free(elem);
     }
 
     return ok;
@@ -392,7 +393,7 @@ static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
 
 static bool vhost_svq_is_empty_elem(SVQElement elem)
 {
-    return elem.elem == NULL;
+    return elem.opaque == NULL;
 }
 
 static SVQElement vhost_svq_empty_elem(void)
@@ -483,7 +484,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
                 break;
             }
 
-            elem = g_steal_pointer(&svq_elem.elem);
+            elem = g_steal_pointer(&svq_elem.opaque);
             virtqueue_fill(vq, elem, len, i++);
         }
 
@@ -651,7 +652,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
 
     for (unsigned i = 0; i < svq->vring.num; ++i) {
         g_autofree VirtQueueElement *elem = NULL;
-        elem = g_steal_pointer(&svq->ring_id_maps[i].elem);
+        elem = g_steal_pointer(&svq->ring_id_maps[i].opaque);
         if (elem) {
             virtqueue_detach_element(svq->vq, elem, 0);
         }
-- 
2.31.1




* [RFC PATCH v9 13/23] vhost: Add vhost_svq_inject
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (11 preceding siblings ...)
  2022-07-06 18:39 ` [RFC PATCH v9 12/23] vhost: Add opaque member to SVQElement Eugenio Pérez
@ 2022-07-06 18:39 ` Eugenio Pérez
  2022-07-11  9:14   ` Jason Wang
  2022-07-06 18:39 ` [RFC PATCH v9 14/23] vhost: add vhost_svq_poll Eugenio Pérez
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

This allows qemu to inject buffers into the device.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  2 ++
 hw/virtio/vhost-shadow-virtqueue.c | 34 ++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index a811f90e01..d01d2370db 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -98,6 +98,8 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp);
 
 void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
                          const VirtQueueElement *elem, uint32_t len);
+int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
+                     size_t out_num, size_t in_num, void *opaque);
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
 void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 492bb12b5f..bd9e34b413 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -283,6 +283,40 @@ static bool vhost_svq_add_element(VhostShadowVirtqueue *svq,
     return ok;
 }
 
+/**
+ * Inject a chain of buffers to the device
+ *
+ * @svq: Shadow VirtQueue
+ * @iov: I/O vector
+ * @out_num: Number of front out descriptors
+ * @in_num: Number of last input descriptors
+ * @opaque: Contextual data to store in descriptor
+ *
+ * Return 0 on success, -ENOMEM if cannot inject
+ */
+int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
+                     size_t out_num, size_t in_num, void *opaque)
+{
+    bool ok;
+
+    /*
+     * All vhost_svq_inject calls are controlled by qemu, so we won't hit
+     * these assertions.
+     */
+    assert(out_num || in_num);
+    assert(svq->ops);
+
+    if (unlikely(svq->next_guest_avail_elem)) {
+        error_report("Injecting in a full queue");
+        return -ENOMEM;
+    }
+
+    ok = vhost_svq_add(svq, iov, out_num, iov + out_num, in_num, opaque);
+    assert(ok);
+    vhost_svq_kick(svq);
+    return 0;
+}
+
 /**
  * Forward available buffers.
  *
-- 
2.31.1




* [RFC PATCH v9 14/23] vhost: add vhost_svq_poll
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (12 preceding siblings ...)
  2022-07-06 18:39 ` [RFC PATCH v9 13/23] vhost: Add vhost_svq_inject Eugenio Pérez
@ 2022-07-06 18:39 ` Eugenio Pérez
  2022-07-11  9:19   ` Jason Wang
  2022-07-06 18:40 ` [RFC PATCH v9 15/23] vhost: Add custom used buffer callback Eugenio Pérez
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

It allows the Shadow Control VirtQueue to wait for the device to use the
commands that restore the net device state after a live migration.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  1 +
 hw/virtio/vhost-shadow-virtqueue.c | 54 ++++++++++++++++++++++++++++--
 2 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index d01d2370db..c8668fbdd6 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -100,6 +100,7 @@ void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
                          const VirtQueueElement *elem, uint32_t len);
 int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
                      size_t out_num, size_t in_num, void *opaque);
+ssize_t vhost_svq_poll(VhostShadowVirtqueue *svq);
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
 void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index bd9e34b413..ed7f1d0bc9 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -10,6 +10,8 @@
 #include "qemu/osdep.h"
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 
+#include <glib/gpoll.h>
+
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "qemu/main-loop.h"
@@ -490,10 +492,11 @@ void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
     }
 }
 
-static void vhost_svq_flush(VhostShadowVirtqueue *svq,
-                            bool check_for_avail_queue)
+static size_t vhost_svq_flush(VhostShadowVirtqueue *svq,
+                              bool check_for_avail_queue)
 {
     VirtQueue *vq = svq->vq;
+    size_t ret = 0;
 
     /* Forward as many used buffers as possible. */
     do {
@@ -510,7 +513,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
                          "More than %u used buffers obtained in a %u size SVQ",
                          i, svq->vring.num);
                 virtqueue_flush(vq, svq->vring.num);
-                return;
+                return ret;
             }
 
             svq_elem = vhost_svq_get_buf(svq, &len);
@@ -520,6 +523,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 
             elem = g_steal_pointer(&svq_elem.opaque);
             virtqueue_fill(vq, elem, len, i++);
+            ret++;
         }
 
         virtqueue_flush(vq, i);
@@ -533,6 +537,50 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
             vhost_handle_guest_kick(svq);
         }
     } while (!vhost_svq_enable_notification(svq));
+
+    return ret;
+}
+
+/**
+ * Poll the SVQ for device used buffers.
+ *
+ * This function races with the main event loop SVQ polling, so extra
+ * synchronization is needed.
+ *
+ * Return the number of descriptors read from the device.
+ */
+ssize_t vhost_svq_poll(VhostShadowVirtqueue *svq)
+{
+    int fd = event_notifier_get_fd(&svq->hdev_call);
+    GPollFD poll_fd = {
+        .fd = fd,
+        .events = G_IO_IN,
+    };
+    assert(fd >= 0);
+    int r = g_poll(&poll_fd, 1, -1);
+
+    if (unlikely(r < 0)) {
+        error_report("Cannot poll device call fd "G_POLLFD_FORMAT": (%d) %s",
+                     poll_fd.fd, errno, g_strerror(errno));
+        return -errno;
+    }
+
+    if (r == 0) {
+        return 0;
+    }
+
+    if (unlikely(poll_fd.revents & ~(G_IO_IN))) {
+        error_report(
+            "Error polling device call fd "G_POLLFD_FORMAT": revents=%d",
+            poll_fd.fd, poll_fd.revents);
+        return -1;
+    }
+
+    /*
+     * Max return value of vhost_svq_flush is (uint16_t)-1, so it's safe to
+     * convert to ssize_t.
+     */
+    return vhost_svq_flush(svq, false);
 }
 
 /**
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [RFC PATCH v9 15/23] vhost: Add custom used buffer callback
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (13 preceding siblings ...)
  2022-07-06 18:39 ` [RFC PATCH v9 14/23] vhost: add vhost_svq_poll Eugenio Pérez
@ 2022-07-06 18:40 ` Eugenio Pérez
  2022-07-06 18:40 ` [RFC PATCH v9 16/23] vhost: Add svq avail_handler callback Eugenio Pérez
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

The callback allows SVQ users to know the VirtQueue requests and
responses. QEMU can use this to synchronize the virtio device model
state, allowing it to be migrated with minimal changes to the migration
code.

If callbacks are specified at SVQ creation, the buffers need to be
injected into the device using vhost_svq_inject. Opaque data must be
given with them, and it is returned to the callback at the used_handler
call.

In the case of networking, this will be used to inspect control
virtqueue messages and to recover the status injection the first time.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  5 +++++
 hw/virtio/vhost-shadow-virtqueue.c | 16 +++++++++++-----
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index c8668fbdd6..296fef6f21 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -27,8 +27,13 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 typedef int (*ShadowVirtQueueStart)(VhostShadowVirtqueue *svq,
                                     void *opaque);
 
+typedef void (*VirtQueueUsedCallback)(VhostShadowVirtqueue *svq,
+                                      void *used_elem_opaque,
+                                      uint32_t written);
+
 typedef struct VhostShadowVirtqueueOps {
     ShadowVirtQueueStart start;
+    VirtQueueUsedCallback used_handler;
 } VhostShadowVirtqueueOps;
 
 /* Shadow virtqueue to relay notifications */
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index ed7f1d0bc9..b92ca4a63f 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -506,7 +506,6 @@ static size_t vhost_svq_flush(VhostShadowVirtqueue *svq,
         while (true) {
             uint32_t len;
             SVQElement svq_elem;
-            g_autofree VirtQueueElement *elem = NULL;
 
             if (unlikely(i >= svq->vring.num)) {
                 qemu_log_mask(LOG_GUEST_ERROR,
@@ -521,13 +520,20 @@ static size_t vhost_svq_flush(VhostShadowVirtqueue *svq,
                 break;
             }
 
-            elem = g_steal_pointer(&svq_elem.opaque);
-            virtqueue_fill(vq, elem, len, i++);
+            if (svq->ops) {
+                svq->ops->used_handler(svq, svq_elem.opaque, len);
+            } else {
+                g_autofree VirtQueueElement *elem = NULL;
+                elem = g_steal_pointer(&svq_elem.opaque);
+                virtqueue_fill(vq, elem, len, i++);
+            }
             ret++;
         }
 
-        virtqueue_flush(vq, i);
-        event_notifier_set(&svq->svq_call);
+        if (i > 0) {
+            virtqueue_flush(vq, i);
+            event_notifier_set(&svq->svq_call);
+        }
 
         if (check_for_avail_queue && svq->next_guest_avail_elem) {
             /*
-- 
2.31.1




* [RFC PATCH v9 16/23] vhost: Add svq avail_handler callback
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (14 preceding siblings ...)
  2022-07-06 18:40 ` [RFC PATCH v9 15/23] vhost: Add custom used buffer callback Eugenio Pérez
@ 2022-07-06 18:40 ` Eugenio Pérez
  2022-07-06 18:40 ` [RFC PATCH v9 17/23] vhost: add detach SVQ operation Eugenio Pérez
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

This allows external handlers to be aware of new buffers that the guest
places in the virtqueue.

When this callback is defined, the ownership of the guest's virtqueue
element is transferred to the callback. This means that if the user
wants to forward the descriptor, it needs to inject it manually. The
callback is also free to process the command by itself and use the
element with svq_push.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h | 16 ++++++++++++++++
 hw/virtio/vhost-shadow-virtqueue.c |  8 +++++++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 296fef6f21..4300cb66f8 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -27,12 +27,28 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
 typedef int (*ShadowVirtQueueStart)(VhostShadowVirtqueue *svq,
                                     void *opaque);
 
+/**
+ * Callback to handle an avail buffer.
+ *
+ * @svq:  Shadow virtqueue
+ * @elem:  Element placed in the queue by the guest
+ * @vq_callback_opaque:  Opaque
+ *
+ * Returns true if the vq is running as expected, false otherwise.
+ *
+ * Note that ownership of elem is transferred to the callback.
+ */
+typedef bool (*VirtQueueAvailCallback)(VhostShadowVirtqueue *svq,
+                                       VirtQueueElement *elem,
+                                       void *vq_callback_opaque);
+
 typedef void (*VirtQueueUsedCallback)(VhostShadowVirtqueue *svq,
                                       void *used_elem_opaque,
                                       uint32_t written);
 
 typedef struct VhostShadowVirtqueueOps {
     ShadowVirtQueueStart start;
+    VirtQueueAvailCallback avail_handler;
     VirtQueueUsedCallback used_handler;
 } VhostShadowVirtqueueOps;
 
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index b92ca4a63f..dffea256f1 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -371,7 +371,13 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
                 return;
             }
 
-            ok = vhost_svq_add_element(svq, g_steal_pointer(&elem));
+            if (svq->ops) {
+                ok = svq->ops->avail_handler(svq, g_steal_pointer(&elem),
+                                             svq->ops_opaque);
+            } else {
+                ok = vhost_svq_add_element(svq, g_steal_pointer(&elem));
+            }
+
             if (unlikely(!ok)) {
                 /* VQ is broken, just return and ignore any other kicks */
                 return;
-- 
2.31.1




* [RFC PATCH v9 17/23] vhost: add detach SVQ operation
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (15 preceding siblings ...)
  2022-07-06 18:40 ` [RFC PATCH v9 16/23] vhost: Add svq avail_handler callback Eugenio Pérez
@ 2022-07-06 18:40 ` Eugenio Pérez
  2022-07-06 18:40 ` [RFC PATCH v9 18/23] vdpa: Export vhost_vdpa_dma_map and unmap calls Eugenio Pérez
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

Add a detach operation so SVQ can notify the caller that it needs to
discard the element.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h | 11 +++++++++++
 hw/virtio/vhost-shadow-virtqueue.c | 11 ++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 4300cb66f8..583b6fda5d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -46,10 +46,21 @@ typedef void (*VirtQueueUsedCallback)(VhostShadowVirtqueue *svq,
                                       void *used_elem_opaque,
                                       uint32_t written);
 
+/**
+ * Detach the element from the shadow virtqueue.  SVQ needs to free it and it
+ * cannot be pushed or discarded.
+ *
+ * @elem_opaque: The element opaque
+ *
+ * Return the guest element to detach and free if any.
+ */
+typedef VirtQueueElement *(*VirtQueueDetachCallback)(void *elem_opaque);
+
 typedef struct VhostShadowVirtqueueOps {
     ShadowVirtQueueStart start;
     VirtQueueAvailCallback avail_handler;
     VirtQueueUsedCallback used_handler;
+    VirtQueueDetachCallback detach_handler;
 } VhostShadowVirtqueueOps;
 
 /* Shadow virtqueue to relay notifications */
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index dffea256f1..4f072f040b 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -746,7 +746,16 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
 
     for (unsigned i = 0; i < svq->vring.num; ++i) {
         g_autofree VirtQueueElement *elem = NULL;
-        elem = g_steal_pointer(&svq->ring_id_maps[i].opaque);
+        void *opaque = g_steal_pointer(&svq->ring_id_maps[i].opaque);
+
+        if (!opaque) {
+            continue;
+        } else if (svq->ops) {
+            elem = svq->ops->detach_handler(opaque);
+        } else {
+            elem = opaque;
+        }
+
         if (elem) {
             virtqueue_detach_element(svq->vq, elem, 0);
         }
-- 
2.31.1




* [RFC PATCH v9 18/23] vdpa: Export vhost_vdpa_dma_map and unmap calls
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (16 preceding siblings ...)
  2022-07-06 18:40 ` [RFC PATCH v9 17/23] vhost: add detach SVQ operation Eugenio Pérez
@ 2022-07-06 18:40 ` Eugenio Pérez
  2022-07-11  9:22   ` Jason Wang
  2022-07-06 18:40 ` [RFC PATCH v9 19/23] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs Eugenio Pérez
                   ` (4 subsequent siblings)
  22 siblings, 1 reply; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

Shadow CVQ will copy buffers to qemu's VA, so we avoid TOCTOU attacks
that could set different states in the qemu device model and the vdpa
device.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/vhost-vdpa.h | 4 ++++
 hw/virtio/vhost-vdpa.c         | 7 +++----
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index a29dbb3f53..7214eb47dc 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -39,4 +39,8 @@ typedef struct vhost_vdpa {
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
 
+int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
+                       void *vaddr, bool readonly);
+int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size);
+
 #endif
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 69cfaf05d6..613c3483b0 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -71,8 +71,8 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
     return false;
 }
 
-static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
-                              void *vaddr, bool readonly)
+int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
+                       void *vaddr, bool readonly)
 {
     struct vhost_msg_v2 msg = {};
     int fd = v->device_fd;
@@ -97,8 +97,7 @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
     return ret;
 }
 
-static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
-                                hwaddr size)
+int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size)
 {
     struct vhost_msg_v2 msg = {};
     int fd = v->device_fd;
-- 
2.31.1




* [RFC PATCH v9 19/23] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (17 preceding siblings ...)
  2022-07-06 18:40 ` [RFC PATCH v9 18/23] vdpa: Export vhost_vdpa_dma_map and unmap calls Eugenio Pérez
@ 2022-07-06 18:40 ` Eugenio Pérez
  2022-07-12  4:11   ` Jason Wang
  2022-07-06 18:40 ` [RFC PATCH v9 20/23] vdpa: Buffer CVQ support on shadow virtqueue Eugenio Pérez
                   ` (3 subsequent siblings)
  22 siblings, 1 reply; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

Knowing the device features is needed for the CVQ SVQ, so SVQ knows
whether it can handle all commands or not. Extract this from
vhost_vdpa_get_max_queue_pairs so we can reuse it.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 net/vhost-vdpa.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index df1e69ee72..b0158f625e 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -219,20 +219,24 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     return nc;
 }
 
-static int vhost_vdpa_get_max_queue_pairs(int fd, int *has_cvq, Error **errp)
+static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
+{
+    int ret = ioctl(fd, VHOST_GET_FEATURES, features);
+    if (ret) {
+        error_setg_errno(errp, errno,
+                         "Fail to query features from vhost-vDPA device");
+    }
+    return ret;
+}
+
+static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
+                                          int *has_cvq, Error **errp)
 {
     unsigned long config_size = offsetof(struct vhost_vdpa_config, buf);
     g_autofree struct vhost_vdpa_config *config = NULL;
     __virtio16 *max_queue_pairs;
-    uint64_t features;
     int ret;
 
-    ret = ioctl(fd, VHOST_GET_FEATURES, &features);
-    if (ret) {
-        error_setg(errp, "Fail to query features from vhost-vDPA device");
-        return ret;
-    }
-
     if (features & (1 << VIRTIO_NET_F_CTRL_VQ)) {
         *has_cvq = 1;
     } else {
@@ -262,10 +266,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
                         NetClientState *peer, Error **errp)
 {
     const NetdevVhostVDPAOptions *opts;
+    uint64_t features;
     int vdpa_device_fd;
     g_autofree NetClientState **ncs = NULL;
     NetClientState *nc;
-    int queue_pairs, i, has_cvq = 0;
+    int queue_pairs, r, i, has_cvq = 0;
 
     assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
     opts = &netdev->u.vhost_vdpa;
@@ -279,7 +284,12 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
         return -errno;
     }
 
-    queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd,
+    r = vhost_vdpa_get_features(vdpa_device_fd, &features, errp);
+    if (r) {
+        return r;
+    }
+
+    queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd, features,
                                                  &has_cvq, errp);
     if (queue_pairs < 0) {
         qemu_close(vdpa_device_fd);
-- 
2.31.1




* [RFC PATCH v9 20/23] vdpa: Buffer CVQ support on shadow virtqueue
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (18 preceding siblings ...)
  2022-07-06 18:40 ` [RFC PATCH v9 19/23] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs Eugenio Pérez
@ 2022-07-06 18:40 ` Eugenio Pérez
  2022-07-12  7:17   ` Jason Wang
  2022-07-06 18:40 ` [RFC PATCH v9 21/23] vdpa: Add vhost_vdpa_start_control_svq Eugenio Pérez
                   ` (2 subsequent siblings)
  22 siblings, 1 reply; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

Introduce the control virtqueue support for vDPA shadow virtqueue. This
is needed for advanced networking features like multiqueue.

Virtio-net control VQ will copy the descriptors to qemu's VA, so we
avoid TOCTOU with the guest's or device's memory every time there is a
device model change.  When address space isolation is implemented, this
will also allow CVQ to only have access to control messages.

To demonstrate command handling, VIRTIO_NET_F_CTRL_MAC_ADDR is
implemented.  If the virtio-net driver changes the MAC, the virtio-net
device model will be updated with the new one.

Other CVQ commands could be added here straightforwardly, but they have
not been tested.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/vhost-vdpa.h |   3 +
 hw/virtio/vhost-vdpa.c         |   5 +-
 net/vhost-vdpa.c               | 373 +++++++++++++++++++++++++++++++++
 3 files changed, 379 insertions(+), 2 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 7214eb47dc..1111d85643 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -15,6 +15,7 @@
 #include <gmodule.h>
 
 #include "hw/virtio/vhost-iova-tree.h"
+#include "hw/virtio/vhost-shadow-virtqueue.h"
 #include "hw/virtio/virtio.h"
 #include "standard-headers/linux/vhost_types.h"
 
@@ -35,6 +36,8 @@ typedef struct vhost_vdpa {
     /* IOVA mapping used by the Shadow Virtqueue */
     VhostIOVATree *iova_tree;
     GPtrArray *shadow_vqs;
+    const VhostShadowVirtqueueOps *shadow_vq_ops;
+    void *shadow_vq_ops_opaque;
     struct vhost_dev *dev;
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 613c3483b0..94bda07b4d 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -417,9 +417,10 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
 
     shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
     for (unsigned n = 0; n < hdev->nvqs; ++n) {
-        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree, NULL,
-                                                            NULL);
+        g_autoptr(VhostShadowVirtqueue) svq = NULL;
 
+        svq = vhost_svq_new(v->iova_tree, v->shadow_vq_ops,
+                            v->shadow_vq_ops_opaque);
         if (unlikely(!svq)) {
             error_setg(errp, "Cannot create svq %u", n);
             return -1;
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index b0158f625e..e415cc8de5 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -11,11 +11,15 @@
 
 #include "qemu/osdep.h"
 #include "clients.h"
+#include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
 #include "net/vhost-vdpa.h"
 #include "hw/virtio/vhost-vdpa.h"
+#include "qemu/buffer.h"
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
+#include "qemu/log.h"
+#include "qemu/memalign.h"
 #include "qemu/option.h"
 #include "qapi/error.h"
 #include <linux/vhost.h>
@@ -25,6 +29,26 @@
 #include "monitor/monitor.h"
 #include "hw/virtio/vhost.h"
 
+typedef struct CVQElement {
+    /* Device's in and out buffer */
+    void *in_buf, *out_buf;
+
+    /* Optional guest element from where this CVQElement was created */
+    VirtQueueElement *guest_elem;
+
+    /* Control header sent by the guest. */
+    struct virtio_net_ctrl_hdr ctrl;
+
+    /* vhost-vdpa device, for cleanup reasons */
+    struct vhost_vdpa *vdpa;
+
+    /* Length of out data */
+    size_t out_len;
+
+    /* Copy of the out data sent by the guest excluding ctrl. */
+    uint8_t out_data[];
+} CVQElement;
+
 /* Todo:need to add the multiqueue support here */
 typedef struct VhostVDPAState {
     NetClientState nc;
@@ -187,6 +211,351 @@ static NetClientInfo net_vhost_vdpa_info = {
         .check_peer_type = vhost_vdpa_check_peer_type,
 };
 
+/**
+ * Unmap a buffer of a CVQ element from the IOVA tree and free it
+ *
+ * @elem: The CVQ element that owns the buffer
+ * @addr: qemu VA of the buffer to unmap, allocated by
+ *        vhost_vdpa_cvq_alloc_buf
+ *
+ * The contiguous chunk was allocated by us, so there is no need to look
+ * for more than one translation.
+ *
+ * Prints an error message in case of error.
+ */
+static void vhost_vdpa_cvq_unmap_buf(CVQElement *elem, void *addr)
+{
+    struct vhost_vdpa *v = elem->vdpa;
+    VhostIOVATree *tree = v->iova_tree;
+    DMAMap needle = {
+        /*
+         * No need to specify size or to look for more translations since
+         * this contiguous chunk was allocated by us.
+         */
+        .translated_addr = (hwaddr)(uintptr_t)addr,
+    };
+    const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle);
+    int r;
+
+    if (unlikely(!map)) {
+        error_report("Cannot locate expected map");
+        goto err;
+    }
+
+    r = vhost_vdpa_dma_unmap(v, map->iova, map->size + 1);
+    if (unlikely(r != 0)) {
+        error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
+    }
+
+    vhost_iova_tree_remove(tree, map);
+
+err:
+    qemu_vfree(addr);
+}
+
+static void vhost_vdpa_cvq_delete_elem(CVQElement *elem)
+{
+    if (elem->out_buf) {
+        vhost_vdpa_cvq_unmap_buf(elem, g_steal_pointer(&elem->out_buf));
+    }
+
+    if (elem->in_buf) {
+        vhost_vdpa_cvq_unmap_buf(elem, g_steal_pointer(&elem->in_buf));
+    }
+
+    /* Guest element must have been returned to the guest or freed otherwise */
+    assert(!elem->guest_elem);
+
+    g_free(elem);
+}
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(CVQElement, vhost_vdpa_cvq_delete_elem);
+
+static int vhost_vdpa_net_cvq_svq_inject(VhostShadowVirtqueue *svq,
+                                         CVQElement *cvq_elem,
+                                         size_t out_len)
+{
+    const struct iovec iov[] = {
+        {
+            .iov_base = cvq_elem->out_buf,
+            .iov_len = out_len,
+        },{
+            .iov_base = cvq_elem->in_buf,
+            .iov_len = sizeof(virtio_net_ctrl_ack),
+        }
+    };
+
+    return vhost_svq_inject(svq, iov, 1, 1, cvq_elem);
+}
+
+static void *vhost_vdpa_cvq_alloc_buf(struct vhost_vdpa *v,
+                                      const uint8_t *out_data, size_t data_len,
+                                      bool write)
+{
+    DMAMap map = {};
+    size_t buf_len = ROUND_UP(data_len, qemu_real_host_page_size());
+    void *buf = qemu_memalign(qemu_real_host_page_size(), buf_len);
+    int r;
+
+    if (!write) {
+        memcpy(buf, out_data, data_len);
+        memset(buf + data_len, 0, buf_len - data_len);
+    } else {
+        memset(buf, 0, buf_len);
+    }
+
+    map.translated_addr = (hwaddr)(uintptr_t)buf;
+    map.size = buf_len - 1;
+    map.perm = write ? IOMMU_RW : IOMMU_RO;
+    r = vhost_iova_tree_map_alloc(v->iova_tree, &map);
+    if (unlikely(r != IOVA_OK)) {
+        error_report("Cannot map injected element");
+        goto err;
+    }
+
+    r = vhost_vdpa_dma_map(v, map.iova, buf_len, buf, !write);
+    /* TODO: Handle error */
+    assert(r == 0);
+
+    return buf;
+
+err:
+    qemu_vfree(buf);
+    return NULL;
+}
+
+/**
+ * Allocate an element suitable to be injected
+ *
+ * @iov: The iovec
+ * @out_num: Number of out elements, placed first in iov
+ * @in_num: Number of in elements, placed after out ones
+ * @elem: Optional guest element from where this one was created
+ *
+ * TODO: Do we need a sg for out_num? I think not
+ */
+static CVQElement *vhost_vdpa_cvq_alloc_elem(VhostVDPAState *s,
+                                             struct virtio_net_ctrl_hdr ctrl,
+                                             const struct iovec *out_sg,
+                                             size_t out_num, size_t out_size,
+                                             VirtQueueElement *elem)
+{
+    g_autoptr(CVQElement) cvq_elem = g_malloc(sizeof(CVQElement) + out_size);
+    uint8_t *out_cursor = cvq_elem->out_data;
+    struct vhost_vdpa *v = &s->vhost_vdpa;
+
+    /* Start with a clean base */
+    memset(cvq_elem, 0, sizeof(*cvq_elem));
+    cvq_elem->vdpa = &s->vhost_vdpa;
+
+    /*
+     * Linearize element. If the guest had a descriptor chain, we expose a
+     * single buffer to the device.
+     */
+    cvq_elem->out_len = out_size;
+    memcpy(out_cursor, &ctrl, sizeof(ctrl));
+    out_size -= sizeof(ctrl);
+    out_cursor += sizeof(ctrl);
+    iov_to_buf(out_sg, out_num, 0, out_cursor, out_size);
+
+    cvq_elem->out_buf = vhost_vdpa_cvq_alloc_buf(v, cvq_elem->out_data,
+                                                 out_size, false);
+    assert(cvq_elem->out_buf);
+    cvq_elem->in_buf = vhost_vdpa_cvq_alloc_buf(v, NULL,
+                                                sizeof(virtio_net_ctrl_ack),
+                                                true);
+    assert(cvq_elem->in_buf);
+
+    cvq_elem->guest_elem = elem;
+    cvq_elem->ctrl = ctrl;
+    return g_steal_pointer(&cvq_elem);
+}
+
+/**
+ * iov_size with an upper limit. It's assumed UINT64_MAX is an invalid
+ * iov_size.
+ */
+static uint64_t vhost_vdpa_net_iov_len(const struct iovec *iov,
+                                       unsigned int iov_cnt, size_t max)
+{
+    uint64_t len = 0;
+
+    for (unsigned int i = 0; len < max && i < iov_cnt; i++) {
+        bool overflow = uadd64_overflow(iov[i].iov_len, len, &len);
+        if (unlikely(overflow)) {
+            return UINT64_MAX;
+        }
+    }
+
+    return len;
+}
+
+static CVQElement *vhost_vdpa_net_cvq_copy_elem(VhostVDPAState *s,
+                                                VirtQueueElement *elem)
+{
+    struct virtio_net_ctrl_hdr ctrl;
+    g_autofree struct iovec *iov = NULL;
+    struct iovec *iov2;
+    unsigned int out_num = elem->out_num;
+    size_t n, out_size = 0;
+
+    /* TODO: in buffer MUST have only a single entry with a char? size */
+    if (unlikely(vhost_vdpa_net_iov_len(elem->in_sg, elem->in_num,
+                                        sizeof(virtio_net_ctrl_ack))
+                                              < sizeof(virtio_net_ctrl_ack))) {
+        return NULL;
+    }
+
+    n = iov_to_buf(elem->out_sg, out_num, 0, &ctrl, sizeof(ctrl));
+    if (unlikely(n != sizeof(ctrl))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid out size\n", __func__);
+        return NULL;
+    }
+
+    iov = iov2 = g_memdup2(elem->out_sg, sizeof(struct iovec) * elem->out_num);
+    iov_discard_front(&iov2, &out_num, sizeof(ctrl));
+    switch (ctrl.class) {
+    case VIRTIO_NET_CTRL_MAC:
+        switch (ctrl.cmd) {
+        case VIRTIO_NET_CTRL_MAC_ADDR_SET:
+            if (likely(vhost_vdpa_net_iov_len(iov2, out_num, 6) >= 6)) {
+                out_size += 6;
+                break;
+            }
+
+            qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid mac size\n", __func__);
+            return NULL;
+        default:
+            qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid mac cmd %u\n",
+                          __func__, ctrl.cmd);
+            return NULL;
+        };
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid control class %u\n",
+                      __func__, ctrl.class);
+        return NULL;
+    };
+
+    return vhost_vdpa_cvq_alloc_elem(s, ctrl, iov2, out_num,
+                                     sizeof(ctrl) + out_size, elem);
+}
+
+/**
+ * Validate and copy control virtqueue commands.
+ *
+ * Following QEMU guidelines, we offer a copy of the buffers to the device to
+ * prevent TOCTOU bugs.  This function also checks that the buffer lengths
+ * are as expected.
+ */
+static bool vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
+                                             VirtQueueElement *guest_elem,
+                                             void *opaque)
+{
+    VhostVDPAState *s = opaque;
+    g_autoptr(CVQElement) cvq_elem = NULL;
+    g_autofree VirtQueueElement *elem = guest_elem;
+    size_t out_size, in_len;
+    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
+    int r;
+
+    cvq_elem = vhost_vdpa_net_cvq_copy_elem(s, elem);
+    if (unlikely(!cvq_elem)) {
+        goto err;
+    }
+
+    /* out size validated at vhost_vdpa_net_cvq_copy_elem */
+    out_size = iov_size(elem->out_sg, elem->out_num);
+    r = vhost_vdpa_net_cvq_svq_inject(svq, cvq_elem, out_size);
+    if (unlikely(r != 0)) {
+        goto err;
+    }
+
+    cvq_elem->guest_elem = g_steal_pointer(&elem);
+    /* Now CVQ elem belongs to SVQ */
+    g_steal_pointer(&cvq_elem);
+    return true;
+
+err:
+    in_len = iov_from_buf(elem->in_sg, elem->in_num, 0, &status,
+                          sizeof(status));
+    vhost_svq_push_elem(svq, elem, in_len);
+    return true;
+}
+
+static VirtQueueElement *vhost_vdpa_net_handle_ctrl_detach(void *elem_opaque)
+{
+    g_autoptr(CVQElement) cvq_elem = elem_opaque;
+    return g_steal_pointer(&cvq_elem->guest_elem);
+}
+
+static void vhost_vdpa_net_handle_ctrl_used(VhostShadowVirtqueue *svq,
+                                            void *vq_elem_opaque,
+                                            uint32_t dev_written)
+{
+    g_autoptr(CVQElement) cvq_elem = vq_elem_opaque;
+    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
+    const struct iovec out = {
+        .iov_base = cvq_elem->out_data,
+        .iov_len = cvq_elem->out_len,
+    };
+    const DMAMap status_map_needle = {
+        .translated_addr = (hwaddr)(uintptr_t)cvq_elem->in_buf,
+        .size = sizeof(status),
+    };
+    const DMAMap *in_map;
+    const struct iovec in = {
+        .iov_base = &status,
+        .iov_len = sizeof(status),
+    };
+    g_autofree VirtQueueElement *guest_elem = NULL;
+
+    if (unlikely(dev_written < sizeof(status))) {
+        error_report("Insufficient written data (%llu)",
+                     (long long unsigned)dev_written);
+        goto out;
+    }
+
+    in_map = vhost_iova_tree_find_iova(svq->iova_tree, &status_map_needle);
+    if (unlikely(!in_map)) {
+        error_report("Cannot locate out mapping");
+        goto out;
+    }
+
+    switch (cvq_elem->ctrl.class) {
+    case VIRTIO_NET_CTRL_MAC_ADDR_SET:
+        break;
+    default:
+        error_report("Unexpected ctrl class %u", cvq_elem->ctrl.class);
+        goto out;
+    };
+
+    memcpy(&status, cvq_elem->in_buf, sizeof(status));
+    if (status != VIRTIO_NET_OK) {
+        goto out;
+    }
+
+    status = VIRTIO_NET_ERR;
+    virtio_net_handle_ctrl_iov(svq->vdev, &in, 1, &out, 1);
+    if (status != VIRTIO_NET_OK) {
+        error_report("Bad CVQ processing in model");
+        goto out;
+    }
+
+out:
+    guest_elem = g_steal_pointer(&cvq_elem->guest_elem);
+    if (guest_elem) {
+        iov_from_buf(guest_elem->in_sg, guest_elem->in_num, 0, &status,
+                     sizeof(status));
+        vhost_svq_push_elem(svq, guest_elem, sizeof(status));
+    }
+}
+
+static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
+    .avail_handler = vhost_vdpa_net_handle_ctrl_avail,
+    .used_handler = vhost_vdpa_net_handle_ctrl_used,
+    .detach_handler = vhost_vdpa_net_handle_ctrl_detach,
+};
+
 static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
                                            const char *device,
                                            const char *name,
@@ -211,6 +580,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
     s->vhost_vdpa.device_fd = vdpa_device_fd;
     s->vhost_vdpa.index = queue_pair_index;
+    if (!is_datapath) {
+        s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
+        s->vhost_vdpa.shadow_vq_ops_opaque = s;
+    }
     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
     if (ret) {
         qemu_del_net_client(nc);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [RFC PATCH v9 21/23] vdpa: Add vhost_vdpa_start_control_svq
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (19 preceding siblings ...)
  2022-07-06 18:40 ` [RFC PATCH v9 20/23] vdpa: Buffer CVQ support on shadow virtqueue Eugenio Pérez
@ 2022-07-06 18:40 ` Eugenio Pérez
  2022-07-12  7:26   ` Jason Wang
  2022-07-06 18:40 ` [RFC PATCH v9 22/23] vdpa: Inject virtio-net mac address via CVQ at start Eugenio Pérez
  2022-07-06 18:40 ` [RFC PATCH v9 23/23] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
  22 siblings, 1 reply; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

As a first step, we enable the CVQ before any other virtqueue. Future
patches will add state restore.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 net/vhost-vdpa.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index e415cc8de5..77d013833f 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -370,6 +370,24 @@ static CVQElement *vhost_vdpa_cvq_alloc_elem(VhostVDPAState *s,
     return g_steal_pointer(&cvq_elem);
 }
 
+static int vhost_vdpa_start_control_svq(VhostShadowVirtqueue *svq,
+                                        void *opaque)
+{
+    struct vhost_vring_state state = {
+        .index = virtio_get_queue_index(svq->vq),
+        .num = 1,
+    };
+    VhostVDPAState *s = opaque;
+    struct vhost_dev *dev = s->vhost_vdpa.dev;
+    struct vhost_vdpa *v = dev->opaque;
+    int r;
+
+    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
+
+    r = ioctl(v->device_fd, VHOST_VDPA_SET_VRING_ENABLE, &state);
+    return r < 0 ? -errno : r;
+}
+
 /**
  * iov_size with an upper limit. It's assumed UINT64_MAX is an invalid
  * iov_size.
@@ -554,6 +572,7 @@ static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
     .avail_handler = vhost_vdpa_net_handle_ctrl_avail,
     .used_handler = vhost_vdpa_net_handle_ctrl_used,
     .detach_handler = vhost_vdpa_net_handle_ctrl_detach,
+    .start = vhost_vdpa_start_control_svq,
 };
 
 static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
-- 
2.31.1




* [RFC PATCH v9 22/23] vdpa: Inject virtio-net mac address via CVQ at start
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (20 preceding siblings ...)
  2022-07-06 18:40 ` [RFC PATCH v9 21/23] vdpa: Add vhost_vdpa_start_control_svq Eugenio Pérez
@ 2022-07-06 18:40 ` Eugenio Pérez
  2022-07-06 18:40 ` [RFC PATCH v9 23/23] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
  22 siblings, 0 replies; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

This is needed so the destination vdpa device sees the same state as the
one the guest set in the source.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 net/vhost-vdpa.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 77d013833f..bb6ac7d96c 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -380,12 +380,59 @@ static int vhost_vdpa_start_control_svq(VhostShadowVirtqueue *svq,
     VhostVDPAState *s = opaque;
     struct vhost_dev *dev = s->vhost_vdpa.dev;
     struct vhost_vdpa *v = dev->opaque;
+    VirtIONet *n = VIRTIO_NET(dev->vdev);
+    uint64_t features = dev->vdev->host_features;
+    size_t num = 0;
     int r;
 
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
 
     r = ioctl(v->device_fd, VHOST_VDPA_SET_VRING_ENABLE, &state);
-    return r < 0 ? -errno : r;
+    if (unlikely(r < 0)) {
+        return -errno;
+    }
+
+    if (features & BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR)) {
+        CVQElement *cvq_elem;
+        const struct virtio_net_ctrl_hdr ctrl = {
+            .class = VIRTIO_NET_CTRL_MAC,
+            .cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET,
+        };
+        uint8_t mac[6];
+        const struct iovec out[] = {
+            {
+                .iov_base = (void *)&ctrl,
+                .iov_len = sizeof(ctrl),
+            },{
+                .iov_base = mac,
+                .iov_len = sizeof(mac),
+            },
+        };
+
+        memcpy(mac, n->mac, sizeof(mac));
+        cvq_elem = vhost_vdpa_cvq_alloc_elem(s, ctrl, out, ARRAY_SIZE(out),
+                                             iov_size(out, ARRAY_SIZE(out)),
+                                             NULL);
+        assert(cvq_elem);
+        r = vhost_vdpa_net_cvq_svq_inject(svq, cvq_elem,
+                                          sizeof(ctrl) + sizeof(mac));
+        if (unlikely(r)) {
+            assert(!"Need to test for pending buffers etc");
+            return r;
+        }
+        num++;
+    }
+
+    while (num) {
+        /*
+         * We can call vhost_svq_poll here because BQL protects calls to run.
+         */
+        size_t used = vhost_svq_poll(svq);
+        assert(used <= num);
+        num -= used;
+    }
+
+    return 0;
 }
 
 /**
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [RFC PATCH v9 23/23] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
                   ` (21 preceding siblings ...)
  2022-07-06 18:40 ` [RFC PATCH v9 22/23] vdpa: Inject virtio-net mac address via CVQ at start Eugenio Pérez
@ 2022-07-06 18:40 ` Eugenio Pérez
  2022-07-07  6:23   ` Markus Armbruster
  22 siblings, 1 reply; 65+ messages in thread
From: Eugenio Pérez @ 2022-07-06 18:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

Finally offering the possibility to enable SVQ from the command line.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 qapi/net.json    |  9 +++++-
 net/vhost-vdpa.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/qapi/net.json b/qapi/net.json
index 9af11e9a3b..75ba2cb989 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -445,12 +445,19 @@
 # @queues: number of queues to be created for multiqueue vhost-vdpa
 #          (default: 1)
 #
+# @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
+#         (default: false)
+#
+# Features:
+# @unstable: Member @x-svq is experimental.
+#
 # Since: 5.1
 ##
 { 'struct': 'NetdevVhostVDPAOptions',
   'data': {
     '*vhostdev':     'str',
-    '*queues':       'int' } }
+    '*queues':       'int',
+    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] } } }
 
 ##
 # @NetdevVmnetHostOptions:
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index bb6ac7d96c..3f10636e05 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -93,6 +93,30 @@ const int vdpa_feature_bits[] = {
     VHOST_INVALID_FEATURE_BIT
 };
 
+/** Supported device specific feature bits with SVQ */
+static const uint64_t vdpa_svq_device_features =
+    BIT_ULL(VIRTIO_NET_F_CSUM) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) |
+    BIT_ULL(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) |
+    BIT_ULL(VIRTIO_NET_F_MTU) |
+    BIT_ULL(VIRTIO_NET_F_MAC) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_ECN) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_UFO) |
+    BIT_ULL(VIRTIO_NET_F_HOST_TSO4) |
+    BIT_ULL(VIRTIO_NET_F_HOST_TSO6) |
+    BIT_ULL(VIRTIO_NET_F_HOST_ECN) |
+    BIT_ULL(VIRTIO_NET_F_HOST_UFO) |
+    BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) |
+    BIT_ULL(VIRTIO_NET_F_STATUS) |
+    BIT_ULL(VIRTIO_NET_F_CTRL_VQ) |
+    BIT_ULL(VIRTIO_NET_F_MQ) |
+    BIT_ULL(VIRTIO_F_ANY_LAYOUT) |
+    BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) |
+    BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
+    BIT_ULL(VIRTIO_NET_F_STANDBY);
+
 VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
@@ -151,7 +175,11 @@ err_init:
 static void vhost_vdpa_cleanup(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    struct vhost_dev *dev = &s->vhost_net->dev;
 
+    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
+        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
+    }
     if (s->vhost_net) {
         vhost_net_cleanup(s->vhost_net);
         g_free(s->vhost_net);
@@ -454,6 +482,14 @@ static uint64_t vhost_vdpa_net_iov_len(const struct iovec *iov,
     return len;
 }
 
+static int vhost_vdpa_get_iova_range(int fd,
+                                     struct vhost_vdpa_iova_range *iova_range)
+{
+    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
+
+    return ret < 0 ? -errno : 0;
+}
+
 static CVQElement *vhost_vdpa_net_cvq_copy_elem(VhostVDPAState *s,
                                                 VirtQueueElement *elem)
 {
@@ -628,7 +664,9 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
                                            int vdpa_device_fd,
                                            int queue_pair_index,
                                            int nvqs,
-                                           bool is_datapath)
+                                           bool is_datapath,
+                                           bool svq,
+                                           VhostIOVATree *iova_tree)
 {
     NetClientState *nc = NULL;
     VhostVDPAState *s;
@@ -646,6 +684,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
     s->vhost_vdpa.device_fd = vdpa_device_fd;
     s->vhost_vdpa.index = queue_pair_index;
+    s->vhost_vdpa.shadow_vqs_enabled = svq;
+    s->vhost_vdpa.iova_tree = iova_tree;
     if (!is_datapath) {
         s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
         s->vhost_vdpa.shadow_vq_ops_opaque = s;
@@ -708,6 +748,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     uint64_t features;
     int vdpa_device_fd;
     g_autofree NetClientState **ncs = NULL;
+    g_autoptr(VhostIOVATree) iova_tree = NULL;
     NetClientState *nc;
     int queue_pairs, r, i, has_cvq = 0;
 
@@ -735,22 +776,45 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
         return queue_pairs;
     }
 
+    if (opts->x_svq) {
+        struct vhost_vdpa_iova_range iova_range;
+
+        uint64_t invalid_dev_features =
+            features & ~vdpa_svq_device_features &
+            /* Transport are all accepted at this point */
+            ~MAKE_64BIT_MASK(VIRTIO_TRANSPORT_F_START,
+                             VIRTIO_TRANSPORT_F_END - VIRTIO_TRANSPORT_F_START);
+
+        if (invalid_dev_features) {
+            error_setg(errp, "vdpa svq does not work with features 0x%" PRIx64,
+                       invalid_dev_features);
+            goto err_svq;
+        }
+
+        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
+        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
+    }
+
     ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
 
     for (i = 0; i < queue_pairs; i++) {
         ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                     vdpa_device_fd, i, 2, true);
+                                     vdpa_device_fd, i, 2, true, opts->x_svq,
+                                     iova_tree);
         if (!ncs[i])
             goto err;
     }
 
     if (has_cvq) {
         nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                 vdpa_device_fd, i, 1, false);
+                                 vdpa_device_fd, i, 1, false,
+                                 opts->x_svq, iova_tree);
         if (!nc)
             goto err;
     }
 
+    /* iova_tree ownership belongs to last NetClientState */
+    g_steal_pointer(&iova_tree);
     return 0;
 
 err:
@@ -759,6 +823,8 @@ err:
             qemu_del_net_client(ncs[i]);
         }
     }
+
+err_svq:
     qemu_close(vdpa_device_fd);
 
     return -1;
-- 
2.31.1




* Re: [RFC PATCH v9 23/23] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-07-06 18:40 ` [RFC PATCH v9 23/23] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
@ 2022-07-07  6:23   ` Markus Armbruster
  2022-07-08 10:53     ` Eugenio Perez Martin
  0 siblings, 1 reply; 65+ messages in thread
From: Markus Armbruster @ 2022-07-07  6:23 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: qemu-devel, Liuxiangdong, Harpreet Singh Anand, Eric Blake,
	Laurent Vivier, Parav Pandit, Cornelia Huck, Paolo Bonzini,
	Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

Eugenio Pérez <eperezma@redhat.com> writes:

> Finally offering the possibility to enable SVQ from the command line.

QMP, too, I guess.

>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  qapi/net.json    |  9 +++++-
>  net/vhost-vdpa.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 77 insertions(+), 4 deletions(-)
>
> diff --git a/qapi/net.json b/qapi/net.json
> index 9af11e9a3b..75ba2cb989 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -445,12 +445,19 @@
>  # @queues: number of queues to be created for multiqueue vhost-vdpa
>  #          (default: 1)
>  #
> +# @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
> +#         (default: false)
> +#
> +# Features:
> +# @unstable: Member @x-svq is experimental.
> +#
>  # Since: 5.1
>  ##
>  { 'struct': 'NetdevVhostVDPAOptions',
>    'data': {
>      '*vhostdev':     'str',
> -    '*queues':       'int' } }
> +    '*queues':       'int',
> +    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] } } }
>  
>  ##

QAPI schema:
Acked-by: Markus Armbruster <armbru@redhat.com>

[...]




* Re: [RFC PATCH v9 01/23] vhost: Return earlier if used buffers overrun
  2022-07-06 18:39 ` [RFC PATCH v9 01/23] vhost: Return earlier if used buffers overrun Eugenio Pérez
@ 2022-07-08  8:52   ` Jason Wang
  0 siblings, 0 replies; 65+ messages in thread
From: Jason Wang @ 2022-07-08  8:52 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: qemu-devel, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Thu, Jul 7, 2022 at 2:40 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> The previous function missed the avail buffer just picked from the
> queue, which could keep blocking the used queue forever. It is also
> cleaner to check for the overrun before calling vhost_svq_get_buf.
>
> Fixes: 100890f7cad50 ("vhost: Shadow virtqueue buffers forwarding")
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>

Acked-by: Jason Wang <jasowang@redhat.com>

> ---
>  hw/virtio/vhost-shadow-virtqueue.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 56c96ebd13..9280285435 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -405,19 +405,21 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>          vhost_svq_disable_notification(svq);
>          while (true) {
>              uint32_t len;
> -            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq, &len);
> -            if (!elem) {
> -                break;
> -            }
> +            g_autofree VirtQueueElement *elem = NULL;
>
>              if (unlikely(i >= svq->vring.num)) {
>                  qemu_log_mask(LOG_GUEST_ERROR,
>                           "More than %u used buffers obtained in a %u size SVQ",
>                           i, svq->vring.num);
> -                virtqueue_fill(vq, elem, len, i);
> -                virtqueue_flush(vq, i);
> +                virtqueue_flush(vq, svq->vring.num);
>                  return;
>              }
> +
> +            elem = vhost_svq_get_buf(svq, &len);
> +            if (!elem) {
> +                break;
> +            }
> +
>              virtqueue_fill(vq, elem, len, i++);
>          }
>
> --
> 2.31.1
>




* Re: [RFC PATCH v9 03/23] vdpa: delay set_vring_ready after DRIVER_OK
  2022-07-06 18:39 ` [RFC PATCH v9 03/23] vdpa: delay set_vring_ready after DRIVER_OK Eugenio Pérez
@ 2022-07-08  9:06   ` Jason Wang
  2022-07-08  9:56     ` Eugenio Perez Martin
  2022-07-08  9:59     ` Eugenio Perez Martin
  2022-07-13  5:51   ` Michael S. Tsirkin
  1 sibling, 2 replies; 65+ messages in thread
From: Jason Wang @ 2022-07-08  9:06 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: qemu-devel, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Thu, Jul 7, 2022 at 2:40 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> To restore the device in the destination of a live migration we send the
> commands through the control virtqueue. For the device to read the CVQ it
> must have received the DRIVER_OK status bit.
>
> However, this opens a window where the device could start receiving
> packets in rx queue 0 before it receives the RSS configuration. To avoid
> that, we will not send vring_enable until the device has consumed all of
> the configuration.
>
> As a first step, reverse the DRIVER_OK and SET_VRING_ENABLE steps.

I wonder if it's better to delay this to the series that implements
migration, since the shadow CVQ doesn't depend on this?

>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  hw/virtio/vhost-vdpa.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 66f054a12c..2ee8009594 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -728,13 +728,18 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>      return idx;
>  }
>
> +/**
> + * Set ready all vring of the device
> + *
> + * @dev: Vhost device
> + */
>  static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>  {
>      int i;
>      trace_vhost_vdpa_set_vring_ready(dev);
> -    for (i = 0; i < dev->nvqs; ++i) {
> +    for (i = 0; i < dev->vq_index_end; ++i) {
>          struct vhost_vring_state state = {
> -            .index = dev->vq_index + i,
> +            .index = i,

Looks like a cleanup or bugfix which deserves a separate patch?

>              .num = 1,
>          };
>          vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> @@ -1097,7 +1102,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>          if (unlikely(!ok)) {
>              return -1;
>          }
> -        vhost_vdpa_set_vring_ready(dev);
>      } else {
>          ok = vhost_vdpa_svqs_stop(dev);
>          if (unlikely(!ok)) {
> @@ -1111,16 +1115,22 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>      }
>
>      if (started) {
> +        int r;
> +
>          memory_listener_register(&v->listener, &address_space_memory);
> -        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +        r = vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +        if (unlikely(r)) {
> +            return r;
> +        }
> +        vhost_vdpa_set_vring_ready(dev);

Interesting, does this mean we only enable the last two queues without
this patch?

Thanks

>      } else {
>          vhost_vdpa_reset_device(dev);
>          vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>                                     VIRTIO_CONFIG_S_DRIVER);
>          memory_listener_unregister(&v->listener);
> -
> -        return 0;
>      }
> +
> +    return 0;
>  }
>
>  static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> --
> 2.31.1
>




* Re: [RFC PATCH v9 04/23] vhost: Get vring base from vq, not svq
  2022-07-06 18:39 ` [RFC PATCH v9 04/23] vhost: Get vring base from vq, not svq Eugenio Pérez
@ 2022-07-08  9:12   ` Jason Wang
  2022-07-08 10:10     ` Eugenio Perez Martin
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Wang @ 2022-07-08  9:12 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: qemu-devel, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Thu, Jul 7, 2022 at 2:40 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> The used idx used to match this value, but it will no longer match
> once we introduce svq_inject.

It might be better to explain what "svq_inject" means here.

> Rewind all the descriptors not used by the
> vdpa device and get the vq state properly.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  include/hw/virtio/virtio.h | 1 +
>  hw/virtio/vhost-vdpa.c     | 7 +++----
>  hw/virtio/virtio.c         | 5 +++++
>  3 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index db1c0ddf6b..4b51ab9d06 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -302,6 +302,7 @@ hwaddr virtio_queue_get_desc_size(VirtIODevice *vdev, int n);
>  hwaddr virtio_queue_get_avail_size(VirtIODevice *vdev, int n);
>  hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n);
>  unsigned int virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n);
> +unsigned int virtio_queue_get_in_use(const VirtQueue *vq);
>  void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n,
>                                       unsigned int idx);
>  void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 2ee8009594..de76128030 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1189,12 +1189,10 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>                                         struct vhost_vring_state *ring)
>  {
>      struct vhost_vdpa *v = dev->opaque;
> -    int vdpa_idx = ring->index - dev->vq_index;
>      int ret;
>
>      if (v->shadow_vqs_enabled) {
> -        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
> -
> +        const VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
>          /*
>           * Setting base as last used idx, so destination will see as available
>           * all the entries that the device did not use, including the in-flight
> @@ -1203,7 +1201,8 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>           * TODO: This is ok for networking, but other kinds of devices might
>           * have problems with these retransmissions.
>           */
> -        ring->num = svq->last_used_idx;
> +        ring->num = virtio_queue_get_last_avail_idx(dev->vdev, ring->index) -
> +                    virtio_queue_get_in_use(vq);

I think we need to change the above comment as well otherwise readers
might get confused.

I wonder why we need to bother at this time. Is this an issue for
networking devices? And for block device, it's not sufficient since
there's no guarantee that the descriptor is handled in order?

Thanks

>          return 0;
>      }
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 5d607aeaa0..e02656f7a2 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -3420,6 +3420,11 @@ unsigned int virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
>      }
>  }
>
> +unsigned int virtio_queue_get_in_use(const VirtQueue *vq)
> +{
> +    return vq->inuse;
> +}
> +
>  static void virtio_queue_packed_set_last_avail_idx(VirtIODevice *vdev,
>                                                     int n, unsigned int idx)
>  {
> --
> 2.31.1
>




* Re: [RFC PATCH v9 03/23] vdpa: delay set_vring_ready after DRIVER_OK
  2022-07-08  9:06   ` Jason Wang
@ 2022-07-08  9:56     ` Eugenio Perez Martin
  2022-07-08  9:59     ` Eugenio Perez Martin
  1 sibling, 0 replies; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-08  9:56 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Fri, Jul 8, 2022 at 11:06 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Jul 7, 2022 at 2:40 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > To restore the device in the destination of a live migration we send the
> > commands through the control virtqueue. For the device to read the CVQ it
> > must have received the DRIVER_OK status bit.
> >
> > However, this opens a window where the device could start receiving
> > packets in rx queue 0 before it receives the RSS configuration. To avoid
> > that, we will not send vring_enable until the device has consumed all of
> > the configuration.
> >
> > As a first step, reverse the DRIVER_OK and SET_VRING_ENABLE steps.
>
> I wonder if it's better to delay this to the series that implements
> migration, since the shadow CVQ doesn't depend on this?
>
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  hw/virtio/vhost-vdpa.c | 22 ++++++++++++++++------
> >  1 file changed, 16 insertions(+), 6 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 66f054a12c..2ee8009594 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -728,13 +728,18 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
> >      return idx;
> >  }
> >
> > +/**
> > + * Set ready all vring of the device
> > + *
> > + * @dev: Vhost device
> > + */
> >  static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> >  {
> >      int i;
> >      trace_vhost_vdpa_set_vring_ready(dev);
> > -    for (i = 0; i < dev->nvqs; ++i) {
> > +    for (i = 0; i < dev->vq_index_end; ++i) {
> >          struct vhost_vring_state state = {
> > -            .index = dev->vq_index + i,
> > +            .index = i,
>
> Looks like a cleanup or bugfix which deserves a separate patch?
>
> >              .num = 1,
> >          };
> >          vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> > @@ -1097,7 +1102,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >          if (unlikely(!ok)) {
> >              return -1;
> >          }
> > -        vhost_vdpa_set_vring_ready(dev);
> >      } else {
> >          ok = vhost_vdpa_svqs_stop(dev);
> >          if (unlikely(!ok)) {
> > @@ -1111,16 +1115,22 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >      }
> >
> >      if (started) {
> > +        int r;
> > +
> >          memory_listener_register(&v->listener, &address_space_memory);
> > -        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > +        r = vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > +        if (unlikely(r)) {
> > +            return r;
> > +        }
> > +        vhost_vdpa_set_vring_ready(dev);
>
> Interesting, does this mean we only enable the last two queues without
> this patch?
>

The function vhost_vdpa_set_vring_ready is changed in this patch.
Instead of enabling only the vrings that belong to this vhost_dev, it
enables all the vrings of the device, from 0 to dev->vq_index_end.

In the case of networking, vq_index_end changes depending on whether
CVQ and MQ are negotiated, so we should be safe here.

Based on your comments it's clear that this is an unexpected change
and I need to add that description to the patch message :).

Thanks!




* Re: [RFC PATCH v9 03/23] vdpa: delay set_vring_ready after DRIVER_OK
  2022-07-08  9:06   ` Jason Wang
  2022-07-08  9:56     ` Eugenio Perez Martin
@ 2022-07-08  9:59     ` Eugenio Perez Martin
  1 sibling, 0 replies; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-08  9:59 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Fri, Jul 8, 2022 at 11:06 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Jul 7, 2022 at 2:40 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > To restore the device in the destination of a live migration we send the
> > commands through the control virtqueue. For the device to read the CVQ it
> > must have received the DRIVER_OK status bit.
> >
> > However, this opens a window where the device could start receiving
> > packets in rx queue 0 before it receives the RSS configuration. To avoid
> > that, we will not send vring_enable until the device has consumed all of
> > the configuration.
> >
> > As a first step, reverse the DRIVER_OK and SET_VRING_ENABLE steps.
>
> I wonder if it's better to delay this to the series that implements
> migration, since the shadow CVQ doesn't depend on this?
>

(Forgot to add) this series is already capable of doing migration with
CVQ. It's just that it must use SVQ from the moment the source VM
boots up, which is far from ideal.

Thanks!




* Re: [RFC PATCH v9 04/23] vhost: Get vring base from vq, not svq
  2022-07-08  9:12   ` Jason Wang
@ 2022-07-08 10:10     ` Eugenio Perez Martin
  2022-07-12  7:42       ` Jason Wang
  0 siblings, 1 reply; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-08 10:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Fri, Jul 8, 2022 at 11:12 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Jul 7, 2022 at 2:40 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > The used idx used to match with this, but it will not match from the
> > moment we introduce svq_inject.
>
> It might be better to explain what "svq_inject" means here.
>

Good point, I'll change for the next version.

> > Rewind all the descriptors not used by
> > vdpa device and get the vq state properly.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  include/hw/virtio/virtio.h | 1 +
> >  hw/virtio/vhost-vdpa.c     | 7 +++----
> >  hw/virtio/virtio.c         | 5 +++++
> >  3 files changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > index db1c0ddf6b..4b51ab9d06 100644
> > --- a/include/hw/virtio/virtio.h
> > +++ b/include/hw/virtio/virtio.h
> > @@ -302,6 +302,7 @@ hwaddr virtio_queue_get_desc_size(VirtIODevice *vdev, int n);
> >  hwaddr virtio_queue_get_avail_size(VirtIODevice *vdev, int n);
> >  hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n);
> >  unsigned int virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n);
> > +unsigned int virtio_queue_get_in_use(const VirtQueue *vq);
> >  void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n,
> >                                       unsigned int idx);
> >  void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n);
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 2ee8009594..de76128030 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -1189,12 +1189,10 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
> >                                         struct vhost_vring_state *ring)
> >  {
> >      struct vhost_vdpa *v = dev->opaque;
> > -    int vdpa_idx = ring->index - dev->vq_index;
> >      int ret;
> >
> >      if (v->shadow_vqs_enabled) {
> > -        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
> > -
> > +        const VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
> >          /*
> >           * Setting base as last used idx, so destination will see as available
> >           * all the entries that the device did not use, including the in-flight
> > @@ -1203,7 +1201,8 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
> >           * TODO: This is ok for networking, but other kinds of devices might
> >           * have problems with these retransmissions.
> >           */
> > -        ring->num = svq->last_used_idx;
> > +        ring->num = virtio_queue_get_last_avail_idx(dev->vdev, ring->index) -
> > +                    virtio_queue_get_in_use(vq);
>
> I think we need to change the above comment as well otherwise readers
> might get confused.
>

Re-thinking this: This part has always been buggy, so this is actually
a fix. I'll tag it for next versions or, even better, send it
separately.

But the comment still holds: we cannot use the device's used idx since
it may not match the guest-visible one. This is actually easy to
trigger if we migrate a guest many times under traffic.

Maybe it's cleaner to export used_idx directly from VirtQueue? Extra
care is needed with packed vqs, but SVQ does not support them yet. I
didn't want to duplicate that logic in the virtio ring handling.

> I wonder why we need to bother at this time. Is this an issue for
> networking devices?

Every device has this issue when migrating as soon as the device's
used index is not the same as the guest's one.

> And for block device, it's not sufficient since
> there's no guarantee that the descriptor is handled in order?
>

Right, that part still holds here.

Thanks!




* Re: [RFC PATCH v9 23/23] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-07-07  6:23   ` Markus Armbruster
@ 2022-07-08 10:53     ` Eugenio Perez Martin
  2022-07-08 12:51       ` Markus Armbruster
  0 siblings, 1 reply; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-08 10:53 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-level, Liuxiangdong, Harpreet Singh Anand, Eric Blake,
	Laurent Vivier, Parav Pandit, Cornelia Huck, Paolo Bonzini,
	Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

On Thu, Jul 7, 2022 at 8:23 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> Eugenio Pérez <eperezma@redhat.com> writes:
>
> > Finally offering the possibility to enable SVQ from the command line.
>
> QMP, too, I guess.
>

Hi Markus,

I'm not sure what you mean. Dynamic enabling / disabling of SVQ was
delayed, and for now it's only possible to enable or disable it when
qemu starts. Do you mean enabling SVQ before starting the guest
somehow?

Thanks!

> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  qapi/net.json    |  9 +++++-
> >  net/vhost-vdpa.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 77 insertions(+), 4 deletions(-)
> >
> > diff --git a/qapi/net.json b/qapi/net.json
> > index 9af11e9a3b..75ba2cb989 100644
> > --- a/qapi/net.json
> > +++ b/qapi/net.json
> > @@ -445,12 +445,19 @@
> >  # @queues: number of queues to be created for multiqueue vhost-vdpa
> >  #          (default: 1)
> >  #
> > +# @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
> > +#         (default: false)
> > +#
> > +# Features:
> > +# @unstable: Member @x-svq is experimental.
> > +#
> >  # Since: 5.1
> >  ##
> >  { 'struct': 'NetdevVhostVDPAOptions',
> >    'data': {
> >      '*vhostdev':     'str',
> > -    '*queues':       'int' } }
> > +    '*queues':       'int',
> > +    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] } } }
> >
> >  ##
>
> QAPI schema:
> Acked-by: Markus Armbruster <armbru@redhat.com>
>
> [...]
>




* Re: [RFC PATCH v9 23/23] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-07-08 10:53     ` Eugenio Perez Martin
@ 2022-07-08 12:51       ` Markus Armbruster
  2022-07-11  7:14         ` Eugenio Perez Martin
  0 siblings, 1 reply; 65+ messages in thread
From: Markus Armbruster @ 2022-07-08 12:51 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Liuxiangdong, Harpreet Singh Anand, Eric Blake,
	Laurent Vivier, Parav Pandit, Cornelia Huck, Paolo Bonzini,
	Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

Eugenio Perez Martin <eperezma@redhat.com> writes:

> On Thu, Jul 7, 2022 at 8:23 AM Markus Armbruster <armbru@redhat.com> wrote:
>>
>> Eugenio Pérez <eperezma@redhat.com> writes:
>>
>> > Finally offering the possibility to enable SVQ from the command line.
>>
>> QMP, too, I guess.
>>
>
> Hi Markus,
>
> I'm not sure what you mean. Dynamic enabling / disabling of SVQ was
> delayed, and for now it's only possible to enable or disable it when
> qemu starts. Do you mean enabling SVQ before starting the guest
> somehow?

QMP command netdev_add takes a Netdev argument.  Branch 'vhost-vdpa' has
member x-svq.  Are you telling me it doesn't work there?  Or only before
the guest runs?

[...]




* Re: [RFC PATCH v9 23/23] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-07-08 12:51       ` Markus Armbruster
@ 2022-07-11  7:14         ` Eugenio Perez Martin
  0 siblings, 0 replies; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-11  7:14 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-level, Liuxiangdong, Harpreet Singh Anand, Eric Blake,
	Laurent Vivier, Parav Pandit, Cornelia Huck, Paolo Bonzini,
	Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu, Jason Wang

On Fri, Jul 8, 2022 at 2:51 PM Markus Armbruster <armbru@redhat.com> wrote:
>
> Eugenio Perez Martin <eperezma@redhat.com> writes:
>
> > On Thu, Jul 7, 2022 at 8:23 AM Markus Armbruster <armbru@redhat.com> wrote:
> >>
> >> Eugenio Pérez <eperezma@redhat.com> writes:
> >>
> >> > Finally offering the possibility to enable SVQ from the command line.
> >>
> >> QMP, too, I guess.
> >>
> >
> > Hi Markus,
> >
> > I'm not sure what you mean. Dynamic enabling / disabling of SVQ was
> > delayed, and for now it's only possible to enable or disable it when
> > qemu starts. Do you mean enabling SVQ before starting the guest
> > somehow?
>
> QMP command netdev_add takes a Netdev argument.  Branch 'vhost-vdpa' has
> member x-svq.  Are you telling me it doesn't work there?  Or only before
> the guest runs?
>

Oh, that's right, adding a device via QMP works as you describe.
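For reference, a sketch of what that looks like over QMP, derived from
the NetdevVhostVDPAOptions schema quoted earlier; the id and device
path simply reuse the cover letter's command-line example:

```json
{ "execute": "netdev_add",
  "arguments": { "type": "vhost-vdpa",
                 "id": "vhost-vdpa0",
                 "vhostdev": "/dev/vhost-vdpa-0",
                 "x-svq": true } }
```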

Thanks!

> [...]
>




* Re: [RFC PATCH v9 08/23] vhost: Decouple vhost_svq_add_split from VirtQueueElement
  2022-07-06 18:39 ` [RFC PATCH v9 08/23] vhost: Decouple vhost_svq_add_split from VirtQueueElement Eugenio Pérez
@ 2022-07-11  8:00   ` Jason Wang
  2022-07-11  8:27     ` Eugenio Perez Martin
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Wang @ 2022-07-11  8:00 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu


On 2022/7/7 02:39, Eugenio Pérez wrote:
> VirtQueueElement comes from the guest, but we're moving SVQ towards
> being able to inject elements without the guest's knowledge.
>
> To do so, make this accept sg buffers directly, instead of using
> VirtQueueElement.
>
> Add vhost_svq_add_element to maintain element convenience
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.c | 38 +++++++++++++++++++++---------
>   1 file changed, 27 insertions(+), 11 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 2fc5789b73..46d3c1d74f 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -172,30 +172,32 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>   }
>   
>   static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
> -                                VirtQueueElement *elem, unsigned *head)
> +                                const struct iovec *out_sg, size_t out_num,
> +                                const struct iovec *in_sg, size_t in_num,
> +                                unsigned *head)
>   {
>       unsigned avail_idx;
>       vring_avail_t *avail = svq->vring.avail;
>       bool ok;
> -    g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
> +    g_autofree hwaddr *sgs = NULL;


Is this change a must for this patch? (looks not related to the 
decoupling anyhow)

Other looks good.

Thanks


>   
>       *head = svq->free_head;
>   
>       /* We need some descriptors here */
> -    if (unlikely(!elem->out_num && !elem->in_num)) {
> +    if (unlikely(!out_num && !in_num)) {
>           qemu_log_mask(LOG_GUEST_ERROR,
>                         "Guest provided element with no descriptors");
>           return false;
>       }
>   
> -    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
> -                                     elem->in_num > 0, false);
> +    sgs = g_new(hwaddr, MAX(out_num, in_num));
> +    ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_num, in_num > 0,
> +                                     false);
>       if (unlikely(!ok)) {
>           return false;
>       }
>   
> -    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
> -                                     true);
> +    ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_num, false, true);
>       if (unlikely(!ok)) {
>           /* TODO unwind out_sg */
>           return false;
> @@ -223,10 +225,13 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
>    * takes ownership of the element: In case of failure, it is free and the SVQ
>    * is considered broken.
>    */
> -static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> +static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
> +                          size_t out_num, const struct iovec *in_sg,
> +                          size_t in_num, VirtQueueElement *elem)
>   {
>       unsigned qemu_head;
> -    bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
> +    bool ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num,
> +                                  &qemu_head);
>       if (unlikely(!ok)) {
>           g_free(elem);
>           return false;
> @@ -250,6 +255,18 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
>       event_notifier_set(&svq->hdev_kick);
>   }
>   
> +static bool vhost_svq_add_element(VhostShadowVirtqueue *svq,
> +                                  VirtQueueElement *elem)
> +{
> +    bool ok = vhost_svq_add(svq, elem->out_sg, elem->out_num, elem->in_sg,
> +                            elem->in_num, elem);
> +    if (ok) {
> +        vhost_svq_kick(svq);
> +    }
> +
> +    return ok;
> +}
> +
>   /**
>    * Forward available buffers.
>    *
> @@ -302,12 +319,11 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
>                   return;
>               }
>   
> -            ok = vhost_svq_add(svq, elem);
> +            ok = vhost_svq_add_element(svq, g_steal_pointer(&elem));
>               if (unlikely(!ok)) {
>                   /* VQ is broken, just return and ignore any other kicks */
>                   return;
>               }
> -            vhost_svq_kick(svq);
>           }
>   
>           virtio_queue_set_notification(svq->vq, true);




* Re: [RFC PATCH v9 08/23] vhost: Decouple vhost_svq_add_split from VirtQueueElement
  2022-07-11  8:00   ` Jason Wang
@ 2022-07-11  8:27     ` Eugenio Perez Martin
  2022-07-12  7:43       ` Jason Wang
  0 siblings, 1 reply; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-11  8:27 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Mon, Jul 11, 2022 at 10:00 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/7/7 02:39, Eugenio Pérez wrote:
> > VirtQueueElement comes from the guest, but we're moving SVQ towards
> > being able to inject elements without the guest's knowledge.
> >
> > To do so, make this accept sg buffers directly, instead of using
> > VirtQueueElement.
> >
> > Add vhost_svq_add_element to maintain element convenience
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.c | 38 +++++++++++++++++++++---------
> >   1 file changed, 27 insertions(+), 11 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index 2fc5789b73..46d3c1d74f 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -172,30 +172,32 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> >   }
> >
> >   static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > -                                VirtQueueElement *elem, unsigned *head)
> > +                                const struct iovec *out_sg, size_t out_num,
> > +                                const struct iovec *in_sg, size_t in_num,
> > +                                unsigned *head)
> >   {
> >       unsigned avail_idx;
> >       vring_avail_t *avail = svq->vring.avail;
> >       bool ok;
> > -    g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
> > +    g_autofree hwaddr *sgs = NULL;
>
>
> Is this change a must for this patch? (looks not related to the
> decoupling anyhow)
>

Right, delaying the variable assignment is a leftover I missed in the
cleanup. I can revert it in the next version if needed.

With that reverted, can I add your Acked-by tag?

Thanks!

> Other looks good.
>
> Thanks
>
>
> >
> >       *head = svq->free_head;
> >
> >       /* We need some descriptors here */
> > -    if (unlikely(!elem->out_num && !elem->in_num)) {
> > +    if (unlikely(!out_num && !in_num)) {
> >           qemu_log_mask(LOG_GUEST_ERROR,
> >                         "Guest provided element with no descriptors");
> >           return false;
> >       }
> >
> > -    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
> > -                                     elem->in_num > 0, false);
> > +    sgs = g_new(hwaddr, MAX(out_num, in_num));
> > +    ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_num, in_num > 0,
> > +                                     false);
> >       if (unlikely(!ok)) {
> >           return false;
> >       }
> >
> > -    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
> > -                                     true);
> > +    ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_num, false, true);
> >       if (unlikely(!ok)) {
> >           /* TODO unwind out_sg */
> >           return false;
> > @@ -223,10 +225,13 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
> >    * takes ownership of the element: In case of failure, it is free and the SVQ
> >    * is considered broken.
> >    */
> > -static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> > +static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
> > +                          size_t out_num, const struct iovec *in_sg,
> > +                          size_t in_num, VirtQueueElement *elem)
> >   {
> >       unsigned qemu_head;
> > -    bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
> > +    bool ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num,
> > +                                  &qemu_head);
> >       if (unlikely(!ok)) {
> >           g_free(elem);
> >           return false;
> > @@ -250,6 +255,18 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> >       event_notifier_set(&svq->hdev_kick);
> >   }
> >
> > +static bool vhost_svq_add_element(VhostShadowVirtqueue *svq,
> > +                                  VirtQueueElement *elem)
> > +{
> > +    bool ok = vhost_svq_add(svq, elem->out_sg, elem->out_num, elem->in_sg,
> > +                            elem->in_num, elem);
> > +    if (ok) {
> > +        vhost_svq_kick(svq);
> > +    }
> > +
> > +    return ok;
> > +}
> > +
> >   /**
> >    * Forward available buffers.
> >    *
> > @@ -302,12 +319,11 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
> >                   return;
> >               }
> >
> > -            ok = vhost_svq_add(svq, elem);
> > +            ok = vhost_svq_add_element(svq, g_steal_pointer(&elem));
> >               if (unlikely(!ok)) {
> >                   /* VQ is broken, just return and ignore any other kicks */
> >                   return;
> >               }
> > -            vhost_svq_kick(svq);
> >           }
> >
> >           virtio_queue_set_notification(svq->vq, true);
>




* Re: [RFC PATCH v9 09/23] vhost: Add SVQElement
  2022-07-06 18:39 ` [RFC PATCH v9 09/23] vhost: Add SVQElement Eugenio Pérez
@ 2022-07-11  9:00   ` Jason Wang
  2022-07-11  9:33     ` Eugenio Perez Martin
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Wang @ 2022-07-11  9:00 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu


On 2022/7/7 02:39, Eugenio Pérez wrote:
> This will allow SVQ to add metadata to the different queue elements. To
> simplify the changes, this patch only stores the actual element.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  8 ++++--
>   hw/virtio/vhost-shadow-virtqueue.c | 41 ++++++++++++++++++++----------
>   2 files changed, 33 insertions(+), 16 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 0fbdd69153..e434dc63b0 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -15,6 +15,10 @@
>   #include "standard-headers/linux/vhost_types.h"
>   #include "hw/virtio/vhost-iova-tree.h"
>   
> +typedef struct SVQElement {
> +    VirtQueueElement *elem;
> +} SVQElement;
> +
>   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>   typedef int (*ShadowVirtQueueStart)(VhostShadowVirtqueue *svq,
>                                       void *opaque);
> @@ -55,8 +59,8 @@ typedef struct VhostShadowVirtqueue {
>       /* IOVA mapping */
>       VhostIOVATree *iova_tree;
>   
> -    /* Map for use the guest's descriptors */
> -    VirtQueueElement **ring_id_maps;
> +    /* Each element context */
> +    SVQElement *ring_id_maps;
>   
>       /* Next VirtQueue element that guest made available */
>       VirtQueueElement *next_guest_avail_elem;
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 46d3c1d74f..913bca8769 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -237,7 +237,7 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
>           return false;
>       }
>   
> -    svq->ring_id_maps[qemu_head] = elem;
> +    svq->ring_id_maps[qemu_head].elem = elem;
>       return true;
>   }
>   
> @@ -385,15 +385,25 @@ static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
>       return i;
>   }
>   
> -static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
> -                                           uint32_t *len)
> +static bool vhost_svq_is_empty_elem(SVQElement elem)
> +{
> +    return elem.elem == NULL;
> +}
> +
> +static SVQElement vhost_svq_empty_elem(void)
> +{
> +    return (SVQElement){};
> +}


I wonder what's the benefit of using this instead of passing pointer to 
SVQElement and using memset().


> +
> +static SVQElement vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
>   {
>       const vring_used_t *used = svq->vring.used;
>       vring_used_elem_t used_elem;
> +    SVQElement svq_elem = vhost_svq_empty_elem();
>       uint16_t last_used, last_used_chain, num;
>   
>       if (!vhost_svq_more_used(svq)) {
> -        return NULL;
> +        return svq_elem;
>       }
>   
>       /* Only get used array entries after they have been exposed by dev */
> @@ -406,24 +416,25 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
>       if (unlikely(used_elem.id >= svq->vring.num)) {
>           qemu_log_mask(LOG_GUEST_ERROR, "Device %s says index %u is used",
>                         svq->vdev->name, used_elem.id);
> -        return NULL;
> +        return svq_elem;
>       }
>   
> -    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
> +    svq_elem = svq->ring_id_maps[used_elem.id];
> +    svq->ring_id_maps[used_elem.id] = vhost_svq_empty_elem();
> +    if (unlikely(vhost_svq_is_empty_elem(svq_elem))) {


Any reason we can't simply assign NULL to ring_id_maps[used_elem.id]?

Thanks


>           qemu_log_mask(LOG_GUEST_ERROR,
>               "Device %s says index %u is used, but it was not available",
>               svq->vdev->name, used_elem.id);
> -        return NULL;
> +        return svq_elem;
>       }
>   
> -    num = svq->ring_id_maps[used_elem.id]->in_num +
> -          svq->ring_id_maps[used_elem.id]->out_num;
> +    num = svq_elem.elem->in_num + svq_elem.elem->out_num;
>       last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
>       svq->desc_next[last_used_chain] = svq->free_head;
>       svq->free_head = used_elem.id;
>   
>       *len = used_elem.len;
> -    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> +    return svq_elem;
>   }
>   
>   /**
> @@ -454,6 +465,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>           vhost_svq_disable_notification(svq);
>           while (true) {
>               uint32_t len;
> +            SVQElement svq_elem;
>               g_autofree VirtQueueElement *elem = NULL;
>   
>               if (unlikely(i >= svq->vring.num)) {
> @@ -464,11 +476,12 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>                   return;
>               }
>   
> -            elem = vhost_svq_get_buf(svq, &len);
> -            if (!elem) {
> +            svq_elem = vhost_svq_get_buf(svq, &len);
> +            if (vhost_svq_is_empty_elem(svq_elem)) {
>                   break;
>               }
>   
> +            elem = g_steal_pointer(&svq_elem.elem);
>               virtqueue_fill(vq, elem, len, i++);
>           }
>   
> @@ -611,7 +624,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
>       memset(svq->vring.desc, 0, driver_size);
>       svq->vring.used = qemu_memalign(qemu_real_host_page_size(), device_size);
>       memset(svq->vring.used, 0, device_size);
> -    svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num);
> +    svq->ring_id_maps = g_new0(SVQElement, svq->vring.num);
>       svq->desc_next = g_new0(uint16_t, svq->vring.num);
>       for (unsigned i = 0; i < svq->vring.num - 1; i++) {
>           svq->desc_next[i] = cpu_to_le16(i + 1);
> @@ -636,7 +649,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>   
>       for (unsigned i = 0; i < svq->vring.num; ++i) {
>           g_autofree VirtQueueElement *elem = NULL;
> -        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> +        elem = g_steal_pointer(&svq->ring_id_maps[i].elem);
>           if (elem) {
>               virtqueue_detach_element(svq->vq, elem, 0);
>           }




* Re: [RFC PATCH v9 11/23] vhost: Move last chain id to SVQ element
  2022-07-06 18:39 ` [RFC PATCH v9 11/23] vhost: Move last chain id to SVQ element Eugenio Pérez
@ 2022-07-11  9:02   ` Jason Wang
  0 siblings, 0 replies; 65+ messages in thread
From: Jason Wang @ 2022-07-11  9:02 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu


On 2022/7/7 02:39, Eugenio Pérez wrote:
> We will allow the SVQ user to store opaque data for each element, so
> it's easier if we store this kind of information at avail time.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>


Note that the kernel driver doesn't have this optimization so far. If
this is not a must, let's post this on top of the shadow CVQ stuff.

Thanks


> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  3 +++
>   hw/virtio/vhost-shadow-virtqueue.c | 14 ++++++++------
>   2 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index e434dc63b0..0e434e9fd0 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -17,6 +17,9 @@
>   
>   typedef struct SVQElement {
>       VirtQueueElement *elem;
> +
> +    /* Last descriptor of the chain */
> +    uint32_t last_chain_id;
>   } SVQElement;
>   
>   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index cf1745fd4d..c5e49e51c5 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -239,7 +239,9 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
>                             size_t out_num, const struct iovec *in_sg,
>                             size_t in_num, VirtQueueElement *elem)
>   {
> +    SVQElement *svq_elem;
>       unsigned qemu_head;
> +    size_t n;
>       bool ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num,
>                                     &qemu_head);
>       if (unlikely(!ok)) {
> @@ -247,7 +249,10 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
>           return false;
>       }
>   
> -    svq->ring_id_maps[qemu_head].elem = elem;
> +    n = out_num + in_num;
> +    svq_elem = &svq->ring_id_maps[qemu_head];
> +    svq_elem->elem = elem;
> +    svq_elem->last_chain_id = vhost_svq_last_desc_of_chain(svq, n, qemu_head);
>       return true;
>   }
>   
> @@ -400,7 +405,7 @@ static SVQElement vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
>       const vring_used_t *used = svq->vring.used;
>       vring_used_elem_t used_elem;
>       SVQElement svq_elem = vhost_svq_empty_elem();
> -    uint16_t last_used, last_used_chain, num;
> +    uint16_t last_used;
>   
>       if (!vhost_svq_more_used(svq)) {
>           return svq_elem;
> @@ -428,11 +433,8 @@ static SVQElement vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
>           return svq_elem;
>       }
>   
> -    num = svq_elem.elem->in_num + svq_elem.elem->out_num;
> -    last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
> -    svq->desc_next[last_used_chain] = svq->free_head;
> +    svq->desc_next[svq_elem.last_chain_id] = svq->free_head;
>       svq->free_head = used_elem.id;
> -
>       *len = used_elem.len;
>       return svq_elem;
>   }




* Re: [RFC PATCH v9 12/23] vhost: Add opaque member to SVQElement
  2022-07-06 18:39 ` [RFC PATCH v9 12/23] vhost: Add opaque member to SVQElement Eugenio Pérez
@ 2022-07-11  9:05   ` Jason Wang
  2022-07-11  9:56     ` Eugenio Perez Martin
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Wang @ 2022-07-11  9:05 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu


On 2022/7/7 02:39, Eugenio Pérez wrote:
> When qemu injects buffers into the vdpa device, this member will hold
> contextual data. If SVQ performs no operation, it will hold the
> VirtQueueElement pointer.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  3 ++-
>   hw/virtio/vhost-shadow-virtqueue.c | 13 +++++++------
>   2 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 0e434e9fd0..a811f90e01 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -16,7 +16,8 @@
>   #include "hw/virtio/vhost-iova-tree.h"
>   
>   typedef struct SVQElement {
> -    VirtQueueElement *elem;
> +    /* Opaque data */
> +    void *opaque;


So I wonder if we can simply:

1) introduce a opaque to VirtQueueElement
2) store pointers to ring_id_maps

Since

1) VirtQueueElement's member looks general
2) it helps to reduce tricky code like vhost_svq_is_empty_elem() and
vhost_svq_empty_elem().

Thanks


>   
>       /* Last descriptor of the chain */
>       uint32_t last_chain_id;
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index c5e49e51c5..492bb12b5f 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -237,7 +237,7 @@ static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
>    */
>   static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
>                             size_t out_num, const struct iovec *in_sg,
> -                          size_t in_num, VirtQueueElement *elem)
> +                          size_t in_num, void *opaque)
>   {
>       SVQElement *svq_elem;
>       unsigned qemu_head;
> @@ -245,13 +245,12 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
>       bool ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num,
>                                     &qemu_head);
>       if (unlikely(!ok)) {
> -        g_free(elem);
>           return false;
>       }
>   
>       n = out_num + in_num;
>       svq_elem = &svq->ring_id_maps[qemu_head];
> -    svq_elem->elem = elem;
> +    svq_elem->opaque = opaque;
>       svq_elem->last_chain_id = vhost_svq_last_desc_of_chain(svq, n, qemu_head);
>       return true;
>   }
> @@ -277,6 +276,8 @@ static bool vhost_svq_add_element(VhostShadowVirtqueue *svq,
>                               elem->in_num, elem);
>       if (ok) {
>           vhost_svq_kick(svq);
> +    } else {
> +        g_free(elem);
>       }
>   
>       return ok;
> @@ -392,7 +393,7 @@ static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
>   
>   static bool vhost_svq_is_empty_elem(SVQElement elem)
>   {
> -    return elem.elem == NULL;
> +    return elem.opaque == NULL;
>   }
>   
>   static SVQElement vhost_svq_empty_elem(void)
> @@ -483,7 +484,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>                   break;
>               }
>   
> -            elem = g_steal_pointer(&svq_elem.elem);
> +            elem = g_steal_pointer(&svq_elem.opaque);
>               virtqueue_fill(vq, elem, len, i++);
>           }
>   
> @@ -651,7 +652,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>   
>       for (unsigned i = 0; i < svq->vring.num; ++i) {
>           g_autofree VirtQueueElement *elem = NULL;
> -        elem = g_steal_pointer(&svq->ring_id_maps[i].elem);
> +        elem = g_steal_pointer(&svq->ring_id_maps[i].opaque);
>           if (elem) {
>               virtqueue_detach_element(svq->vq, elem, 0);
>           }



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v9 13/23] vhost: Add vhost_svq_inject
  2022-07-06 18:39 ` [RFC PATCH v9 13/23] vhost: Add vhost_svq_inject Eugenio Pérez
@ 2022-07-11  9:14   ` Jason Wang
  2022-07-11  9:43     ` Eugenio Perez Martin
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Wang @ 2022-07-11  9:14 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu


On 2022/7/7 02:39, Eugenio Pérez wrote:
> This allows qemu to inject buffers into the device.


I'm not a native speaker, but we probably need better terminology than
"inject" here, since the CVQ is totally under the control of QEMU anyhow.


>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
>   hw/virtio/vhost-shadow-virtqueue.c | 34 ++++++++++++++++++++++++++++++
>   2 files changed, 36 insertions(+)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index a811f90e01..d01d2370db 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -98,6 +98,8 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp);
>   
>   void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
>                            const VirtQueueElement *elem, uint32_t len);
> +int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
> +                     size_t out_num, size_t in_num, void *opaque);
>   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>   void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
>   void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 492bb12b5f..bd9e34b413 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -283,6 +283,40 @@ static bool vhost_svq_add_element(VhostShadowVirtqueue *svq,
>       return ok;
>   }
>   
> +/**
> + * Inject a chain of buffers to the device
> + *
> + * @svq: Shadow VirtQueue
> + * @iov: I/O vector
> + * @out_num: Number of front out descriptors
> + * @in_num: Number of last input descriptors
> + * @opaque: Contextual data to store in descriptor
> + *
> + * Return 0 on success, -ENOMEM if cannot inject
> + */
> +int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
> +                     size_t out_num, size_t in_num, void *opaque)


If we manage to embed opaque into VirtqueueElement, we can simply use 
vhost_svq_add() here.

Thanks


> +{
> +    bool ok;
> +
> +    /*
> +     * All vhost_svq_inject calls are controlled by qemu so we won't hit
> +     * these assertions.
> +     */
> +    assert(out_num || in_num);
> +    assert(svq->ops);
> +
> +    if (unlikely(svq->next_guest_avail_elem)) {
> +        error_report("Injecting in a full queue");
> +        return -ENOMEM;
> +    }
> +
> +    ok = vhost_svq_add(svq, iov, out_num, iov + out_num, in_num, opaque);
> +    assert(ok);
> +    vhost_svq_kick(svq);
> +    return 0;
> +}
> +
>   /**
>    * Forward available buffers.
>    *



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v9 14/23] vhost: add vhost_svq_poll
  2022-07-06 18:39 ` [RFC PATCH v9 14/23] vhost: add vhost_svq_poll Eugenio Pérez
@ 2022-07-11  9:19   ` Jason Wang
  2022-07-11 17:52     ` Eugenio Perez Martin
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Wang @ 2022-07-11  9:19 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu


On 2022/7/7 02:39, Eugenio Pérez wrote:
> It allows the Shadow Control VirtQueue to wait for the device to use the
> commands that restore the net device state after a live migration.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  1 +
>   hw/virtio/vhost-shadow-virtqueue.c | 54 ++++++++++++++++++++++++++++--
>   2 files changed, 52 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index d01d2370db..c8668fbdd6 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -100,6 +100,7 @@ void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
>                            const VirtQueueElement *elem, uint32_t len);
>   int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
>                        size_t out_num, size_t in_num, void *opaque);
> +ssize_t vhost_svq_poll(VhostShadowVirtqueue *svq);
>   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>   void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
>   void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index bd9e34b413..ed7f1d0bc9 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -10,6 +10,8 @@
>   #include "qemu/osdep.h"
>   #include "hw/virtio/vhost-shadow-virtqueue.h"
>   
> +#include <glib/gpoll.h>
> +
>   #include "qemu/error-report.h"
>   #include "qapi/error.h"
>   #include "qemu/main-loop.h"
> @@ -490,10 +492,11 @@ void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
>       }
>   }
>   
> -static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> -                            bool check_for_avail_queue)
> +static size_t vhost_svq_flush(VhostShadowVirtqueue *svq,
> +                              bool check_for_avail_queue)
>   {
>       VirtQueue *vq = svq->vq;
> +    size_t ret = 0;
>   
>       /* Forward as many used buffers as possible. */
>       do {
> @@ -510,7 +513,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>                            "More than %u used buffers obtained in a %u size SVQ",
>                            i, svq->vring.num);
>                   virtqueue_flush(vq, svq->vring.num);
> -                return;
> +                return ret;
>               }
>   
>               svq_elem = vhost_svq_get_buf(svq, &len);
> @@ -520,6 +523,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>   
>               elem = g_steal_pointer(&svq_elem.opaque);
>               virtqueue_fill(vq, elem, len, i++);
> +            ret++;
>           }
>   
>           virtqueue_flush(vq, i);
> @@ -533,6 +537,50 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>               vhost_handle_guest_kick(svq);
>           }
>       } while (!vhost_svq_enable_notification(svq));
> +
> +    return ret;
> +}
> +
> +/**
> + * Poll the SVQ for device used buffers.
> + *
> + * This function races with the main event loop SVQ polling, so extra
> + * synchronization is needed.
> + *
> + * Return the number of descriptors read from the device.
> + */
> +ssize_t vhost_svq_poll(VhostShadowVirtqueue *svq)
> +{
> +    int fd = event_notifier_get_fd(&svq->hdev_call);
> +    GPollFD poll_fd = {
> +        .fd = fd,
> +        .events = G_IO_IN,
> +    };
> +    assert(fd >= 0);
> +    int r = g_poll(&poll_fd, 1, -1);


Any reason we can't simply (busy) poll the used ring here? It might
help to reduce the latency (and it is what the kernel driver does).

Thanks


> +
> +    if (unlikely(r < 0)) {
> +        error_report("Cannot poll device call fd "G_POLLFD_FORMAT": (%d) %s",
> +                     poll_fd.fd, errno, g_strerror(errno));
> +        return -errno;
> +    }
> +
> +    if (r == 0) {
> +        return 0;
> +    }
> +
> +    if (unlikely(poll_fd.revents & ~(G_IO_IN))) {
> +        error_report(
> +            "Error polling device call fd "G_POLLFD_FORMAT": revents=%d",
> +            poll_fd.fd, poll_fd.revents);
> +        return -1;
> +    }
> +
> +    /*
> +     * Max return value of vhost_svq_flush is (uint16_t)-1, so it's safe to
> +     * convert to ssize_t.
> +     */
> +    return vhost_svq_flush(svq, false);
>   }
>   
>   /**



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v9 18/23] vdpa: Export vhost_vdpa_dma_map and unmap calls
  2022-07-06 18:40 ` [RFC PATCH v9 18/23] vdpa: Export vhost_vdpa_dma_map and unmap calls Eugenio Pérez
@ 2022-07-11  9:22   ` Jason Wang
  0 siblings, 0 replies; 65+ messages in thread
From: Jason Wang @ 2022-07-11  9:22 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu


On 2022/7/7 02:39, Eugenio Pérez wrote:
> Shadow CVQ will copy buffers into qemu's VA space, so we avoid TOCTOU
> attacks that could set different states in the qemu device model and the
> vdpa device.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>


Acked-by: Jason Wang <jasowang@redhat.com>


> ---
>   include/hw/virtio/vhost-vdpa.h | 4 ++++
>   hw/virtio/vhost-vdpa.c         | 7 +++----
>   2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index a29dbb3f53..7214eb47dc 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -39,4 +39,8 @@ typedef struct vhost_vdpa {
>       VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
>   } VhostVDPA;
>   
> +int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> +                       void *vaddr, bool readonly);
> +int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size);
> +
>   #endif
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 69cfaf05d6..613c3483b0 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -71,8 +71,8 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
>       return false;
>   }
>   
> -static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> -                              void *vaddr, bool readonly)
> +int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> +                       void *vaddr, bool readonly)
>   {
>       struct vhost_msg_v2 msg = {};
>       int fd = v->device_fd;
> @@ -97,8 +97,7 @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
>       return ret;
>   }
>   
> -static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
> -                                hwaddr size)
> +int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size)
>   {
>       struct vhost_msg_v2 msg = {};
>       int fd = v->device_fd;



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v9 09/23] vhost: Add SVQElement
  2022-07-11  9:00   ` Jason Wang
@ 2022-07-11  9:33     ` Eugenio Perez Martin
  2022-07-12  7:49       ` Jason Wang
  0 siblings, 1 reply; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-11  9:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Mon, Jul 11, 2022 at 11:00 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/7/7 02:39, Eugenio Pérez wrote:
> > This will allow SVQ to add metadata to the different queue elements. To
> > simplify changes, only the actual element is stored in this patch.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |  8 ++++--
> >   hw/virtio/vhost-shadow-virtqueue.c | 41 ++++++++++++++++++++----------
> >   2 files changed, 33 insertions(+), 16 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index 0fbdd69153..e434dc63b0 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -15,6 +15,10 @@
> >   #include "standard-headers/linux/vhost_types.h"
> >   #include "hw/virtio/vhost-iova-tree.h"
> >
> > +typedef struct SVQElement {
> > +    VirtQueueElement *elem;
> > +} SVQElement;
> > +
> >   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> >   typedef int (*ShadowVirtQueueStart)(VhostShadowVirtqueue *svq,
> >                                       void *opaque);
> > @@ -55,8 +59,8 @@ typedef struct VhostShadowVirtqueue {
> >       /* IOVA mapping */
> >       VhostIOVATree *iova_tree;
> >
> > -    /* Map for use the guest's descriptors */
> > -    VirtQueueElement **ring_id_maps;
> > +    /* Each element context */
> > +    SVQElement *ring_id_maps;
> >
> >       /* Next VirtQueue element that guest made available */
> >       VirtQueueElement *next_guest_avail_elem;
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index 46d3c1d74f..913bca8769 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -237,7 +237,7 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
> >           return false;
> >       }
> >
> > -    svq->ring_id_maps[qemu_head] = elem;
> > +    svq->ring_id_maps[qemu_head].elem = elem;
> >       return true;
> >   }
> >
> > @@ -385,15 +385,25 @@ static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
> >       return i;
> >   }
> >
> > -static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
> > -                                           uint32_t *len)
> > +static bool vhost_svq_is_empty_elem(SVQElement elem)
> > +{
> > +    return elem.elem == NULL;
> > +}
> > +
> > +static SVQElement vhost_svq_empty_elem(void)
> > +{
> > +    return (SVQElement){};
> > +}
>
>
> I wonder what's the benefit of using this instead of passing a pointer to
> SVQElement and using memset().
>

It was a more direct translation of the previous workflow, but we can
use memset here for sure.

>
> > +
> > +static SVQElement vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
> >   {
> >       const vring_used_t *used = svq->vring.used;
> >       vring_used_elem_t used_elem;
> > +    SVQElement svq_elem = vhost_svq_empty_elem();
> >       uint16_t last_used, last_used_chain, num;
> >
> >       if (!vhost_svq_more_used(svq)) {
> > -        return NULL;
> > +        return svq_elem;
> >       }
> >
> >       /* Only get used array entries after they have been exposed by dev */
> > @@ -406,24 +416,25 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
> >       if (unlikely(used_elem.id >= svq->vring.num)) {
> >           qemu_log_mask(LOG_GUEST_ERROR, "Device %s says index %u is used",
> >                         svq->vdev->name, used_elem.id);
> > -        return NULL;
> > +        return svq_elem;
> >       }
> >
> > -    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
> > +    svq_elem = svq->ring_id_maps[used_elem.id];
> > +    svq->ring_id_maps[used_elem.id] = vhost_svq_empty_elem();
> > +    if (unlikely(vhost_svq_is_empty_elem(svq_elem))) {
>
>
> Any reason we can't simply assign NULL to ring_id_maps[used_elem.id]?
>

It simply avoids allocating more memory, so error code paths are
simplified, etc. In the kernel, vring_desc_state_split, desc_extra and
similar are not an array of pointers but an array of states, so we
apply the same here. Returning them by value is not so common, though.

But we can allocate a state per in-flight descriptor for sure.

Thanks!


> Thanks
>
>
> >           qemu_log_mask(LOG_GUEST_ERROR,
> >               "Device %s says index %u is used, but it was not available",
> >               svq->vdev->name, used_elem.id);
> > -        return NULL;
> > +        return svq_elem;
> >       }
> >
> > -    num = svq->ring_id_maps[used_elem.id]->in_num +
> > -          svq->ring_id_maps[used_elem.id]->out_num;
> > +    num = svq_elem.elem->in_num + svq_elem.elem->out_num;
> >       last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
> >       svq->desc_next[last_used_chain] = svq->free_head;
> >       svq->free_head = used_elem.id;
> >
> >       *len = used_elem.len;
> > -    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> > +    return svq_elem;
> >   }
> >
> >   /**
> > @@ -454,6 +465,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> >           vhost_svq_disable_notification(svq);
> >           while (true) {
> >               uint32_t len;
> > +            SVQElement svq_elem;
> >               g_autofree VirtQueueElement *elem = NULL;
> >
> >               if (unlikely(i >= svq->vring.num)) {
> > @@ -464,11 +476,12 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> >                   return;
> >               }
> >
> > -            elem = vhost_svq_get_buf(svq, &len);
> > -            if (!elem) {
> > +            svq_elem = vhost_svq_get_buf(svq, &len);
> > +            if (vhost_svq_is_empty_elem(svq_elem)) {
> >                   break;
> >               }
> >
> > +            elem = g_steal_pointer(&svq_elem.elem);
> >               virtqueue_fill(vq, elem, len, i++);
> >           }
> >
> > @@ -611,7 +624,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> >       memset(svq->vring.desc, 0, driver_size);
> >       svq->vring.used = qemu_memalign(qemu_real_host_page_size(), device_size);
> >       memset(svq->vring.used, 0, device_size);
> > -    svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num);
> > +    svq->ring_id_maps = g_new0(SVQElement, svq->vring.num);
> >       svq->desc_next = g_new0(uint16_t, svq->vring.num);
> >       for (unsigned i = 0; i < svq->vring.num - 1; i++) {
> >           svq->desc_next[i] = cpu_to_le16(i + 1);
> > @@ -636,7 +649,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >
> >       for (unsigned i = 0; i < svq->vring.num; ++i) {
> >           g_autofree VirtQueueElement *elem = NULL;
> > -        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> > +        elem = g_steal_pointer(&svq->ring_id_maps[i].elem);
> >           if (elem) {
> >               virtqueue_detach_element(svq->vq, elem, 0);
> >           }
>



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v9 13/23] vhost: Add vhost_svq_inject
  2022-07-11  9:14   ` Jason Wang
@ 2022-07-11  9:43     ` Eugenio Perez Martin
  2022-07-12  7:58       ` Jason Wang
  0 siblings, 1 reply; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-11  9:43 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Mon, Jul 11, 2022 at 11:14 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/7/7 02:39, Eugenio Pérez wrote:
> > This allows qemu to inject buffers into the device.
>
>
> I'm not a native speaker, but we probably need better terminology than
> "inject" here, since the CVQ is totally under the control of QEMU anyhow.
>

I'm totally fine to change terminology

>
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
> >   hw/virtio/vhost-shadow-virtqueue.c | 34 ++++++++++++++++++++++++++++++
> >   2 files changed, 36 insertions(+)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index a811f90e01..d01d2370db 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -98,6 +98,8 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp);
> >
> >   void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
> >                            const VirtQueueElement *elem, uint32_t len);
> > +int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
> > +                     size_t out_num, size_t in_num, void *opaque);
> >   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> >   void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
> >   void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index 492bb12b5f..bd9e34b413 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -283,6 +283,40 @@ static bool vhost_svq_add_element(VhostShadowVirtqueue *svq,
> >       return ok;
> >   }
> >
> > +/**
> > + * Inject a chain of buffers to the device
> > + *
> > + * @svq: Shadow VirtQueue
> > + * @iov: I/O vector
> > + * @out_num: Number of front out descriptors
> > + * @in_num: Number of last input descriptors
> > + * @opaque: Contextual data to store in descriptor
> > + *
> > + * Return 0 on success, -ENOMEM if cannot inject
> > + */
> > +int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
> > +                     size_t out_num, size_t in_num, void *opaque)
>
>
> If we manage to embed opaque into VirtqueueElement, we can simply use
> vhost_svq_add() here.
>

That works fine as long as SVQ only forwards elements, but it needs to
do more than that: we need to inject new elements without the guest
noticing.

How could we track elements that have no corresponding VirtQueueElement,
like the ones sent to restore the state at the LM destination?

I'll try to make it clearer in the patch message.

Thanks!

> Thanks
>
>
> > +{
> > +    bool ok;
> > +
> > +    /*
> > +     * All vhost_svq_inject calls are controlled by qemu so we won't hit
> > +     * these assertions.
> > +     */
> > +    assert(out_num || in_num);
> > +    assert(svq->ops);
> > +
> > +    if (unlikely(svq->next_guest_avail_elem)) {
> > +        error_report("Injecting in a full queue");
> > +        return -ENOMEM;
> > +    }
> > +
> > +    ok = vhost_svq_add(svq, iov, out_num, iov + out_num, in_num, opaque);
> > +    assert(ok);
> > +    vhost_svq_kick(svq);
> > +    return 0;
> > +}
> > +
> >   /**
> >    * Forward available buffers.
> >    *
>



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v9 12/23] vhost: Add opaque member to SVQElement
  2022-07-11  9:05   ` Jason Wang
@ 2022-07-11  9:56     ` Eugenio Perez Martin
  2022-07-12  7:53       ` Jason Wang
  0 siblings, 1 reply; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-11  9:56 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Mon, Jul 11, 2022 at 11:05 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/7/7 02:39, Eugenio Pérez wrote:
> > When qemu injects buffers into the vdpa device, the opaque member will
> > be used to maintain contextual data. If SVQ has no custom operation, it
> > will hold the VirtQueueElement pointer.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |  3 ++-
> >   hw/virtio/vhost-shadow-virtqueue.c | 13 +++++++------
> >   2 files changed, 9 insertions(+), 7 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index 0e434e9fd0..a811f90e01 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -16,7 +16,8 @@
> >   #include "hw/virtio/vhost-iova-tree.h"
> >
> >   typedef struct SVQElement {
> > -    VirtQueueElement *elem;
> > +    /* Opaque data */
> > +    void *opaque;
>
>
> So I wonder if we can simply:
>
> 1) introduce an opaque member to VirtQueueElement

(answered in other thread, pasting here for completion)

It does not work for messages that are not generated by the guest, for
example the ones used to restore the device state at the live migration
destination.

> 2) store pointers in ring_id_maps
>

I think you mean to keep storing VirtQueueElement pointers in
ring_id_maps? Otherwise, looking them up will not be immediate.

> Since
>
> 1) VirtQueueElement's member looks general

Not general enough :).

> 2) it helps to reduce tricky code like vhost_svq_is_empty_elem() and
> vhost_svq_empty_elem().
>

I'm OK with changing to any other method, but allocating them
individually seems worse to me, both performance-wise and because error
paths become more complicated. Maybe it would be less tricky if I used
them less "by value" and more "as pointers"?

Thanks!

> Thanks
>
>
> >
> >       /* Last descriptor of the chain */
> >       uint32_t last_chain_id;
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index c5e49e51c5..492bb12b5f 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -237,7 +237,7 @@ static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
> >    */
> >   static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
> >                             size_t out_num, const struct iovec *in_sg,
> > -                          size_t in_num, VirtQueueElement *elem)
> > +                          size_t in_num, void *opaque)
> >   {
> >       SVQElement *svq_elem;
> >       unsigned qemu_head;
> > @@ -245,13 +245,12 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
> >       bool ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num,
> >                                     &qemu_head);
> >       if (unlikely(!ok)) {
> > -        g_free(elem);
> >           return false;
> >       }
> >
> >       n = out_num + in_num;
> >       svq_elem = &svq->ring_id_maps[qemu_head];
> > -    svq_elem->elem = elem;
> > +    svq_elem->opaque = opaque;
> >       svq_elem->last_chain_id = vhost_svq_last_desc_of_chain(svq, n, qemu_head);
> >       return true;
> >   }
> > @@ -277,6 +276,8 @@ static bool vhost_svq_add_element(VhostShadowVirtqueue *svq,
> >                               elem->in_num, elem);
> >       if (ok) {
> >           vhost_svq_kick(svq);
> > +    } else {
> > +        g_free(elem);
> >       }
> >
> >       return ok;
> > @@ -392,7 +393,7 @@ static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
> >
> >   static bool vhost_svq_is_empty_elem(SVQElement elem)
> >   {
> > -    return elem.elem == NULL;
> > +    return elem.opaque == NULL;
> >   }
> >
> >   static SVQElement vhost_svq_empty_elem(void)
> > @@ -483,7 +484,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> >                   break;
> >               }
> >
> > -            elem = g_steal_pointer(&svq_elem.elem);
> > +            elem = g_steal_pointer(&svq_elem.opaque);
> >               virtqueue_fill(vq, elem, len, i++);
> >           }
> >
> > @@ -651,7 +652,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >
> >       for (unsigned i = 0; i < svq->vring.num; ++i) {
> >           g_autofree VirtQueueElement *elem = NULL;
> > -        elem = g_steal_pointer(&svq->ring_id_maps[i].elem);
> > +        elem = g_steal_pointer(&svq->ring_id_maps[i].opaque);
> >           if (elem) {
> >               virtqueue_detach_element(svq->vq, elem, 0);
> >           }
>



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v9 14/23] vhost: add vhost_svq_poll
  2022-07-11  9:19   ` Jason Wang
@ 2022-07-11 17:52     ` Eugenio Perez Martin
  0 siblings, 0 replies; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-11 17:52 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Mon, Jul 11, 2022 at 11:19 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/7/7 02:39, Eugenio Pérez wrote:
> > It allows the Shadow Control VirtQueue to wait for the device to use
> > the commands that restore the net device state after a live migration.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |  1 +
> >   hw/virtio/vhost-shadow-virtqueue.c | 54 ++++++++++++++++++++++++++++--
> >   2 files changed, 52 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index d01d2370db..c8668fbdd6 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -100,6 +100,7 @@ void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
> >                            const VirtQueueElement *elem, uint32_t len);
> >   int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
> >                        size_t out_num, size_t in_num, void *opaque);
> > +ssize_t vhost_svq_poll(VhostShadowVirtqueue *svq);
> >   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> >   void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
> >   void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index bd9e34b413..ed7f1d0bc9 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -10,6 +10,8 @@
> >   #include "qemu/osdep.h"
> >   #include "hw/virtio/vhost-shadow-virtqueue.h"
> >
> > +#include <glib/gpoll.h>
> > +
> >   #include "qemu/error-report.h"
> >   #include "qapi/error.h"
> >   #include "qemu/main-loop.h"
> > @@ -490,10 +492,11 @@ void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
> >       }
> >   }
> >
> > -static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> > -                            bool check_for_avail_queue)
> > +static size_t vhost_svq_flush(VhostShadowVirtqueue *svq,
> > +                              bool check_for_avail_queue)
> >   {
> >       VirtQueue *vq = svq->vq;
> > +    size_t ret = 0;
> >
> >       /* Forward as many used buffers as possible. */
> >       do {
> > @@ -510,7 +513,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> >                            "More than %u used buffers obtained in a %u size SVQ",
> >                            i, svq->vring.num);
> >                   virtqueue_flush(vq, svq->vring.num);
> > -                return;
> > +                return ret;
> >               }
> >
> >               svq_elem = vhost_svq_get_buf(svq, &len);
> > @@ -520,6 +523,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> >
> >               elem = g_steal_pointer(&svq_elem.opaque);
> >               virtqueue_fill(vq, elem, len, i++);
> > +            ret++;
> >           }
> >
> >           virtqueue_flush(vq, i);
> > @@ -533,6 +537,50 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> >               vhost_handle_guest_kick(svq);
> >           }
> >       } while (!vhost_svq_enable_notification(svq));
> > +
> > +    return ret;
> > +}
> > +
> > +/**
> > + * Poll the SVQ for device used buffers.
> > + *
> > + * This function races with the main event loop SVQ polling, so extra
> > + * synchronization is needed.
> > + *
> > + * Return the number of descriptors read from the device.
> > + */
> > +ssize_t vhost_svq_poll(VhostShadowVirtqueue *svq)
> > +{
> > +    int fd = event_notifier_get_fd(&svq->hdev_call);
> > +    GPollFD poll_fd = {
> > +        .fd = fd,
> > +        .events = G_IO_IN,
> > +    };
> > +    assert(fd >= 0);
> > +    int r = g_poll(&poll_fd, 1, -1);
>
>
> Any reason we can't simply (busy) poll the used ring here? It might
> help to reduce the latency (and it is what kernel driver uses).
>

Yes, I'll change it to busy polling. I forgot to change it.

Thanks!
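Something like this minimal sketch is what I have in mind (assumed names, not the actual SVQ structures; real code would also put the proper barriers and a cpu_relax()-style hint in the loop):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of busy polling a split-ring used index (assumed names, not the
 * actual QEMU structures). The device publishes used entries by bumping
 * used->idx; the caller spins until it moves past last_used_idx. */
struct used_ring {
    _Atomic uint16_t idx;
};

static size_t busy_poll_used(struct used_ring *used, uint16_t *last_used_idx)
{
    uint16_t dev_idx;

    /* Spin until the device marks at least one buffer as used */
    do {
        dev_idx = atomic_load_explicit(&used->idx, memory_order_acquire);
        /* a cpu_relax()/pause hint would go here in real code */
    } while (dev_idx == *last_used_idx);

    /* Free-running 16-bit indices: the difference is the new used count */
    uint16_t n = (uint16_t)(dev_idx - *last_used_idx);
    *last_used_idx = dev_idx;
    return n;
}
```

Since this runs in the same context that injected the buffers, there is no file descriptor to wait on and latency stays low, unlike the g_poll() path.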

> Thanks
>
>
> > +
> > +    if (unlikely(r < 0)) {
> > +        error_report("Cannot poll device call fd "G_POLLFD_FORMAT": (%d) %s",
> > +                     poll_fd.fd, errno, g_strerror(errno));
> > +        return -errno;
> > +    }
> > +
> > +    if (r == 0) {
> > +        return 0;
> > +    }
> > +
> > +    if (unlikely(poll_fd.revents & ~(G_IO_IN))) {
> > +        error_report(
> > +            "Error polling device call fd "G_POLLFD_FORMAT": revents=%d",
> > +            poll_fd.fd, poll_fd.revents);
> > +        return -1;
> > +    }
> > +
> > +    /*
> > +     * Max return value of vhost_svq_flush is (uint16_t)-1, so it's safe to
> > +     * convert to ssize_t.
> > +     */
> > +    return vhost_svq_flush(svq, false);
> >   }
> >
> >   /**
>




* Re: [RFC PATCH v9 19/23] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
  2022-07-06 18:40 ` [RFC PATCH v9 19/23] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs Eugenio Pérez
@ 2022-07-12  4:11   ` Jason Wang
  0 siblings, 0 replies; 65+ messages in thread
From: Jason Wang @ 2022-07-12  4:11 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu


在 2022/7/7 02:40, Eugenio Pérez 写道:
> Knowing the device features is needed for CVQ SVQ, so SVQ knows whether
> it can handle all commands or not. Extract this logic from
> vhost_vdpa_get_max_queue_pairs so we can reuse it.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>


Acked-by: Jason Wang <jasowang@redhat.com>


> ---
>   net/vhost-vdpa.c | 30 ++++++++++++++++++++----------
>   1 file changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index df1e69ee72..b0158f625e 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -219,20 +219,24 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>       return nc;
>   }
>   
> -static int vhost_vdpa_get_max_queue_pairs(int fd, int *has_cvq, Error **errp)
> +static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
> +{
> +    int ret = ioctl(fd, VHOST_GET_FEATURES, features);
> +    if (ret) {
> +        error_setg_errno(errp, errno,
> +                         "Fail to query features from vhost-vDPA device");
> +    }
> +    return ret;
> +}
> +
> +static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
> +                                          int *has_cvq, Error **errp)
>   {
>       unsigned long config_size = offsetof(struct vhost_vdpa_config, buf);
>       g_autofree struct vhost_vdpa_config *config = NULL;
>       __virtio16 *max_queue_pairs;
> -    uint64_t features;
>       int ret;
>   
> -    ret = ioctl(fd, VHOST_GET_FEATURES, &features);
> -    if (ret) {
> -        error_setg(errp, "Fail to query features from vhost-vDPA device");
> -        return ret;
> -    }
> -
>       if (features & (1 << VIRTIO_NET_F_CTRL_VQ)) {
>           *has_cvq = 1;
>       } else {
> @@ -262,10 +266,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>                           NetClientState *peer, Error **errp)
>   {
>       const NetdevVhostVDPAOptions *opts;
> +    uint64_t features;
>       int vdpa_device_fd;
>       g_autofree NetClientState **ncs = NULL;
>       NetClientState *nc;
> -    int queue_pairs, i, has_cvq = 0;
> +    int queue_pairs, r, i, has_cvq = 0;
>   
>       assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>       opts = &netdev->u.vhost_vdpa;
> @@ -279,7 +284,12 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>           return -errno;
>       }
>   
> -    queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd,
> +    r = vhost_vdpa_get_features(vdpa_device_fd, &features, errp);
> +    if (r) {
> +        return r;
> +    }
> +
> +    queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd, features,
>                                                    &has_cvq, errp);
>       if (queue_pairs < 0) {
>           qemu_close(vdpa_device_fd);




* Re: [RFC PATCH v9 20/23] vdpa: Buffer CVQ support on shadow virtqueue
  2022-07-06 18:40 ` [RFC PATCH v9 20/23] vdpa: Buffer CVQ support on shadow virtqueue Eugenio Pérez
@ 2022-07-12  7:17   ` Jason Wang
  2022-07-12  9:47     ` Eugenio Perez Martin
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Wang @ 2022-07-12  7:17 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu


在 2022/7/7 02:40, Eugenio Pérez 写道:
> Introduce the control virtqueue support for vDPA shadow virtqueue. This
> is needed for advanced networking features like multiqueue.
>
> Virtio-net control VQ will copy the descriptors to qemu's VA, so we
> avoid TOCTOU with the guest's or device's memory every time there is a
> device model change.


Not sure this is a must since we currently do CVQ passthrough, so we 
might already "suffer" from this.


> When address space isolation is implemented, this
> will allow CVQ to only have access to control messages too.
>
> To demonstrate command handling, VIRTIO_NET_F_CTRL_MACADDR is
> implemented.  If virtio-net driver changes MAC the virtio-net device
> model will be updated with the new one.
>
> Other CVQ commands could be added here straightforwardly, but they have
> not been tested.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   include/hw/virtio/vhost-vdpa.h |   3 +
>   hw/virtio/vhost-vdpa.c         |   5 +-
>   net/vhost-vdpa.c               | 373 +++++++++++++++++++++++++++++++++
>   3 files changed, 379 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 7214eb47dc..1111d85643 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -15,6 +15,7 @@
>   #include <gmodule.h>
>   
>   #include "hw/virtio/vhost-iova-tree.h"
> +#include "hw/virtio/vhost-shadow-virtqueue.h"
>   #include "hw/virtio/virtio.h"
>   #include "standard-headers/linux/vhost_types.h"
>   
> @@ -35,6 +36,8 @@ typedef struct vhost_vdpa {
>       /* IOVA mapping used by the Shadow Virtqueue */
>       VhostIOVATree *iova_tree;
>       GPtrArray *shadow_vqs;
> +    const VhostShadowVirtqueueOps *shadow_vq_ops;
> +    void *shadow_vq_ops_opaque;
>       struct vhost_dev *dev;
>       VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
>   } VhostVDPA;
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 613c3483b0..94bda07b4d 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -417,9 +417,10 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>   
>       shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
>       for (unsigned n = 0; n < hdev->nvqs; ++n) {
> -        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree, NULL,
> -                                                            NULL);
> +        g_autoptr(VhostShadowVirtqueue) svq = NULL;


I don't see the reason for this assignment, considering it will just be 
initialized in the following line.


>   
> +        svq = vhost_svq_new(v->iova_tree, v->shadow_vq_ops,
> +                            v->shadow_vq_ops_opaque);
>           if (unlikely(!svq)) {
>               error_setg(errp, "Cannot create svq %u", n);
>               return -1;
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index b0158f625e..e415cc8de5 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -11,11 +11,15 @@
>   
>   #include "qemu/osdep.h"
>   #include "clients.h"
> +#include "hw/virtio/virtio-net.h"
>   #include "net/vhost_net.h"
>   #include "net/vhost-vdpa.h"
>   #include "hw/virtio/vhost-vdpa.h"
> +#include "qemu/buffer.h"
>   #include "qemu/config-file.h"
>   #include "qemu/error-report.h"
> +#include "qemu/log.h"
> +#include "qemu/memalign.h"
>   #include "qemu/option.h"
>   #include "qapi/error.h"
>   #include <linux/vhost.h>
> @@ -25,6 +29,26 @@
>   #include "monitor/monitor.h"
>   #include "hw/virtio/vhost.h"
>   
> +typedef struct CVQElement {
> +    /* Device's in and out buffer */
> +    void *in_buf, *out_buf;
> +
> +    /* Optional guest element from where this cvqelement was created */


Should be "cvq element".


> +    VirtQueueElement *guest_elem;
> +
> +    /* Control header sent by the guest. */
> +    struct virtio_net_ctrl_hdr ctrl;
> +
> +    /* vhost-vdpa device, for cleanup reasons */
> +    struct vhost_vdpa *vdpa;
> +
> +    /* Length of out data */
> +    size_t out_len;
> +
> +    /* Copy of the out data sent by the guest excluding ctrl. */
> +    uint8_t out_data[];
> +} CVQElement;
> +
>   /* Todo:need to add the multiqueue support here */
>   typedef struct VhostVDPAState {
>       NetClientState nc;
> @@ -187,6 +211,351 @@ static NetClientInfo net_vhost_vdpa_info = {
>           .check_peer_type = vhost_vdpa_check_peer_type,
>   };
>   
> +/**
> + * Unmap a descriptor chain of a SVQ element, optionally copying its in buffers
> + *
> + * @svq: Shadow VirtQueue
> + * @iova: SVQ IO Virtual address of descriptor
> + * @iov: Optional iovec to store device writable buffer
> + * @iov_cnt: iov length
> + * @buf_len: Length written by the device
> + *
> + * TODO: Use me! and adapt to net/vhost-vdpa format
> + * Print error message in case of error
> + */
> +static void vhost_vdpa_cvq_unmap_buf(CVQElement *elem, void *addr)
> +{
> +    struct vhost_vdpa *v = elem->vdpa;
> +    VhostIOVATree *tree = v->iova_tree;
> +    DMAMap needle = {
> +        /*
> +         * No need to specify size or to look for more translations since
> +         * this contiguous chunk was allocated by us.
> +         */
> +        .translated_addr = (hwaddr)(uintptr_t)addr,
> +    };
> +    const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle);
> +    int r;
> +
> +    if (unlikely(!map)) {
> +        error_report("Cannot locate expected map");
> +        goto err;
> +    }
> +
> +    r = vhost_vdpa_dma_unmap(v, map->iova, map->size + 1);
> +    if (unlikely(r != 0)) {
> +        error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
> +    }
> +
> +    vhost_iova_tree_remove(tree, map);
> +
> +err:
> +    qemu_vfree(addr);
> +}
> +
> +static void vhost_vdpa_cvq_delete_elem(CVQElement *elem)
> +{
> +    if (elem->out_buf) {
> +        vhost_vdpa_cvq_unmap_buf(elem, g_steal_pointer(&elem->out_buf));
> +    }
> +
> +    if (elem->in_buf) {
> +        vhost_vdpa_cvq_unmap_buf(elem, g_steal_pointer(&elem->in_buf));
> +    }
> +
> > +    /* Guest element must have been returned to the guest or freed otherwise */
> +    assert(!elem->guest_elem);
> +
> +    g_free(elem);
> +}
> +G_DEFINE_AUTOPTR_CLEANUP_FUNC(CVQElement, vhost_vdpa_cvq_delete_elem);
> +
> +static int vhost_vdpa_net_cvq_svq_inject(VhostShadowVirtqueue *svq,
> +                                         CVQElement *cvq_elem,
> +                                         size_t out_len)
> +{
> +    const struct iovec iov[] = {
> +        {
> +            .iov_base = cvq_elem->out_buf,
> +            .iov_len = out_len,
> +        },{
> +            .iov_base = cvq_elem->in_buf,
> +            .iov_len = sizeof(virtio_net_ctrl_ack),
> +        }
> +    };
> +
> +    return vhost_svq_inject(svq, iov, 1, 1, cvq_elem);
> +}
> +
> +static void *vhost_vdpa_cvq_alloc_buf(struct vhost_vdpa *v,
> +                                      const uint8_t *out_data, size_t data_len,
> +                                      bool write)
> +{
> +    DMAMap map = {};
> +    size_t buf_len = ROUND_UP(data_len, qemu_real_host_page_size());
> +    void *buf = qemu_memalign(qemu_real_host_page_size(), buf_len);
> +    int r;
> +
> +    if (!write) {
> +        memcpy(buf, out_data, data_len);
> +        memset(buf + data_len, 0, buf_len - data_len);
> +    } else {
> +        memset(buf, 0, data_len);
> +    }
> +
> +    map.translated_addr = (hwaddr)(uintptr_t)buf;
> +    map.size = buf_len - 1;
> +    map.perm = write ? IOMMU_RW : IOMMU_RO,
> +    r = vhost_iova_tree_map_alloc(v->iova_tree, &map);
> +    if (unlikely(r != IOVA_OK)) {
> +        error_report("Cannot map injected element");
> +        goto err;
> +    }
> +
> +    r = vhost_vdpa_dma_map(v, map.iova, buf_len, buf, !write);
> +    /* TODO: Handle error */
> +    assert(r == 0);
> +
> +    return buf;
> +
> +err:
> +    qemu_vfree(buf);
> +    return NULL;
> +}
> +
> +/**
> + * Allocate an element suitable to be injected
> + *
> + * @iov: The iovec
> + * @out_num: Number of out elements, placed first in iov
> + * @in_num: Number of in elements, placed after out ones
> + * @elem: Optional guest element from where this one was created
> + *
> + * TODO: Do we need a sg for out_num? I think not
> + */
> +static CVQElement *vhost_vdpa_cvq_alloc_elem(VhostVDPAState *s,
> +                                             struct virtio_net_ctrl_hdr ctrl,
> +                                             const struct iovec *out_sg,
> +                                             size_t out_num, size_t out_size,
> +                                             VirtQueueElement *elem)
> +{
> +    g_autoptr(CVQElement) cvq_elem = g_malloc(sizeof(CVQElement) + out_size);
> +    uint8_t *out_cursor = cvq_elem->out_data;
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +
> +    /* Start with a clean base */
> +    memset(cvq_elem, 0, sizeof(*cvq_elem));
> +    cvq_elem->vdpa = &s->vhost_vdpa;
> +
> +    /*
> +     * Linearize element. If guest had a descriptor chain, we expose the device
> +     * a single buffer.
> +     */
> +    cvq_elem->out_len = out_size;
> +    memcpy(out_cursor, &ctrl, sizeof(ctrl));
> +    out_size -= sizeof(ctrl);
> +    out_cursor += sizeof(ctrl);
> +    iov_to_buf(out_sg, out_num, 0, out_cursor, out_size);
> +
> +    cvq_elem->out_buf = vhost_vdpa_cvq_alloc_buf(v, cvq_elem->out_data,
> +                                                 out_size, false);
> +    assert(cvq_elem->out_buf);
> +    cvq_elem->in_buf = vhost_vdpa_cvq_alloc_buf(v, NULL,
> +                                                sizeof(virtio_net_ctrl_ack),
> +                                                true);
> +    assert(cvq_elem->in_buf);
> +
> +    cvq_elem->guest_elem = elem;
> +    cvq_elem->ctrl = ctrl;
> +    return g_steal_pointer(&cvq_elem);
> +}
> +
> +/**
> + * iov_size with an upper limit. It's assumed UINT64_MAX is an invalid
> + * iov_size.
> + */
> +static uint64_t vhost_vdpa_net_iov_len(const struct iovec *iov,
> +                                       unsigned int iov_cnt, size_t max)
> +{
> +    uint64_t len = 0;
> +
> +    for (unsigned int i = 0; len < max && i < iov_cnt; i++) {
> +        bool overflow = uadd64_overflow(iov[i].iov_len, len, &len);
> +        if (unlikely(overflow)) {
> +            return UINT64_MAX;
> +        }


Let's use iov_size() here, and if you think we need to fix the overflow 
issue, we should fix it there so other users can benefit from it.
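Concretely, the saturating behavior could live in a common helper along these lines (a sketch; iov_size_sat is a hypothetical name, not an existing util/iov.c function):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/uio.h>

/* Sketch of an overflow-safe iov_size() variant (hypothetical helper):
 * saturates to UINT64_MAX instead of wrapping when the summed iovec
 * lengths overflow, so callers can treat UINT64_MAX as invalid. */
static uint64_t iov_size_sat(const struct iovec *iov, unsigned int iov_cnt)
{
    uint64_t len = 0;

    for (unsigned int i = 0; i < iov_cnt; i++) {
        uint64_t next = len + iov[i].iov_len;
        if (next < len) { /* unsigned wraparound means overflow */
            return UINT64_MAX;
        }
        len = next;
    }
    return len;
}
```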


> +    }
> +
> +    return len;
> +}
> +
> +static CVQElement *vhost_vdpa_net_cvq_copy_elem(VhostVDPAState *s,
> +                                                VirtQueueElement *elem)
> +{
> +    struct virtio_net_ctrl_hdr ctrl;
> +    g_autofree struct iovec *iov = NULL;
> +    struct iovec *iov2;
> +    unsigned int out_num = elem->out_num;
> +    size_t n, out_size = 0;
> +
> +    /* TODO: in buffer MUST have only a single entry with a char? size */


I couldn't understand the question but we should not assume the layout 
of the control command.


> +    if (unlikely(vhost_vdpa_net_iov_len(elem->in_sg, elem->in_num,
> +                                        sizeof(virtio_net_ctrl_ack))
> +                                              < sizeof(virtio_net_ctrl_ack))) {
> +        return NULL;
> +    }


We don't have such a check in virtio-net.c; does anything make SVQ different?


> +
> +    n = iov_to_buf(elem->out_sg, out_num, 0, &ctrl, sizeof(ctrl));
> +    if (unlikely(n != sizeof(ctrl))) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid out size\n", __func__);
> +        return NULL;
> +    }
> +
> +    iov = iov2 = g_memdup2(elem->out_sg, sizeof(struct iovec) * elem->out_num);


Let's use iov_copy() here.

And I don't see how iov is used after this.


> +    iov_discard_front(&iov2, &out_num, sizeof(ctrl));
> +    switch (ctrl.class) {
> +    case VIRTIO_NET_CTRL_MAC:
> +        switch (ctrl.cmd) {
> +        case VIRTIO_NET_CTRL_MAC_ADDR_SET:
> +            if (likely(vhost_vdpa_net_iov_len(iov2, out_num, 6))) {
> +                out_size += 6;
> +                break;
> +            }
> +
> +            qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid mac size\n", __func__);
> +            return NULL;


Note that we need to support VIRTIO_NET_CTRL_ANNOUNCE_ACK in order to 
support live migration.

But a more fundamental question: what's the value of having this kind of 
whitelist here?

Isn't it simpler to just have a sane limit on the buffer size and 
forward everything to the vhost-vDPA device?

And if we do this, instead of validating the inputs one by one, we can 
do validation only on VIRTIO_NET_CTRL_MAC_TABLE_SET, which accepts 
variable-length input, and simply fall back to alluni/allmulti if it 
contains too many entries.
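For reference, VIRTIO_NET_CTRL_MAC_TABLE_SET carries two variable-length tables (unicast then multicast), each a 32-bit entry count followed by that many 6-byte MACs, so the size check reduces to simple arithmetic; a sketch of that math (hypothetical helper name, layout per the virtio-net spec):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define ETH_ALEN 6

/* Sketch: expected payload size (after the ctrl header) of a
 * VIRTIO_NET_CTRL_MAC_TABLE_SET command, which carries a unicast table
 * followed by a multicast table, each a le32 count plus 6-byte MACs. */
static size_t mac_table_set_len(uint32_t uc_entries, uint32_t mc_entries)
{
    return 2 * sizeof(uint32_t) +
           (size_t)uc_entries * ETH_ALEN +
           (size_t)mc_entries * ETH_ALEN;
}
```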


> +        default:
> +            qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid mac cmd %u\n",
> +                          __func__, ctrl.cmd);
> +            return NULL;
> +        };
> +        break;
> +    default:
> +        qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid control class %u\n",
> +                      __func__, ctrl.class);
> +        return NULL;
> +    };
> +
> +    return vhost_vdpa_cvq_alloc_elem(s, ctrl, iov2, out_num,
> +                                     sizeof(ctrl) + out_size, elem);
> +}
> +
> +/**
> + * Validate and copy control virtqueue commands.
> + *
> + * Following QEMU guidelines, we offer a copy of the buffers to the device to
> > + * prevent TOCTOU bugs.  This function checks that the buffer lengths are
> > + * as expected too.
> + */
> +static bool vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
> +                                             VirtQueueElement *guest_elem,
> +                                             void *opaque)
> +{
> +    VhostVDPAState *s = opaque;
> +    g_autoptr(CVQElement) cvq_elem = NULL;
> +    g_autofree VirtQueueElement *elem = guest_elem;
> +    size_t out_size, in_len;
> +    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
> +    int r;
> +
> +    cvq_elem = vhost_vdpa_net_cvq_copy_elem(s, elem);
> +    if (unlikely(!cvq_elem)) {
> +        goto err;
> +    }
> +
> +    /* out size validated at vhost_vdpa_net_cvq_copy_elem */
> +    out_size = iov_size(elem->out_sg, elem->out_num);
> +    r = vhost_vdpa_net_cvq_svq_inject(svq, cvq_elem, out_size);
> +    if (unlikely(r != 0)) {
> +        goto err;
> +    }
> +
> +    cvq_elem->guest_elem = g_steal_pointer(&elem);
> +    /* Now CVQ elem belongs to SVQ */
> +    g_steal_pointer(&cvq_elem);
> +    return true;
> +
> +err:
> +    in_len = iov_from_buf(elem->in_sg, elem->in_num, 0, &status,
> +                          sizeof(status));
> +    vhost_svq_push_elem(svq, elem, in_len);
> +    return true;
> +}
> +
> +static VirtQueueElement *vhost_vdpa_net_handle_ctrl_detach(void *elem_opaque)
> +{
> +    g_autoptr(CVQElement) cvq_elem = elem_opaque;
> +    return g_steal_pointer(&cvq_elem->guest_elem);
> +}
> +
> +static void vhost_vdpa_net_handle_ctrl_used(VhostShadowVirtqueue *svq,
> +                                            void *vq_elem_opaque,
> +                                            uint32_t dev_written)
> +{
> +    g_autoptr(CVQElement) cvq_elem = vq_elem_opaque;
> +    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
> +    const struct iovec out = {
> +        .iov_base = cvq_elem->out_data,
> +        .iov_len = cvq_elem->out_len,
> +    };
> +    const DMAMap status_map_needle = {
> +        .translated_addr = (hwaddr)(uintptr_t)cvq_elem->in_buf,
> +        .size = sizeof(status),
> +    };
> +    const DMAMap *in_map;
> +    const struct iovec in = {
> +        .iov_base = &status,
> +        .iov_len = sizeof(status),
> +    };
> +    g_autofree VirtQueueElement *guest_elem = NULL;
> +
> +    if (unlikely(dev_written < sizeof(status))) {
> +        error_report("Insufficient written data (%llu)",
> +                     (long long unsigned)dev_written);
> +        goto out;
> +    }
> +
> +    in_map = vhost_iova_tree_find_iova(svq->iova_tree, &status_map_needle);
> +    if (unlikely(!in_map)) {
> +        error_report("Cannot locate out mapping");
> +        goto out;
> +    }
> +
> +    switch (cvq_elem->ctrl.class) {
> +    case VIRTIO_NET_CTRL_MAC_ADDR_SET:
> +        break;
> +    default:
> +        error_report("Unexpected ctrl class %u", cvq_elem->ctrl.class);
> +        goto out;
> +    };
> +
> +    memcpy(&status, cvq_elem->in_buf, sizeof(status));
> +    if (status != VIRTIO_NET_OK) {
> +        goto out;
> +    }
> +
> +    status = VIRTIO_NET_ERR;
> +    virtio_net_handle_ctrl_iov(svq->vdev, &in, 1, &out, 1);


I wonder if this is the best choice. It looks to me it might be better 
to extend the virtio_net_handle_ctrl_iov() logic:

virtio_net_handle_ctrl_iov() {
     if (svq enabled) {
          host_elem = iov_copy(guest_elem);
          vhost_svq_add(host_elem);
          vhost_svq_poll(host_elem);
     }
     // usersapce ctrl vq logic
}


This can help avoid coupling too much logic into CVQ (like the 
avail, used and detach ops).

Thanks


> +    if (status != VIRTIO_NET_OK) {
> +        error_report("Bad CVQ processing in model");
> +        goto out;
> +    }
> +
> +out:
> +    guest_elem = g_steal_pointer(&cvq_elem->guest_elem);
> +    if (guest_elem) {
> +        iov_from_buf(guest_elem->in_sg, guest_elem->in_num, 0, &status,
> +                     sizeof(status));
> +        vhost_svq_push_elem(svq, guest_elem, sizeof(status));
> +    }
> +}
> +
> +static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
> +    .avail_handler = vhost_vdpa_net_handle_ctrl_avail,
> +    .used_handler = vhost_vdpa_net_handle_ctrl_used,
> +    .detach_handler = vhost_vdpa_net_handle_ctrl_detach,
> +};
> +
>   static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>                                              const char *device,
>                                              const char *name,
> @@ -211,6 +580,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>   
>       s->vhost_vdpa.device_fd = vdpa_device_fd;
>       s->vhost_vdpa.index = queue_pair_index;
> +    if (!is_datapath) {
> +        s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
> +        s->vhost_vdpa.shadow_vq_ops_opaque = s;
> +    }
>       ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
>       if (ret) {
>           qemu_del_net_client(nc);




* Re: [RFC PATCH v9 21/23] vdpa: Add vhost_vdpa_start_control_svq
  2022-07-06 18:40 ` [RFC PATCH v9 21/23] vdpa: Add vhost_vdpa_start_control_svq Eugenio Pérez
@ 2022-07-12  7:26   ` Jason Wang
  2022-07-17 10:30     ` Eugenio Perez Martin
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Wang @ 2022-07-12  7:26 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Liuxiangdong, Markus Armbruster, Harpreet Singh Anand,
	Eric Blake, Laurent Vivier, Parav Pandit, Cornelia Huck,
	Paolo Bonzini, Gautam Dawar, Eli Cohen, Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu


在 2022/7/7 02:40, Eugenio Pérez 写道:
> As a first step, we only enable CVQ before the others. Future patches add
> state restore.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   net/vhost-vdpa.c | 19 +++++++++++++++++++
>   1 file changed, 19 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index e415cc8de5..77d013833f 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -370,6 +370,24 @@ static CVQElement *vhost_vdpa_cvq_alloc_elem(VhostVDPAState *s,
>       return g_steal_pointer(&cvq_elem);
>   }
>   
> +static int vhost_vdpa_start_control_svq(VhostShadowVirtqueue *svq,
> +                                        void *opaque)
> +{
> +    struct vhost_vring_state state = {
> +        .index = virtio_get_queue_index(svq->vq),
> +        .num = 1,
> +    };
> +    VhostVDPAState *s = opaque;
> +    struct vhost_dev *dev = s->vhost_vdpa.dev;
> +    struct vhost_vdpa *v = dev->opaque;
> +    int r;
> +
> +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> +
> +    r = ioctl(v->device_fd, VHOST_VDPA_SET_VRING_ENABLE, &state);
> +    return r < 0 ? -errno : r;
> +}
> +
>   /**
>    * iov_size with an upper limit. It's assumed UINT64_MAX is an invalid
>    * iov_size.
> @@ -554,6 +572,7 @@ static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
>       .avail_handler = vhost_vdpa_net_handle_ctrl_avail,
>       .used_handler = vhost_vdpa_net_handle_ctrl_used,
>       .detach_handler = vhost_vdpa_net_handle_ctrl_detach,
> +    .start = vhost_vdpa_start_control_svq,
>   };


I wonder if vhost_net_start() is a better place than here. It knows 
all virtqueues and can do whatever it wants; we just need to make the 
shadow virtqueue visible there?

Thanks


>   
>   static NetClientState *net_vhost_vdpa_init(NetClientState *peer,




* Re: [RFC PATCH v9 04/23] vhost: Get vring base from vq, not svq
  2022-07-08 10:10     ` Eugenio Perez Martin
@ 2022-07-12  7:42       ` Jason Wang
  2022-07-12  9:42         ` Eugenio Perez Martin
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Wang @ 2022-07-12  7:42 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu


在 2022/7/8 18:10, Eugenio Perez Martin 写道:
> On Fri, Jul 8, 2022 at 11:12 AM Jason Wang <jasowang@redhat.com> wrote:
>> On Thu, Jul 7, 2022 at 2:40 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>>> The used idx used to match with this, but it will not match from the
>>> moment we introduce svq_inject.
>> It might be better to explain what "svq_inject" means here.
>>
> Good point, I'll change for the next version.
>
>>> Rewind all the descriptors not used by
>>> vdpa device and get the vq state properly.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>   include/hw/virtio/virtio.h | 1 +
>>>   hw/virtio/vhost-vdpa.c     | 7 +++----
>>>   hw/virtio/virtio.c         | 5 +++++
>>>   3 files changed, 9 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>>> index db1c0ddf6b..4b51ab9d06 100644
>>> --- a/include/hw/virtio/virtio.h
>>> +++ b/include/hw/virtio/virtio.h
>>> @@ -302,6 +302,7 @@ hwaddr virtio_queue_get_desc_size(VirtIODevice *vdev, int n);
>>>   hwaddr virtio_queue_get_avail_size(VirtIODevice *vdev, int n);
>>>   hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n);
>>>   unsigned int virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n);
>>> +unsigned int virtio_queue_get_in_use(const VirtQueue *vq);
>>>   void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n,
>>>                                        unsigned int idx);
>>>   void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n);
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index 2ee8009594..de76128030 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -1189,12 +1189,10 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>>>                                          struct vhost_vring_state *ring)
>>>   {
>>>       struct vhost_vdpa *v = dev->opaque;
>>> -    int vdpa_idx = ring->index - dev->vq_index;
>>>       int ret;
>>>
>>>       if (v->shadow_vqs_enabled) {
>>> -        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
>>> -
>>> +        const VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
>>>           /*
>>>            * Setting base as last used idx, so destination will see as available
>>>            * all the entries that the device did not use, including the in-flight
>>> @@ -1203,7 +1201,8 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>>>            * TODO: This is ok for networking, but other kinds of devices might
>>>            * have problems with these retransmissions.
>>>            */
>>> -        ring->num = svq->last_used_idx;
>>> +        ring->num = virtio_queue_get_last_avail_idx(dev->vdev, ring->index) -
>>> +                    virtio_queue_get_in_use(vq);
>> I think we need to change the above comment as well otherwise readers
>> might get confused.
>>
> Re-thinking this: This part has always been buggy, so this is actually
> a fix. I'll tag it for next versions or, even better, send it
> separately.
>
> But the comment still holds: We cannot use the device's used idx since
> it might not match the guest-visible one. This is actually easy
> to trigger if we migrate a guest many times with traffic.


I may be missing something; maybe you can give me an example of this (I
assume the size of the SVQ is the same as what the guest can see).


>
> Maybe it's cleaner to export directly used_idx from VirtQueue? Extra
> care is needed with packed vq, but SVQ still does not support it. I
> didn't want to duplicate that logic in virtio ring handling.


So two more questions here:

1) what's the reason of rewinding via virtio_queue_get_in_use()?

2) it looks like we could end up with underflow with the above math?

Thanks


>
>> I wonder why we need to bother at this time. Is this an issue for
>> networking devices?
> Every device has this issue when migrating as soon as the device's
> used index is not the same as the guest's one.
>
>> And for block device, it's not sufficient since
>> there's no guarantee that the descriptor is handled in order?
>>
> Right, that part still holds here.
>
> Thanks!
>



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v9 08/23] vhost: Decouple vhost_svq_add_split from VirtQueueElement
  2022-07-11  8:27     ` Eugenio Perez Martin
@ 2022-07-12  7:43       ` Jason Wang
  0 siblings, 0 replies; 65+ messages in thread
From: Jason Wang @ 2022-07-12  7:43 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu


On 2022/7/11 16:27, Eugenio Perez Martin wrote:
> On Mon, Jul 11, 2022 at 10:00 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/7/7 02:39, Eugenio Pérez wrote:
>>> VirtQueueElement comes from the guest, but we're heading SVQ to be able
>>> to inject element without the guest's knowledge.
>>>
>>> To do so, make this accept sg buffers directly, instead of using
>>> VirtQueueElement.
>>>
>>> Add vhost_svq_add_element to maintain element convenience
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-shadow-virtqueue.c | 38 +++++++++++++++++++++---------
>>>    1 file changed, 27 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>> index 2fc5789b73..46d3c1d74f 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>> @@ -172,30 +172,32 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>>>    }
>>>
>>>    static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
>>> -                                VirtQueueElement *elem, unsigned *head)
>>> +                                const struct iovec *out_sg, size_t out_num,
>>> +                                const struct iovec *in_sg, size_t in_num,
>>> +                                unsigned *head)
>>>    {
>>>        unsigned avail_idx;
>>>        vring_avail_t *avail = svq->vring.avail;
>>>        bool ok;
>>> -    g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
>>> +    g_autofree hwaddr *sgs = NULL;
>>
>> Is this change a must for this patch? (looks not related to the
>> decoupling anyhow)
>>
> Right, the delayed variable assignment is an artifact I missed in the
> cleanup. I can revert it in the next version.
>
> With that reverted, can I add the acked-by tag from you?


Yes.

Thanks


>
> Thanks!
>
>> Other looks good.
>>
>> Thanks
>>
>>
>>>        *head = svq->free_head;
>>>
>>>        /* We need some descriptors here */
>>> -    if (unlikely(!elem->out_num && !elem->in_num)) {
>>> +    if (unlikely(!out_num && !in_num)) {
>>>            qemu_log_mask(LOG_GUEST_ERROR,
>>>                          "Guest provided element with no descriptors");
>>>            return false;
>>>        }
>>>
>>> -    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
>>> -                                     elem->in_num > 0, false);
>>> +    sgs = g_new(hwaddr, MAX(out_num, in_num));
>>> +    ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_num, in_num > 0,
>>> +                                     false);
>>>        if (unlikely(!ok)) {
>>>            return false;
>>>        }
>>>
>>> -    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
>>> -                                     true);
>>> +    ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_num, false, true);
>>>        if (unlikely(!ok)) {
>>>            /* TODO unwind out_sg */
>>>            return false;
>>> @@ -223,10 +225,13 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
>>>     * takes ownership of the element: In case of failure, it is free and the SVQ
>>>     * is considered broken.
>>>     */
>>> -static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
>>> +static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
>>> +                          size_t out_num, const struct iovec *in_sg,
>>> +                          size_t in_num, VirtQueueElement *elem)
>>>    {
>>>        unsigned qemu_head;
>>> -    bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
>>> +    bool ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num,
>>> +                                  &qemu_head);
>>>        if (unlikely(!ok)) {
>>>            g_free(elem);
>>>            return false;
>>> @@ -250,6 +255,18 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
>>>        event_notifier_set(&svq->hdev_kick);
>>>    }
>>>
>>> +static bool vhost_svq_add_element(VhostShadowVirtqueue *svq,
>>> +                                  VirtQueueElement *elem)
>>> +{
>>> +    bool ok = vhost_svq_add(svq, elem->out_sg, elem->out_num, elem->in_sg,
>>> +                            elem->in_num, elem);
>>> +    if (ok) {
>>> +        vhost_svq_kick(svq);
>>> +    }
>>> +
>>> +    return ok;
>>> +}
>>> +
>>>    /**
>>>     * Forward available buffers.
>>>     *
>>> @@ -302,12 +319,11 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
>>>                    return;
>>>                }
>>>
>>> -            ok = vhost_svq_add(svq, elem);
>>> +            ok = vhost_svq_add_element(svq, g_steal_pointer(&elem));
>>>                if (unlikely(!ok)) {
>>>                    /* VQ is broken, just return and ignore any other kicks */
>>>                    return;
>>>                }
>>> -            vhost_svq_kick(svq);
>>>            }
>>>
>>>            virtio_queue_set_notification(svq->vq, true);




* Re: [RFC PATCH v9 09/23] vhost: Add SVQElement
  2022-07-11  9:33     ` Eugenio Perez Martin
@ 2022-07-12  7:49       ` Jason Wang
  0 siblings, 0 replies; 65+ messages in thread
From: Jason Wang @ 2022-07-12  7:49 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu


On 2022/7/11 17:33, Eugenio Perez Martin wrote:
> On Mon, Jul 11, 2022 at 11:00 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/7/7 02:39, Eugenio Pérez wrote:
>>> This will allow SVQ to add metadata to the different queue elements. To
>>> simplify changes, only store actual element at this patch.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-shadow-virtqueue.h |  8 ++++--
>>>    hw/virtio/vhost-shadow-virtqueue.c | 41 ++++++++++++++++++++----------
>>>    2 files changed, 33 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
>>> index 0fbdd69153..e434dc63b0 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>>> @@ -15,6 +15,10 @@
>>>    #include "standard-headers/linux/vhost_types.h"
>>>    #include "hw/virtio/vhost-iova-tree.h"
>>>
>>> +typedef struct SVQElement {
>>> +    VirtQueueElement *elem;
>>> +} SVQElement;
>>> +
>>>    typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>>>    typedef int (*ShadowVirtQueueStart)(VhostShadowVirtqueue *svq,
>>>                                        void *opaque);
>>> @@ -55,8 +59,8 @@ typedef struct VhostShadowVirtqueue {
>>>        /* IOVA mapping */
>>>        VhostIOVATree *iova_tree;
>>>
>>> -    /* Map for use the guest's descriptors */
>>> -    VirtQueueElement **ring_id_maps;
>>> +    /* Each element context */
>>> +    SVQElement *ring_id_maps;
>>>
>>>        /* Next VirtQueue element that guest made available */
>>>        VirtQueueElement *next_guest_avail_elem;
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>> index 46d3c1d74f..913bca8769 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>> @@ -237,7 +237,7 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
>>>            return false;
>>>        }
>>>
>>> -    svq->ring_id_maps[qemu_head] = elem;
>>> +    svq->ring_id_maps[qemu_head].elem = elem;
>>>        return true;
>>>    }
>>>
>>> @@ -385,15 +385,25 @@ static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
>>>        return i;
>>>    }
>>>
>>> -static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
>>> -                                           uint32_t *len)
>>> +static bool vhost_svq_is_empty_elem(SVQElement elem)
>>> +{
>>> +    return elem.elem == NULL;
>>> +}
>>> +
>>> +static SVQElement vhost_svq_empty_elem(void)
>>> +{
>>> +    return (SVQElement){};
>>> +}
>>
>> I wonder what's the benefit of using this instead of passing a
>> pointer to SVQElement and using memset().
>>
> It was a more direct translation of the previous workflow but we can
> use memset here for sure.
>
>>> +
>>> +static SVQElement vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
>>>    {
>>>        const vring_used_t *used = svq->vring.used;
>>>        vring_used_elem_t used_elem;
>>> +    SVQElement svq_elem = vhost_svq_empty_elem();
>>>        uint16_t last_used, last_used_chain, num;
>>>
>>>        if (!vhost_svq_more_used(svq)) {
>>> -        return NULL;
>>> +        return svq_elem;
>>>        }
>>>
>>>        /* Only get used array entries after they have been exposed by dev */
>>> @@ -406,24 +416,25 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
>>>        if (unlikely(used_elem.id >= svq->vring.num)) {
>>>            qemu_log_mask(LOG_GUEST_ERROR, "Device %s says index %u is used",
>>>                          svq->vdev->name, used_elem.id);
>>> -        return NULL;
>>> +        return svq_elem;
>>>        }
>>>
>>> -    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
>>> +    svq_elem = svq->ring_id_maps[used_elem.id];
>>> +    svq->ring_id_maps[used_elem.id] = vhost_svq_empty_elem();
>>> +    if (unlikely(vhost_svq_is_empty_elem(svq_elem))) {
>>
>> Any reason we can't simply assign NULL to ring_id_maps[used_elem.id]?
>>
> It simply avoids allocating more memory, so error code paths are
> simplified, etc. In the kernel, vring_desc_state_split, desc_extra and
> similar are not an array of pointers but an array of states, so we
> apply the same here. Returning them by value is not so common
> though.


Yes, but the kernel validates the used id through a pointer to data (a
token). This is the elem we used before this patch.

The code here looks more like a workaround that adds an indirection
level for elem. We'd better try to avoid that.

Thanks


>
> But we can allocate a state per in-flight descriptor for sure.
>
> Thanks!
>
>
>> Thanks
>>
>>
>>>            qemu_log_mask(LOG_GUEST_ERROR,
>>>                "Device %s says index %u is used, but it was not available",
>>>                svq->vdev->name, used_elem.id);
>>> -        return NULL;
>>> +        return svq_elem;
>>>        }
>>>
>>> -    num = svq->ring_id_maps[used_elem.id]->in_num +
>>> -          svq->ring_id_maps[used_elem.id]->out_num;
>>> +    num = svq_elem.elem->in_num + svq_elem.elem->out_num;
>>>        last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
>>>        svq->desc_next[last_used_chain] = svq->free_head;
>>>        svq->free_head = used_elem.id;
>>>
>>>        *len = used_elem.len;
>>> -    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
>>> +    return svq_elem;
>>>    }
>>>
>>>    /**
>>> @@ -454,6 +465,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>>>            vhost_svq_disable_notification(svq);
>>>            while (true) {
>>>                uint32_t len;
>>> +            SVQElement svq_elem;
>>>                g_autofree VirtQueueElement *elem = NULL;
>>>
>>>                if (unlikely(i >= svq->vring.num)) {
>>> @@ -464,11 +476,12 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>>>                    return;
>>>                }
>>>
>>> -            elem = vhost_svq_get_buf(svq, &len);
>>> -            if (!elem) {
>>> +            svq_elem = vhost_svq_get_buf(svq, &len);
>>> +            if (vhost_svq_is_empty_elem(svq_elem)) {
>>>                    break;
>>>                }
>>>
>>> +            elem = g_steal_pointer(&svq_elem.elem);
>>>                virtqueue_fill(vq, elem, len, i++);
>>>            }
>>>
>>> @@ -611,7 +624,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
>>>        memset(svq->vring.desc, 0, driver_size);
>>>        svq->vring.used = qemu_memalign(qemu_real_host_page_size(), device_size);
>>>        memset(svq->vring.used, 0, device_size);
>>> -    svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num);
>>> +    svq->ring_id_maps = g_new0(SVQElement, svq->vring.num);
>>>        svq->desc_next = g_new0(uint16_t, svq->vring.num);
>>>        for (unsigned i = 0; i < svq->vring.num - 1; i++) {
>>>            svq->desc_next[i] = cpu_to_le16(i + 1);
>>> @@ -636,7 +649,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>>>
>>>        for (unsigned i = 0; i < svq->vring.num; ++i) {
>>>            g_autofree VirtQueueElement *elem = NULL;
>>> -        elem = g_steal_pointer(&svq->ring_id_maps[i]);
>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i].elem);
>>>            if (elem) {
>>>                virtqueue_detach_element(svq->vq, elem, 0);
>>>            }




* Re: [RFC PATCH v9 12/23] vhost: Add opaque member to SVQElement
  2022-07-11  9:56     ` Eugenio Perez Martin
@ 2022-07-12  7:53       ` Jason Wang
  2022-07-12  8:32         ` Eugenio Perez Martin
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Wang @ 2022-07-12  7:53 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu


On 2022/7/11 17:56, Eugenio Perez Martin wrote:
> On Mon, Jul 11, 2022 at 11:05 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/7/7 02:39, Eugenio Pérez wrote:
>>> When qemu injects buffers to the vdpa device it will be used to maintain
>>> contextual data. If SVQ has no operation, it will be used to maintain
>>> the VirtQueueElement pointer.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-shadow-virtqueue.h |  3 ++-
>>>    hw/virtio/vhost-shadow-virtqueue.c | 13 +++++++------
>>>    2 files changed, 9 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
>>> index 0e434e9fd0..a811f90e01 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>>> @@ -16,7 +16,8 @@
>>>    #include "hw/virtio/vhost-iova-tree.h"
>>>
>>>    typedef struct SVQElement {
>>> -    VirtQueueElement *elem;
>>> +    /* Opaque data */
>>> +    void *opaque;
>>
>> So I wonder if we can simply:
>>
>> 1) introduce a opaque to VirtQueueElement
> (answered in other thread, pasting here for completion)
>
> It does not work for messages that are not generated by the guest. For
> example, the ones used to restore the device state at live migration
> destination.


For the ones that require more metadata, can we store it in elem->opaque?


>
>> 2) store pointers to ring_id_maps
>>
> I think you mean to keep storing VirtQueueElement at ring_id_maps?


Yes and introduce a pointer to metadata in VirtQueueElement


> Otherwise, looking for them will not be immediate.
>
>> Since
>>
>> 1) VirtQueueElement's member looks general
> Not general enough :).
>
>> 2) help to reduce the tricky code like vhost_svq_empty_elem() and
>> vhost_svq_is_empty_elem().
>>
> I'm ok with changing to any other method, but allocating them
> individually seems worse to me, both performance-wise and because
> error paths are more complicated. Maybe it would be less tricky if I
> try to use them less "by value" and more "as pointers"?


Or let's have dedicated arrays (like desc_state/desc_extra in the
kernel) instead of trying to reuse ring_id_maps.

Thanks


>
> Thanks!
>
>> Thanks
>>
>>
>>>        /* Last descriptor of the chain */
>>>        uint32_t last_chain_id;
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>> index c5e49e51c5..492bb12b5f 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>> @@ -237,7 +237,7 @@ static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
>>>     */
>>>    static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
>>>                              size_t out_num, const struct iovec *in_sg,
>>> -                          size_t in_num, VirtQueueElement *elem)
>>> +                          size_t in_num, void *opaque)
>>>    {
>>>        SVQElement *svq_elem;
>>>        unsigned qemu_head;
>>> @@ -245,13 +245,12 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
>>>        bool ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num,
>>>                                      &qemu_head);
>>>        if (unlikely(!ok)) {
>>> -        g_free(elem);
>>>            return false;
>>>        }
>>>
>>>        n = out_num + in_num;
>>>        svq_elem = &svq->ring_id_maps[qemu_head];
>>> -    svq_elem->elem = elem;
>>> +    svq_elem->opaque = opaque;
>>>        svq_elem->last_chain_id = vhost_svq_last_desc_of_chain(svq, n, qemu_head);
>>>        return true;
>>>    }
>>> @@ -277,6 +276,8 @@ static bool vhost_svq_add_element(VhostShadowVirtqueue *svq,
>>>                                elem->in_num, elem);
>>>        if (ok) {
>>>            vhost_svq_kick(svq);
>>> +    } else {
>>> +        g_free(elem);
>>>        }
>>>
>>>        return ok;
>>> @@ -392,7 +393,7 @@ static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
>>>
>>>    static bool vhost_svq_is_empty_elem(SVQElement elem)
>>>    {
>>> -    return elem.elem == NULL;
>>> +    return elem.opaque == NULL;
>>>    }
>>>
>>>    static SVQElement vhost_svq_empty_elem(void)
>>> @@ -483,7 +484,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>>>                    break;
>>>                }
>>>
>>> -            elem = g_steal_pointer(&svq_elem.elem);
>>> +            elem = g_steal_pointer(&svq_elem.opaque);
>>>                virtqueue_fill(vq, elem, len, i++);
>>>            }
>>>
>>> @@ -651,7 +652,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>>>
>>>        for (unsigned i = 0; i < svq->vring.num; ++i) {
>>>            g_autofree VirtQueueElement *elem = NULL;
>>> -        elem = g_steal_pointer(&svq->ring_id_maps[i].elem);
>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i].opaque);
>>>            if (elem) {
>>>                virtqueue_detach_element(svq->vq, elem, 0);
>>>            }




* Re: [RFC PATCH v9 13/23] vhost: Add vhost_svq_inject
  2022-07-11  9:43     ` Eugenio Perez Martin
@ 2022-07-12  7:58       ` Jason Wang
  0 siblings, 0 replies; 65+ messages in thread
From: Jason Wang @ 2022-07-12  7:58 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu


On 2022/7/11 17:43, Eugenio Perez Martin wrote:
> On Mon, Jul 11, 2022 at 11:14 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/7/7 02:39, Eugenio Pérez wrote:
>>> This allows qemu to inject buffers to the device.
>>
>> Not a native speaker but we probably need a better terminology than
>> inject here.
>>
>> Since the CVQ is totally under the control of the Qemu anyhow.
>>
> I'm totally fine to change terminology
>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-shadow-virtqueue.h |  2 ++
>>>    hw/virtio/vhost-shadow-virtqueue.c | 34 ++++++++++++++++++++++++++++++
>>>    2 files changed, 36 insertions(+)
>>>
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
>>> index a811f90e01..d01d2370db 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>>> @@ -98,6 +98,8 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp);
>>>
>>>    void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
>>>                             const VirtQueueElement *elem, uint32_t len);
>>> +int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
>>> +                     size_t out_num, size_t in_num, void *opaque);
>>>    void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>>>    void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
>>>    void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>> index 492bb12b5f..bd9e34b413 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>> @@ -283,6 +283,40 @@ static bool vhost_svq_add_element(VhostShadowVirtqueue *svq,
>>>        return ok;
>>>    }
>>>
>>> +/**
>>> + * Inject a chain of buffers to the device
>>> + *
>>> + * @svq: Shadow VirtQueue
>>> + * @iov: I/O vector
>>> + * @out_num: Number of front out descriptors
>>> + * @in_num: Number of last input descriptors
>>> + * @opaque: Contextual data to store in descriptor
>>> + *
>>> + * Return 0 on success, -ENOMEM if cannot inject
>>> + */
>>> +int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
>>> +                     size_t out_num, size_t in_num, void *opaque)
>>
>> If we manage to embed opaque into VirtqueueElement, we can simply use
>> vhost_svq_add() here.
>>
> That works fine as long as SVQ only forwards elements, but it needs to
> do more than that: we need to inject new elements without the guest's
> notice.
>
> How could we track elements that do not have a corresponding
> VirtQueueElement, like the elements sent to restore the state at the
> LM destination?


Would having a token for each VirtQueueElement work? Or maybe I can ask 
differently: what kind of extra state needs to be tracked here?

(For virtio state it should be handled by shadow virtqueue core).

Thanks


>
> I'll try to make it clearer in the patch message.
>
> Thanks!
>
>> Thanks
>>
>>
>>> +{
>>> +    bool ok;
>>> +
>>> +    /*
>>> +     * All vhost_svq_inject calls are controlled by qemu so we won't hit this
>>> +     * assertions.
>>> +     */
>>> +    assert(out_num || in_num);
>>> +    assert(svq->ops);
>>> +
>>> +    if (unlikely(svq->next_guest_avail_elem)) {
>>> +        error_report("Injecting in a full queue");
>>> +        return -ENOMEM;
>>> +    }
>>> +
>>> +    ok = vhost_svq_add(svq, iov, out_num, iov + out_num, in_num, opaque);
>>> +    assert(ok);
>>> +    vhost_svq_kick(svq);
>>> +    return 0;
>>> +}
>>> +
>>>    /**
>>>     * Forward available buffers.
>>>     *




* Re: [RFC PATCH v9 12/23] vhost: Add opaque member to SVQElement
  2022-07-12  7:53       ` Jason Wang
@ 2022-07-12  8:32         ` Eugenio Perez Martin
  2022-07-12  8:43           ` Jason Wang
  0 siblings, 1 reply; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-12  8:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Tue, Jul 12, 2022 at 9:53 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/7/11 17:56, Eugenio Perez Martin wrote:
> > On Mon, Jul 11, 2022 at 11:05 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2022/7/7 02:39, Eugenio Pérez wrote:
> >>> When qemu injects buffers to the vdpa device it will be used to maintain
> >>> contextual data. If SVQ has no operation, it will be used to maintain
> >>> the VirtQueueElement pointer.
> >>>
> >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>> ---
> >>>    hw/virtio/vhost-shadow-virtqueue.h |  3 ++-
> >>>    hw/virtio/vhost-shadow-virtqueue.c | 13 +++++++------
> >>>    2 files changed, 9 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> >>> index 0e434e9fd0..a811f90e01 100644
> >>> --- a/hw/virtio/vhost-shadow-virtqueue.h
> >>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> >>> @@ -16,7 +16,8 @@
> >>>    #include "hw/virtio/vhost-iova-tree.h"
> >>>
> >>>    typedef struct SVQElement {
> >>> -    VirtQueueElement *elem;
> >>> +    /* Opaque data */
> >>> +    void *opaque;
> >>
> >> So I wonder if we can simply:
> >>
> >> 1) introduce a opaque to VirtQueueElement
> > (answered in other thread, pasting here for completion)
> >
> > It does not work for messages that are not generated by the guest. For
> > example, the ones used to restore the device state at live migration
> > destination.
>
>
> For the ones that require more metadata, can we store it in elem->opaque?
>

But there is no VirtQueueElement there. A VirtQueueElement is allocated
by virtqueue_pop, but state-restoring messages are not received by that
function. If we allocate an artificial one, a lot of members do not
make sense (like in_addr / out_addr), and we should never use it
with virtqueue_push / fill / flush and similar.

>
> >
> >> 2) store pointers to ring_id_maps
> >>
> > I think you mean to keep storing VirtQueueElement at ring_id_maps?
>
>
> Yes and introduce a pointer to metadata in VirtQueueElement
>
>
> > Otherwise, looking for them will not be immediate.
> >
> >> Since
> >>
> >> 1) VirtQueueElement's member looks general
> > Not general enough :).
> >
> >> 2) help to reduce the tricky code like vhost_svq_empty_elem() and
> >> vhost_svq_is_empty_elem().
> >>
> > I'm ok with changing to any other method, but allocating them
> > individually seems worse to me, both performance-wise and because
> > error paths are more complicated. Maybe it would be less tricky if I
> > try to use them less "by value" and more "as pointers"?
>
>
> > Or let's have dedicated arrays (like desc_state/desc_extra in the
> > kernel) instead of trying to reuse ring_id_maps.
>

Sure, it looks to me like:
* Rename ring_id_maps to desc_state/desc_extra or something similar,
since it is now used to store more state than only the guest mapping.
* Rename "opaque" to "data".
* Forget the wrapper and assume data == NULL means an invalid head /
empty slot. To me the wrappers serve as documentation, but I guess it's
fine to use the fields directly. The kernel works that way anyway.

Does this look better? It's definitely closer to the kernel, so I guess
that's an advantage.

Thanks!




* Re: [RFC PATCH v9 12/23] vhost: Add opaque member to SVQElement
  2022-07-12  8:32         ` Eugenio Perez Martin
@ 2022-07-12  8:43           ` Jason Wang
  0 siblings, 0 replies; 65+ messages in thread
From: Jason Wang @ 2022-07-12  8:43 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Tue, Jul 12, 2022 at 4:33 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Jul 12, 2022 at 9:53 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2022/7/11 17:56, Eugenio Perez Martin wrote:
> > > On Mon, Jul 11, 2022 at 11:05 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> On 2022/7/7 02:39, Eugenio Pérez wrote:
> > >>> When qemu injects buffers to the vdpa device it will be used to maintain
> > >>> contextual data. If SVQ has no operation, it will be used to maintain
> > >>> the VirtQueueElement pointer.
> > >>>
> > >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>> ---
> > >>>    hw/virtio/vhost-shadow-virtqueue.h |  3 ++-
> > >>>    hw/virtio/vhost-shadow-virtqueue.c | 13 +++++++------
> > >>>    2 files changed, 9 insertions(+), 7 deletions(-)
> > >>>
> > >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > >>> index 0e434e9fd0..a811f90e01 100644
> > >>> --- a/hw/virtio/vhost-shadow-virtqueue.h
> > >>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > >>> @@ -16,7 +16,8 @@
> > >>>    #include "hw/virtio/vhost-iova-tree.h"
> > >>>
> > >>>    typedef struct SVQElement {
> > >>> -    VirtQueueElement *elem;
> > >>> +    /* Opaque data */
> > >>> +    void *opaque;
> > >>
> > >> So I wonder if we can simply:
> > >>
> > >> 1) introduce a opaque to VirtQueueElement
> > > (answered in other thread, pasting here for completion)
> > >
> > > It does not work for messages that are not generated by the guest. For
> > > example, the ones used to restore the device state at live migration
> > > destination.
> >
> >
> > For the ones that require more metadata, can we store it in elem->opaque?
> >
>
> But there is no VirtQueueElem there. VirtQueueElem is allocated by
> virtqueue_pop, but state restoring messages are not received by this
> function. If we allocate an artificial one, a lot of members do not
> make sense (like in_addr / out_addr), and we should never use them
> with virtqueue_push / fill / flush and similar.

Ok.

>
> >
> > >
> > >> 2) store pointers to ring_id_maps
> > >>
> > > I think you mean to keep storing VirtQueueElement at ring_id_maps?
> >
> >
> > Yes and introduce a pointer to metadata in VirtQueueElement
> >
> >
> > > Otherwise, looking for them will not be immediate.
> > >
> > >> Since
> > >>
> > >> 1) VirtQueueElement's member looks general
> > > Not general enough :).
> > >
> > >> 2) help to reduce the tricky code like vhost_svq_empty_elem() and
> > >> vhost_svq_is_empty_elem().
> > >>
> > > I'm ok with changing to any other method, but allocating them
> > > individually seems worse to me, both performance-wise and because
> > > error paths are more complicated. Maybe it would be less tricky if I
> > > try to use them less "by value" and more "as pointers"?
> >
> >
> > > Or let's have dedicated arrays (like desc_state/desc_extra in the
> > > kernel) instead of trying to reuse ring_id_maps.
> >
>
> Sure, it looks to me like:
> * renaming ring_id_maps to desc_state/desc_extra/something similar,
> since now it's used to store more state that only the guest mapping
> * Rename "opaque" to "data"
> * Forget the wrapper and assume data == NULL is an invalid head /
> empty. To me they serve as a doc, but I guess it's fine to use them
> directly. The kernel works that way anyway.
>
> Does this look better?

Yes.

> It's definitely closer to the kernel so I guess
> it's an advantage.

I think the advantage is that it decouples the dynamically allocated
metadata (VirtQueueElement) from the statically allocated ones.

>
> Thanks!
>



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v9 04/23] vhost: Get vring base from vq, not svq
  2022-07-12  7:42       ` Jason Wang
@ 2022-07-12  9:42         ` Eugenio Perez Martin
  0 siblings, 0 replies; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-12  9:42 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Tue, Jul 12, 2022 at 9:42 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2022/7/8 18:10, Eugenio Perez Martin 写道:
> > On Fri, Jul 8, 2022 at 11:12 AM Jason Wang <jasowang@redhat.com> wrote:
> >> On Thu, Jul 7, 2022 at 2:40 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >>> The used idx used to match with this, but it will not match from the
> >>> moment we introduce svq_inject.
> >> It might be better to explain what "svq_inject" means here.
> >>
> > Good point, I'll change for the next version.
> >
> >>> Rewind all the descriptors not used by
> >>> vdpa device and get the vq state properly.
> >>>
> >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>> ---
> >>>   include/hw/virtio/virtio.h | 1 +
> >>>   hw/virtio/vhost-vdpa.c     | 7 +++----
> >>>   hw/virtio/virtio.c         | 5 +++++
> >>>   3 files changed, 9 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >>> index db1c0ddf6b..4b51ab9d06 100644
> >>> --- a/include/hw/virtio/virtio.h
> >>> +++ b/include/hw/virtio/virtio.h
> >>> @@ -302,6 +302,7 @@ hwaddr virtio_queue_get_desc_size(VirtIODevice *vdev, int n);
> >>>   hwaddr virtio_queue_get_avail_size(VirtIODevice *vdev, int n);
> >>>   hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n);
> >>>   unsigned int virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n);
> >>> +unsigned int virtio_queue_get_in_use(const VirtQueue *vq);
> >>>   void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n,
> >>>                                        unsigned int idx);
> >>>   void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n);
> >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >>> index 2ee8009594..de76128030 100644
> >>> --- a/hw/virtio/vhost-vdpa.c
> >>> +++ b/hw/virtio/vhost-vdpa.c
> >>> @@ -1189,12 +1189,10 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
> >>>                                          struct vhost_vring_state *ring)
> >>>   {
> >>>       struct vhost_vdpa *v = dev->opaque;
> >>> -    int vdpa_idx = ring->index - dev->vq_index;
> >>>       int ret;
> >>>
> >>>       if (v->shadow_vqs_enabled) {
> >>> -        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
> >>> -
> >>> +        const VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
> >>>           /*
> >>>            * Setting base as last used idx, so destination will see as available
> >>>            * all the entries that the device did not use, including the in-flight
> >>> @@ -1203,7 +1201,8 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
> >>>            * TODO: This is ok for networking, but other kinds of devices might
> >>>            * have problems with these retransmissions.
> >>>            */
> >>> -        ring->num = svq->last_used_idx;
> >>> +        ring->num = virtio_queue_get_last_avail_idx(dev->vdev, ring->index) -
> >>> +                    virtio_queue_get_in_use(vq);
> >> I think we need to change the above comment as well otherwise readers
> >> might get confused.
> >>
> > Re-thinking this: This part has always been buggy, so this is actually
> > a fix. I'll tag it for next versions or, even better, send it
> > separately.
> >
> > But the comment still holds: We cannot use the device's used idx since
> > it could not match with the guest visible one. This is actually easy
> > to trigger if we migrate a guest many times with traffic.
>
>
> I may miss something, maybe you can give me an example of this (I assume
> the size of the svq is the same as what guest can see).
>

The code assumes that the device's last_used_idx will be the same as
the guest's. This held as long as the guest had booted the device,
because each used descriptor in the device was always forwarded as one
used descriptor to the guest.

However, now we're injecting buffers into the device so we can restore
its state. These buffers are only accounted for in the device's avail /
used rings, not in the guest's. So we were reporting the wrong index
(the device's); we want to migrate the guest-visible vring state.

>
> >
> > Maybe it's cleaner to export directly used_idx from VirtQueue? Extra
> > care is needed with packed vq, but SVQ still does not support it. I
> > didn't want to duplicate that logic in virtio ring handling.
>
>
> So two more questions here:
>
> 1) what's the reason of rewinding via virtio_queue_get_in_use()?
>

We don't want to count in-flight descriptors, like rx ones, in the vq state.

Rethinking this: maybe we could get enough information about them from
the VirtIODevice alone, and expose them as newly available in the
destination?

> 2) it looks like we could end up with underflow with the above math?
>

I don't think so; we're using the same inuse variable both times.

Thanks!



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v9 20/23] vdpa: Buffer CVQ support on shadow virtqueue
  2022-07-12  7:17   ` Jason Wang
@ 2022-07-12  9:47     ` Eugenio Perez Martin
  2022-07-14  6:54       ` Eugenio Perez Martin
  0 siblings, 1 reply; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-12  9:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Tue, Jul 12, 2022 at 9:18 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2022/7/7 02:40, Eugenio Pérez 写道:
> > Introduce the control virtqueue support for vDPA shadow virtqueue. This
> > is needed for advanced networking features like multiqueue.
> >
> > Virtio-net control VQ will copy the descriptors to qemu's VA, so we
> > avoid TOCTOU with the guest's or device's memory every time there is a
> > device model change.
>
>
> Not sure this is a must since we currently do cvq passthrough. So we
> might already "suffer" from this.
>

Since we currently do cvq passthrough, we don't update qemu's device
model. So there is only one entity checking and using the cvq buffer
(the vhost device), not two. The device itself may suffer from TOCTOU,
but that is not something we can solve from qemu.

Now that this patch adds updating of qemu's device model, we open a
window where the guest could present some data to qemu and then
different data to the vhost device, making the two updates diverge.
Doing a local copy in qemu is the solution taken here, but there can
certainly be others.

>
> > When address space isolation is implemented, this
> > will allow, CVQ to only have access to control messages too.
> >
> > To demonstrate command handling, VIRTIO_NET_F_CTRL_MACADDR is
> > implemented.  If virtio-net driver changes MAC the virtio-net device
> > model will be updated with the new one.
> >
> > Others cvq commands could be added here straightforwardly but they have
> > been not tested.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   include/hw/virtio/vhost-vdpa.h |   3 +
> >   hw/virtio/vhost-vdpa.c         |   5 +-
> >   net/vhost-vdpa.c               | 373 +++++++++++++++++++++++++++++++++
> >   3 files changed, 379 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > index 7214eb47dc..1111d85643 100644
> > --- a/include/hw/virtio/vhost-vdpa.h
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -15,6 +15,7 @@
> >   #include <gmodule.h>
> >
> >   #include "hw/virtio/vhost-iova-tree.h"
> > +#include "hw/virtio/vhost-shadow-virtqueue.h"
> >   #include "hw/virtio/virtio.h"
> >   #include "standard-headers/linux/vhost_types.h"
> >
> > @@ -35,6 +36,8 @@ typedef struct vhost_vdpa {
> >       /* IOVA mapping used by the Shadow Virtqueue */
> >       VhostIOVATree *iova_tree;
> >       GPtrArray *shadow_vqs;
> > +    const VhostShadowVirtqueueOps *shadow_vq_ops;
> > +    void *shadow_vq_ops_opaque;
> >       struct vhost_dev *dev;
> >       VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> >   } VhostVDPA;
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 613c3483b0..94bda07b4d 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -417,9 +417,10 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
> >
> >       shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
> >       for (unsigned n = 0; n < hdev->nvqs; ++n) {
> > -        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree, NULL,
> > -                                                            NULL);
> > +        g_autoptr(VhostShadowVirtqueue) svq = NULL;
>
>
> I don't see the reason of this assignment consider it will just be
> initialized in the following line.
>

It can be deleted for sure, as long as we don't add code that returns
in between or something like that. The compiler should both squash the
writes and warn us if we mistakenly add such a return, so I'm fine
with deleting it.

>
> >
> > +        svq = vhost_svq_new(v->iova_tree, v->shadow_vq_ops,
> > +                            v->shadow_vq_ops_opaque);
> >           if (unlikely(!svq)) {
> >               error_setg(errp, "Cannot create svq %u", n);
> >               return -1;
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index b0158f625e..e415cc8de5 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -11,11 +11,15 @@
> >
> >   #include "qemu/osdep.h"
> >   #include "clients.h"
> > +#include "hw/virtio/virtio-net.h"
> >   #include "net/vhost_net.h"
> >   #include "net/vhost-vdpa.h"
> >   #include "hw/virtio/vhost-vdpa.h"
> > +#include "qemu/buffer.h"
> >   #include "qemu/config-file.h"
> >   #include "qemu/error-report.h"
> > +#include "qemu/log.h"
> > +#include "qemu/memalign.h"
> >   #include "qemu/option.h"
> >   #include "qapi/error.h"
> >   #include <linux/vhost.h>
> > @@ -25,6 +29,26 @@
> >   #include "monitor/monitor.h"
> >   #include "hw/virtio/vhost.h"
> >
> > +typedef struct CVQElement {
> > +    /* Device's in and out buffer */
> > +    void *in_buf, *out_buf;
> > +
> > +    /* Optional guest element from where this cvqelement was created */
>
>
> Should be "cvq element".
>

I'll fix it, thanks!

>
> > +    VirtQueueElement *guest_elem;
> > +
> > +    /* Control header sent by the guest. */
> > +    struct virtio_net_ctrl_hdr ctrl;
> > +
> > +    /* vhost-vdpa device, for cleanup reasons */
> > +    struct vhost_vdpa *vdpa;
> > +
> > +    /* Length of out data */
> > +    size_t out_len;
> > +
> > +    /* Copy of the out data sent by the guest excluding ctrl. */
> > +    uint8_t out_data[];
> > +} CVQElement;
> > +
> >   /* Todo:need to add the multiqueue support here */
> >   typedef struct VhostVDPAState {
> >       NetClientState nc;
> > @@ -187,6 +211,351 @@ static NetClientInfo net_vhost_vdpa_info = {
> >           .check_peer_type = vhost_vdpa_check_peer_type,
> >   };
> >
> > +/**
> > + * Unmap a descriptor chain of a SVQ element, optionally copying its in buffers
> > + *
> > + * @svq: Shadow VirtQueue
> > + * @iova: SVQ IO Virtual address of descriptor
> > + * @iov: Optional iovec to store device writable buffer
> > + * @iov_cnt: iov length
> > + * @buf_len: Length written by the device
> > + *
> > + * TODO: Use me! and adapt to net/vhost-vdpa format
> > + * Print error message in case of error
> > + */
> > +static void vhost_vdpa_cvq_unmap_buf(CVQElement *elem, void *addr)
> > +{
> > +    struct vhost_vdpa *v = elem->vdpa;
> > +    VhostIOVATree *tree = v->iova_tree;
> > +    DMAMap needle = {
> > +        /*
> > +         * No need to specify size or to look for more translations since
> > +         * this contiguous chunk was allocated by us.
> > +         */
> > +        .translated_addr = (hwaddr)(uintptr_t)addr,
> > +    };
> > +    const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle);
> > +    int r;
> > +
> > +    if (unlikely(!map)) {
> > +        error_report("Cannot locate expected map");
> > +        goto err;
> > +    }
> > +
> > +    r = vhost_vdpa_dma_unmap(v, map->iova, map->size + 1);
> > +    if (unlikely(r != 0)) {
> > +        error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
> > +    }
> > +
> > +    vhost_iova_tree_remove(tree, map);
> > +
> > +err:
> > +    qemu_vfree(addr);
> > +}
> > +
> > +static void vhost_vdpa_cvq_delete_elem(CVQElement *elem)
> > +{
> > +    if (elem->out_buf) {
> > +        vhost_vdpa_cvq_unmap_buf(elem, g_steal_pointer(&elem->out_buf));
> > +    }
> > +
> > +    if (elem->in_buf) {
> > +        vhost_vdpa_cvq_unmap_buf(elem, g_steal_pointer(&elem->in_buf));
> > +    }
> > +
> > +    /* Guest element must have been returned to the guest or free otherway */
> > +    assert(!elem->guest_elem);
> > +
> > +    g_free(elem);
> > +}
> > +G_DEFINE_AUTOPTR_CLEANUP_FUNC(CVQElement, vhost_vdpa_cvq_delete_elem);
> > +
> > +static int vhost_vdpa_net_cvq_svq_inject(VhostShadowVirtqueue *svq,
> > +                                         CVQElement *cvq_elem,
> > +                                         size_t out_len)
> > +{
> > +    const struct iovec iov[] = {
> > +        {
> > +            .iov_base = cvq_elem->out_buf,
> > +            .iov_len = out_len,
> > +        },{
> > +            .iov_base = cvq_elem->in_buf,
> > +            .iov_len = sizeof(virtio_net_ctrl_ack),
> > +        }
> > +    };
> > +
> > +    return vhost_svq_inject(svq, iov, 1, 1, cvq_elem);
> > +}
> > +
> > +static void *vhost_vdpa_cvq_alloc_buf(struct vhost_vdpa *v,
> > +                                      const uint8_t *out_data, size_t data_len,
> > +                                      bool write)
> > +{
> > +    DMAMap map = {};
> > +    size_t buf_len = ROUND_UP(data_len, qemu_real_host_page_size());
> > +    void *buf = qemu_memalign(qemu_real_host_page_size(), buf_len);
> > +    int r;
> > +
> > +    if (!write) {
> > +        memcpy(buf, out_data, data_len);
> > +        memset(buf + data_len, 0, buf_len - data_len);
> > +    } else {
> > +        memset(buf, 0, data_len);
> > +    }
> > +
> > +    map.translated_addr = (hwaddr)(uintptr_t)buf;
> > +    map.size = buf_len - 1;
> > +    map.perm = write ? IOMMU_RW : IOMMU_RO,
> > +    r = vhost_iova_tree_map_alloc(v->iova_tree, &map);
> > +    if (unlikely(r != IOVA_OK)) {
> > +        error_report("Cannot map injected element");
> > +        goto err;
> > +    }
> > +
> > +    r = vhost_vdpa_dma_map(v, map.iova, buf_len, buf, !write);
> > +    /* TODO: Handle error */
> > +    assert(r == 0);
> > +
> > +    return buf;
> > +
> > +err:
> > +    qemu_vfree(buf);
> > +    return NULL;
> > +}
> > +
> > +/**
> > + * Allocate an element suitable to be injected
> > + *
> > + * @iov: The iovec
> > + * @out_num: Number of out elements, placed first in iov
> > + * @in_num: Number of in elements, placed after out ones
> > + * @elem: Optional guest element from where this one was created
> > + *
> > + * TODO: Do we need a sg for out_num? I think not
> > + */
> > +static CVQElement *vhost_vdpa_cvq_alloc_elem(VhostVDPAState *s,
> > +                                             struct virtio_net_ctrl_hdr ctrl,
> > +                                             const struct iovec *out_sg,
> > +                                             size_t out_num, size_t out_size,
> > +                                             VirtQueueElement *elem)
> > +{
> > +    g_autoptr(CVQElement) cvq_elem = g_malloc(sizeof(CVQElement) + out_size);
> > +    uint8_t *out_cursor = cvq_elem->out_data;
> > +    struct vhost_vdpa *v = &s->vhost_vdpa;
> > +
> > +    /* Start with a clean base */
> > +    memset(cvq_elem, 0, sizeof(*cvq_elem));
> > +    cvq_elem->vdpa = &s->vhost_vdpa;
> > +
> > +    /*
> > +     * Linearize element. If guest had a descriptor chain, we expose the device
> > +     * a single buffer.
> > +     */
> > +    cvq_elem->out_len = out_size;
> > +    memcpy(out_cursor, &ctrl, sizeof(ctrl));
> > +    out_size -= sizeof(ctrl);
> > +    out_cursor += sizeof(ctrl);
> > +    iov_to_buf(out_sg, out_num, 0, out_cursor, out_size);
> > +
> > +    cvq_elem->out_buf = vhost_vdpa_cvq_alloc_buf(v, cvq_elem->out_data,
> > +                                                 out_size, false);
> > +    assert(cvq_elem->out_buf);
> > +    cvq_elem->in_buf = vhost_vdpa_cvq_alloc_buf(v, NULL,
> > +                                                sizeof(virtio_net_ctrl_ack),
> > +                                                true);
> > +    assert(cvq_elem->in_buf);
> > +
> > +    cvq_elem->guest_elem = elem;
> > +    cvq_elem->ctrl = ctrl;
> > +    return g_steal_pointer(&cvq_elem);
> > +}
> > +
> > +/**
> > + * iov_size with an upper limit. It's assumed UINT64_MAX is an invalid
> > + * iov_size.
> > + */
> > +static uint64_t vhost_vdpa_net_iov_len(const struct iovec *iov,
> > +                                       unsigned int iov_cnt, size_t max)
> > +{
> > +    uint64_t len = 0;
> > +
> > +    for (unsigned int i = 0; len < max && i < iov_cnt; i++) {
> > +        bool overflow = uadd64_overflow(iov[i].iov_len, len, &len);
> > +        if (unlikely(overflow)) {
> > +            return UINT64_MAX;
> > +        }
>
>
> Let's use iov_size() here, and if you think we need to fix the overflow
> issue, we need fix it there then other user can benefit from that.
>

I think it can be solved with iov_size, let me rethink it.

>
> > +    }
> > +
> > +    return len;
> > +}
> > +
> > +static CVQElement *vhost_vdpa_net_cvq_copy_elem(VhostVDPAState *s,
> > +                                                VirtQueueElement *elem)
> > +{
> > +    struct virtio_net_ctrl_hdr ctrl;
> > +    g_autofree struct iovec *iov = NULL;
> > +    struct iovec *iov2;
> > +    unsigned int out_num = elem->out_num;
> > +    size_t n, out_size = 0;
> > +
> > +    /* TODO: in buffer MUST have only a single entry with a char? size */
>
>
> I couldn't understand the question but we should not assume the layout
> of the control command.
>

This is a leftover TODO, sorry. It's not for the cvq command but for
the input buffer reserved for the answer.

>
> > +    if (unlikely(vhost_vdpa_net_iov_len(elem->in_sg, elem->in_num,
> > +                                        sizeof(virtio_net_ctrl_ack))
> > +                                              < sizeof(virtio_net_ctrl_ack))) {
> > +        return NULL;
> > +    }
>
>
> We don't have such check in virtio-net.c, anything make svq different?
>

It's the first conditional after the if(!elem), but it doesn't check
for overflow:
    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
...
        if (iov_size(elem->in_sg, elem->in_num) < sizeof(status) ||
            ...) {
            virtio_error(vdev, "virtio-net ctrl missing headers");
            ...
            break;
        }

>
> > +
> > +    n = iov_to_buf(elem->out_sg, out_num, 0, &ctrl, sizeof(ctrl));
> > +    if (unlikely(n != sizeof(ctrl))) {
> > +        qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid out size\n", __func__);
> > +        return NULL;
> > +    }
> > +
> > +    iov = iov2 = g_memdup2(elem->out_sg, sizeof(struct iovec) * elem->out_num);
>
>
> Let's use iov_copy() here.
>

I'm fine with moving it to iov_copy, but it has some disadvantages.
For example, I need to know the size beforehand, so I need to traverse
the iovec twice.

But maybe it would be better if we deleted the first size checks.

> And I don't see how iov is used after this.
>

This is copied from virtio_net_handle_ctrl, but I just realized I
reversed the uses.

iov and iov2 are copies (memdup) of the out iovec, so we don't modify
the original one, i.e. the VirtQueueElement's.

Since they're memduped, they must be freed, so one of the two keeps
the memduped address for that: iov2 is the one explicitly freed in
virtio_net_handle_ctrl; iov is the one freed by g_autofree in the new
code.

The other one is used as a cursor to iterate over the output.

>
> > +    iov_discard_front(&iov2, &out_num, sizeof(ctrl));
> > +    switch (ctrl.class) {
> > +    case VIRTIO_NET_CTRL_MAC:
> > +        switch (ctrl.cmd) {
> > +        case VIRTIO_NET_CTRL_MAC_ADDR_SET:
> > +            if (likely(vhost_vdpa_net_iov_len(iov2, out_num, 6))) {
> > +                out_size += 6;
> > +                break;
> > +            }
> > +
> > +            qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid mac size\n", __func__);
> > +            return NULL;
>
>
> Note that we need to support VIRTIO_NET_CTRL_ANNOUNCE_ACK in order to
> support live migration.
>

I thought it was valid to add it on top of this series.

> But a more fundamental question, what's the value of having this kind of
> whitelist here?
>

What should qemu do if it forwards a command it does not understand
and the device returns VIRTIO_NET_OK? Migration is not possible from
that moment on (unless we consider all CVQ features best effort).

I think it's simpler to sanitize it before copying it to the device,
although it requires some pre-copy validations.

> Is it more simpler just have a sane limit of the buffer and simply
> forward everything to the vhost-vDPA?
>

I'm ok with exploring this. Should we return VIRTIO_NET_ERR if the
guest's output buffer is bigger than that limit, not forwarding it to
the device?

> And if we do this, instead of validating the inputs one by one we can
> simply doing validation only on VIRTIO_NET_CTRL_MAC_TABLE_SET which
> accepts variable length and simply fallback to alluni/allmulti if it
> contains too much entries.
>

So let's say the guest issues a command with a MAC table larger than
MAC_TABLE_ENTRIES. QEMU should modify the request and enable alluni
and/or allmulti. We return VIRTIO_NET_OK to the guest, and the rx
filtering change event / query returns the right status.

This makes equal the behavior of the emulated virtio-net device model
and the vhost-vdpa one. The guest will receive packets that should
have been filtered out, rx filter event is right, and migration will
preserve the filtering behavior.

Otherwise, the device will not receive all the MAC filters, so the
guest will not receive frames that it wants to receive. As a
disadvantage, we're effectively capping the MAC table size, because
the device's size could be many times MAC_TABLE_ENTRIES.

>
> > +        default:
> > +            qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid mac cmd %u\n",
> > +                          __func__, ctrl.cmd);
> > +            return NULL;
> > +        };
> > +        break;
> > +    default:
> > +        qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid control class %u\n",
> > +                      __func__, ctrl.class);
> > +        return NULL;
> > +    };
> > +
> > +    return vhost_vdpa_cvq_alloc_elem(s, ctrl, iov2, out_num,
> > +                                     sizeof(ctrl) + out_size, elem);
> > +}
> > +
> > +/**
> > + * Validate and copy control virtqueue commands.
> > + *
> > + * Following QEMU guidelines, we offer a copy of the buffers to the device to
> > + * prevent TOCTOU bugs.  This functions check that the buffers length are
> > + * expected too.
> > + */
> > +static bool vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
> > +                                             VirtQueueElement *guest_elem,
> > +                                             void *opaque)
> > +{
> > +    VhostVDPAState *s = opaque;
> > +    g_autoptr(CVQElement) cvq_elem = NULL;
> > +    g_autofree VirtQueueElement *elem = guest_elem;
> > +    size_t out_size, in_len;
> > +    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
> > +    int r;
> > +
> > +    cvq_elem = vhost_vdpa_net_cvq_copy_elem(s, elem);
> > +    if (unlikely(!cvq_elem)) {
> > +        goto err;
> > +    }
> > +
> > +    /* out size validated at vhost_vdpa_net_cvq_copy_elem */
> > +    out_size = iov_size(elem->out_sg, elem->out_num);
> > +    r = vhost_vdpa_net_cvq_svq_inject(svq, cvq_elem, out_size);
> > +    if (unlikely(r != 0)) {
> > +        goto err;
> > +    }
> > +
> > +    cvq_elem->guest_elem = g_steal_pointer(&elem);
> > +    /* Now CVQ elem belongs to SVQ */
> > +    g_steal_pointer(&cvq_elem);
> > +    return true;
> > +
> > +err:
> > +    in_len = iov_from_buf(elem->in_sg, elem->in_num, 0, &status,
> > +                          sizeof(status));
> > +    vhost_svq_push_elem(svq, elem, in_len);
> > +    return true;
> > +}
> > +
> > +static VirtQueueElement *vhost_vdpa_net_handle_ctrl_detach(void *elem_opaque)
> > +{
> > +    g_autoptr(CVQElement) cvq_elem = elem_opaque;
> > +    return g_steal_pointer(&cvq_elem->guest_elem);
> > +}
> > +
> > +static void vhost_vdpa_net_handle_ctrl_used(VhostShadowVirtqueue *svq,
> > +                                            void *vq_elem_opaque,
> > +                                            uint32_t dev_written)
> > +{
> > +    g_autoptr(CVQElement) cvq_elem = vq_elem_opaque;
> > +    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
> > +    const struct iovec out = {
> > +        .iov_base = cvq_elem->out_data,
> > +        .iov_len = cvq_elem->out_len,
> > +    };
> > +    const DMAMap status_map_needle = {
> > +        .translated_addr = (hwaddr)(uintptr_t)cvq_elem->in_buf,
> > +        .size = sizeof(status),
> > +    };
> > +    const DMAMap *in_map;
> > +    const struct iovec in = {
> > +        .iov_base = &status,
> > +        .iov_len = sizeof(status),
> > +    };
> > +    g_autofree VirtQueueElement *guest_elem = NULL;
> > +
> > +    if (unlikely(dev_written < sizeof(status))) {
> > +        error_report("Insufficient written data (%llu)",
> > +                     (long long unsigned)dev_written);
> > +        goto out;
> > +    }
> > +
> > +    in_map = vhost_iova_tree_find_iova(svq->iova_tree, &status_map_needle);
> > +    if (unlikely(!in_map)) {
> > +        error_report("Cannot locate out mapping");
> > +        goto out;
> > +    }
> > +
> > +    switch (cvq_elem->ctrl.class) {
> > +    case VIRTIO_NET_CTRL_MAC_ADDR_SET:
> > +        break;
> > +    default:
> > +        error_report("Unexpected ctrl class %u", cvq_elem->ctrl.class);
> > +        goto out;
> > +    };
> > +
> > +    memcpy(&status, cvq_elem->in_buf, sizeof(status));
> > +    if (status != VIRTIO_NET_OK) {
> > +        goto out;
> > +    }
> > +
> > +    status = VIRTIO_NET_ERR;
> > +    virtio_net_handle_ctrl_iov(svq->vdev, &in, 1, &out, 1);
>
>
> I wonder if this is the best choice. It looks to me it might be better
> to extend the virtio_net_handle_ctrl_iov() logic:
>
> virtio_net_handle_ctrl_iov() {
>      if (svq enabled) {
>           host_elem = iov_copy(guest_elem);
>           vhost_svq_add(host_elem);
>           vhost_svq_poll(host_elem);
>      }
>      // usersapce ctrl vq logic
> }
>
>
> This can help to avoid coupling too much logic in cvq (like the
> avail,used and detach ops).
>

Let me try that way and I'll come back to you.

Thanks!

> Thanks
>
>
> > +    if (status != VIRTIO_NET_OK) {
> > +        error_report("Bad CVQ processing in model");
> > +        goto out;
> > +    }
> > +
> > +out:
> > +    guest_elem = g_steal_pointer(&cvq_elem->guest_elem);
> > +    if (guest_elem) {
> > +        iov_from_buf(guest_elem->in_sg, guest_elem->in_num, 0, &status,
> > +                     sizeof(status));
> > +        vhost_svq_push_elem(svq, guest_elem, sizeof(status));
> > +    }
> > +}
> > +
> > +static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
> > +    .avail_handler = vhost_vdpa_net_handle_ctrl_avail,
> > +    .used_handler = vhost_vdpa_net_handle_ctrl_used,
> > +    .detach_handler = vhost_vdpa_net_handle_ctrl_detach,
> > +};
> > +
> >   static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >                                              const char *device,
> >                                              const char *name,
> > @@ -211,6 +580,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >
> >       s->vhost_vdpa.device_fd = vdpa_device_fd;
> >       s->vhost_vdpa.index = queue_pair_index;
> > +    if (!is_datapath) {
> > +        s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
> > +        s->vhost_vdpa.shadow_vq_ops_opaque = s;
> > +    }
> >       ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> >       if (ret) {
> >           qemu_del_net_client(nc);
>



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v9 03/23] vdpa: delay set_vring_ready after DRIVER_OK
  2022-07-06 18:39 ` [RFC PATCH v9 03/23] vdpa: delay set_vring_ready after DRIVER_OK Eugenio Pérez
  2022-07-08  9:06   ` Jason Wang
@ 2022-07-13  5:51   ` Michael S. Tsirkin
  2022-07-13  6:18     ` Eugenio Perez Martin
  1 sibling, 1 reply; 65+ messages in thread
From: Michael S. Tsirkin @ 2022-07-13  5:51 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: qemu-devel, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Cindy Lu, Jason Wang

On Wed, Jul 06, 2022 at 08:39:48PM +0200, Eugenio Pérez wrote:
> To restore the device in the destination of a live migration we send the
> commands through control virtqueue. For a device to read CVQ it must
> have received DRIVER_OK status bit.
> 
> However this open a window where the device could start receiving
> packets in rx queue 0 before it receive the RSS configuration. To avoid
> that, we will not send vring_enable until all configuration is used by
> the device.
> 
> As a first step, reverse the DRIVER_OK and SET_VRING_ENABLE steps.
> 
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>

Not a comment on this patch specifically, but generally:

You should know that lots of existing drivers are buggy and
try to poke at the VQs before DRIVER_OK. We are doing our best
to fix them but it's taking forever. For now it's a good
idea to support such drivers even if they are out of spec.
You do that by starting on the first kick in the absence of DRIVER_OK.
Further, adding buffers before DRIVER_OK is actually allowed,
as long as you don't kick.


> ---
>  hw/virtio/vhost-vdpa.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 66f054a12c..2ee8009594 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -728,13 +728,18 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
>      return idx;
>  }
>  
> +/**
> + * Set ready all vring of the device
> + *
> + * @dev: Vhost device
> + */
>  static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>  {
>      int i;
>      trace_vhost_vdpa_set_vring_ready(dev);
> -    for (i = 0; i < dev->nvqs; ++i) {
> +    for (i = 0; i < dev->vq_index_end; ++i) {
>          struct vhost_vring_state state = {
> -            .index = dev->vq_index + i,
> +            .index = i,
>              .num = 1,
>          };
>          vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> @@ -1097,7 +1102,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>          if (unlikely(!ok)) {
>              return -1;
>          }
> -        vhost_vdpa_set_vring_ready(dev);
>      } else {
>          ok = vhost_vdpa_svqs_stop(dev);
>          if (unlikely(!ok)) {
> @@ -1111,16 +1115,22 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>      }
>  
>      if (started) {
> +        int r;
> +
>          memory_listener_register(&v->listener, &address_space_memory);
> -        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +        r = vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +        if (unlikely(r)) {
> +            return r;
> +        }
> +        vhost_vdpa_set_vring_ready(dev);
>      } else {
>          vhost_vdpa_reset_device(dev);
>          vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>                                     VIRTIO_CONFIG_S_DRIVER);
>          memory_listener_unregister(&v->listener);
> -
> -        return 0;
>      }
> +
> +    return 0;
>  }
>  
>  static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> -- 
> 2.31.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v9 03/23] vdpa: delay set_vring_ready after DRIVER_OK
  2022-07-13  5:51   ` Michael S. Tsirkin
@ 2022-07-13  6:18     ` Eugenio Perez Martin
  0 siblings, 0 replies; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-13  6:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Cindy Lu, Jason Wang

On Wed, Jul 13, 2022 at 7:52 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jul 06, 2022 at 08:39:48PM +0200, Eugenio Pérez wrote:
> > To restore the device in the destination of a live migration we send the
> > commands through the control virtqueue. For a device to read CVQ it must
> > have received the DRIVER_OK status bit.
> >
> > However, this opens a window where the device could start receiving
> > packets in rx queue 0 before it receives the RSS configuration. To avoid
> > that, we will not send vring_enable until all the configuration is used
> > by the device.
> >
> > As a first step, reverse the DRIVER_OK and SET_VRING_ENABLE steps.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>
> Not a comment on this patch specifically, but generally:
>
> You should know that lots of existing drivers are buggy and
> try to poke at the VQs before DRIVER_OK. We are doing our best
> to fix them but it's taking forever. For now it's a good
> idea to support such drivers even if they are out of spec.

I think vhost-vdpa should not need to handle it explicitly, since it
is started after DRIVER_OK. But it's a good idea to run a quick test:
I think those kicks will go to the device's ioeventfd and the specific
virtqueue's handle_output callback.

> You do that by starting on the first kick in absence of DRIVER_OK.
> Further, adding buffers before DRIVER_OK is actually allowed,
> as long as you don't kick.
>

SVQ adds all the buffers after the guest's DRIVER_OK and after setting
DRIVER_OK on the device. What it does is send the CVQ buffers before
enabling the data queues with VHOST_VDPA_SET_VRING_ENABLE. Only CVQ is
enabled at this point, but DRIVER_OK has already been sent.
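That ordering can be modeled as a minimal sketch; the step names below are illustrative stand-ins, not QEMU symbols:

```c
#include <assert.h>

/* Minimal model of the start ordering described above. */
enum step { STEP_GUEST_DRIVER_OK, STEP_DEVICE_DRIVER_OK,
            STEP_CVQ_BUFFERS, STEP_ENABLE_DATA_VQS, STEP_MAX };

static enum step order[STEP_MAX];
static int n_steps;

static void record(enum step s)
{
    order[n_steps++] = s;
}

static void svq_start_sequence(void)
{
    record(STEP_GUEST_DRIVER_OK);   /* guest driver sets DRIVER_OK */
    record(STEP_DEVICE_DRIVER_OK);  /* qemu forwards DRIVER_OK to the vdpa device */
    record(STEP_CVQ_BUFFERS);       /* SVQ sends CVQ commands; only CVQ enabled */
    record(STEP_ENABLE_DATA_VQS);   /* VHOST_VDPA_SET_VRING_ENABLE on data vqs */
}
```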

Or am I missing something?

Thanks!

>
> > ---
> >  hw/virtio/vhost-vdpa.c | 22 ++++++++++++++++------
> >  1 file changed, 16 insertions(+), 6 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 66f054a12c..2ee8009594 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -728,13 +728,18 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
> >      return idx;
> >  }
> >
> > +/**
> > + * Set ready all vring of the device
> > + *
> > + * @dev: Vhost device
> > + */
> >  static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> >  {
> >      int i;
> >      trace_vhost_vdpa_set_vring_ready(dev);
> > -    for (i = 0; i < dev->nvqs; ++i) {
> > +    for (i = 0; i < dev->vq_index_end; ++i) {
> >          struct vhost_vring_state state = {
> > -            .index = dev->vq_index + i,
> > +            .index = i,
> >              .num = 1,
> >          };
> >          vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> > @@ -1097,7 +1102,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >          if (unlikely(!ok)) {
> >              return -1;
> >          }
> > -        vhost_vdpa_set_vring_ready(dev);
> >      } else {
> >          ok = vhost_vdpa_svqs_stop(dev);
> >          if (unlikely(!ok)) {
> > @@ -1111,16 +1115,22 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >      }
> >
> >      if (started) {
> > +        int r;
> > +
> >          memory_listener_register(&v->listener, &address_space_memory);
> > -        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > +        r = vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > +        if (unlikely(r)) {
> > +            return r;
> > +        }
> > +        vhost_vdpa_set_vring_ready(dev);
> >      } else {
> >          vhost_vdpa_reset_device(dev);
> >          vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> >                                     VIRTIO_CONFIG_S_DRIVER);
> >          memory_listener_unregister(&v->listener);
> > -
> > -        return 0;
> >      }
> > +
> > +    return 0;
> >  }
> >
> >  static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> > --
> > 2.31.1
>




* Re: [RFC PATCH v9 20/23] vdpa: Buffer CVQ support on shadow virtqueue
  2022-07-12  9:47     ` Eugenio Perez Martin
@ 2022-07-14  6:54       ` Eugenio Perez Martin
  2022-07-14  7:04         ` Jason Wang
  0 siblings, 1 reply; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-14  6:54 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

> > > +static void vhost_vdpa_net_handle_ctrl_used(VhostShadowVirtqueue *svq,
> > > +                                            void *vq_elem_opaque,
> > > +                                            uint32_t dev_written)
> > > +{
> > > +    g_autoptr(CVQElement) cvq_elem = vq_elem_opaque;
> > > +    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
> > > +    const struct iovec out = {
> > > +        .iov_base = cvq_elem->out_data,
> > > +        .iov_len = cvq_elem->out_len,
> > > +    };
> > > +    const DMAMap status_map_needle = {
> > > +        .translated_addr = (hwaddr)(uintptr_t)cvq_elem->in_buf,
> > > +        .size = sizeof(status),
> > > +    };
> > > +    const DMAMap *in_map;
> > > +    const struct iovec in = {
> > > +        .iov_base = &status,
> > > +        .iov_len = sizeof(status),
> > > +    };
> > > +    g_autofree VirtQueueElement *guest_elem = NULL;
> > > +
> > > +    if (unlikely(dev_written < sizeof(status))) {
> > > +        error_report("Insufficient written data (%llu)",
> > > +                     (long long unsigned)dev_written);
> > > +        goto out;
> > > +    }
> > > +
> > > +    in_map = vhost_iova_tree_find_iova(svq->iova_tree, &status_map_needle);
> > > +    if (unlikely(!in_map)) {
> > > +        error_report("Cannot locate out mapping");
> > > +        goto out;
> > > +    }
> > > +
> > > +    switch (cvq_elem->ctrl.class) {
> > > +    case VIRTIO_NET_CTRL_MAC_ADDR_SET:
> > > +        break;
> > > +    default:
> > > +        error_report("Unexpected ctrl class %u", cvq_elem->ctrl.class);
> > > +        goto out;
> > > +    };
> > > +
> > > +    memcpy(&status, cvq_elem->in_buf, sizeof(status));
> > > +    if (status != VIRTIO_NET_OK) {
> > > +        goto out;
> > > +    }
> > > +
> > > +    status = VIRTIO_NET_ERR;
> > > +    virtio_net_handle_ctrl_iov(svq->vdev, &in, 1, &out, 1);
> >
> >
> > I wonder if this is the best choice. It looks to me it might be better
> > to extend the virtio_net_handle_ctrl_iov() logic:
> >
> > virtio_net_handle_ctrl_iov() {
> >      if (svq enabled) {
> >           host_elem = iov_copy(guest_elem);
> >           vhost_svq_add(host_elem);
> >           vhost_svq_poll(host_elem);
> >      }
> >      // userspace ctrl vq logic
> > }
> >
> >
> > This can help to avoid coupling too much logic in cvq (like the
> > avail,used and detach ops).
> >
>
> Let me try that way and I'll come back to you.
>

The problem with that approach is that virtio_net_handle_ctrl_iov is
called from the SVQ used handler. How could we call it otherwise? I
find it pretty hard to do unless we return SVQ to the model where we
used VirtQueue.handle_output, discarded long ago.

I'm about to send a new version, but I still need to call
virtio_net_handle_ctrl_iov from the avail handler. At least the used
and discard handlers are removed.

Thanks!




* Re: [RFC PATCH v9 20/23] vdpa: Buffer CVQ support on shadow virtqueue
  2022-07-14  6:54       ` Eugenio Perez Martin
@ 2022-07-14  7:04         ` Jason Wang
  2022-07-14 17:37           ` Eugenio Perez Martin
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Wang @ 2022-07-14  7:04 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Thu, Jul 14, 2022 at 2:54 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> > > > +static void vhost_vdpa_net_handle_ctrl_used(VhostShadowVirtqueue *svq,
> > > > +                                            void *vq_elem_opaque,
> > > > +                                            uint32_t dev_written)
> > > > +{
> > > > +    g_autoptr(CVQElement) cvq_elem = vq_elem_opaque;
> > > > +    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
> > > > +    const struct iovec out = {
> > > > +        .iov_base = cvq_elem->out_data,
> > > > +        .iov_len = cvq_elem->out_len,
> > > > +    };
> > > > +    const DMAMap status_map_needle = {
> > > > +        .translated_addr = (hwaddr)(uintptr_t)cvq_elem->in_buf,
> > > > +        .size = sizeof(status),
> > > > +    };
> > > > +    const DMAMap *in_map;
> > > > +    const struct iovec in = {
> > > > +        .iov_base = &status,
> > > > +        .iov_len = sizeof(status),
> > > > +    };
> > > > +    g_autofree VirtQueueElement *guest_elem = NULL;
> > > > +
> > > > +    if (unlikely(dev_written < sizeof(status))) {
> > > > +        error_report("Insufficient written data (%llu)",
> > > > +                     (long long unsigned)dev_written);
> > > > +        goto out;
> > > > +    }
> > > > +
> > > > +    in_map = vhost_iova_tree_find_iova(svq->iova_tree, &status_map_needle);
> > > > +    if (unlikely(!in_map)) {
> > > > +        error_report("Cannot locate out mapping");
> > > > +        goto out;
> > > > +    }
> > > > +
> > > > +    switch (cvq_elem->ctrl.class) {
> > > > +    case VIRTIO_NET_CTRL_MAC_ADDR_SET:
> > > > +        break;
> > > > +    default:
> > > > +        error_report("Unexpected ctrl class %u", cvq_elem->ctrl.class);
> > > > +        goto out;
> > > > +    };
> > > > +
> > > > +    memcpy(&status, cvq_elem->in_buf, sizeof(status));
> > > > +    if (status != VIRTIO_NET_OK) {
> > > > +        goto out;
> > > > +    }
> > > > +
> > > > +    status = VIRTIO_NET_ERR;
> > > > +    virtio_net_handle_ctrl_iov(svq->vdev, &in, 1, &out, 1);
> > >
> > >
> > > I wonder if this is the best choice. It looks to me it might be better
> > > to extend the virtio_net_handle_ctrl_iov() logic:
> > >
> > > virtio_net_handle_ctrl_iov() {
> > >      if (svq enabled) {
> > >           host_elem = iov_copy(guest_elem);
> > >           vhost_svq_add(host_elem);
> > >           vhost_svq_poll(host_elem);
> > >      }
> > >      // userspace ctrl vq logic
> > > }
> > >
> > >
> > > This can help to avoid coupling too much logic in cvq (like the
> > > avail,used and detach ops).
> > >
> >
> > Let me try that way and I'll come back to you.
> >
>
> The problem with that approach is that virtio_net_handle_ctrl_iov is
> called from the SVQ used handler. How could we call it otherwise? I
> find it pretty hard to do unless we return SVQ to the model where we
> used VirtQueue.handle_output, discarded long ago.

I'm not sure I get this. Can we simply let the cvq to be trapped as
the current userspace datapath did?

Thanks

>
> I'm about to send a new version, but I still need to call
> virtio_net_handle_ctrl_iov from the avail handler. The handlers used
> and discard are removed at least.
>
> Thanks!
>




* Re: [RFC PATCH v9 20/23] vdpa: Buffer CVQ support on shadow virtqueue
  2022-07-14  7:04         ` Jason Wang
@ 2022-07-14 17:37           ` Eugenio Perez Martin
  0 siblings, 0 replies; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-14 17:37 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Thu, Jul 14, 2022 at 9:04 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Jul 14, 2022 at 2:54 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > > > > +static void vhost_vdpa_net_handle_ctrl_used(VhostShadowVirtqueue *svq,
> > > > > +                                            void *vq_elem_opaque,
> > > > > +                                            uint32_t dev_written)
> > > > > +{
> > > > > +    g_autoptr(CVQElement) cvq_elem = vq_elem_opaque;
> > > > > +    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
> > > > > +    const struct iovec out = {
> > > > > +        .iov_base = cvq_elem->out_data,
> > > > > +        .iov_len = cvq_elem->out_len,
> > > > > +    };
> > > > > +    const DMAMap status_map_needle = {
> > > > > +        .translated_addr = (hwaddr)(uintptr_t)cvq_elem->in_buf,
> > > > > +        .size = sizeof(status),
> > > > > +    };
> > > > > +    const DMAMap *in_map;
> > > > > +    const struct iovec in = {
> > > > > +        .iov_base = &status,
> > > > > +        .iov_len = sizeof(status),
> > > > > +    };
> > > > > +    g_autofree VirtQueueElement *guest_elem = NULL;
> > > > > +
> > > > > +    if (unlikely(dev_written < sizeof(status))) {
> > > > > +        error_report("Insufficient written data (%llu)",
> > > > > +                     (long long unsigned)dev_written);
> > > > > +        goto out;
> > > > > +    }
> > > > > +
> > > > > +    in_map = vhost_iova_tree_find_iova(svq->iova_tree, &status_map_needle);
> > > > > +    if (unlikely(!in_map)) {
> > > > > +        error_report("Cannot locate out mapping");
> > > > > +        goto out;
> > > > > +    }
> > > > > +
> > > > > +    switch (cvq_elem->ctrl.class) {
> > > > > +    case VIRTIO_NET_CTRL_MAC_ADDR_SET:
> > > > > +        break;
> > > > > +    default:
> > > > > +        error_report("Unexpected ctrl class %u", cvq_elem->ctrl.class);
> > > > > +        goto out;
> > > > > +    };
> > > > > +
> > > > > +    memcpy(&status, cvq_elem->in_buf, sizeof(status));
> > > > > +    if (status != VIRTIO_NET_OK) {
> > > > > +        goto out;
> > > > > +    }
> > > > > +
> > > > > +    status = VIRTIO_NET_ERR;
> > > > > +    virtio_net_handle_ctrl_iov(svq->vdev, &in, 1, &out, 1);
> > > >
> > > >
> > > > I wonder if this is the best choice. It looks to me it might be better
> > > > to extend the virtio_net_handle_ctrl_iov() logic:
> > > >
> > > > virtio_net_handle_ctrl_iov() {
> > > >      if (svq enabled) {
> > > >           host_elem = iov_copy(guest_elem);
> > > >           vhost_svq_add(host_elem);
> > > >           vhost_svq_poll(host_elem);
> > > >      }
> > > >      // userspace ctrl vq logic
> > > > }
> > > >
> > > >
> > > > This can help to avoid coupling too much logic in cvq (like the
> > > > avail,used and detach ops).
> > > >
> > >
> > > Let me try that way and I'll come back to you.
> > >
> >
> > The problem with that approach is that virtio_net_handle_ctrl_iov is
> > called from the SVQ used handler. How could we call it otherwise? I
> > find it pretty hard to do unless we return SVQ to the model where we
> > used VirtQueue.handle_output, discarded long ago.
>
> I'm not sure I get this. Can we simply let the cvq to be trapped as
> the current userspace datapath did?
>

Sending a very early draft RFC with that method, so we can compare
whether it is worth the trouble.

Thanks!




* Re: [RFC PATCH v9 21/23] vdpa: Add vhost_vdpa_start_control_svq
  2022-07-12  7:26   ` Jason Wang
@ 2022-07-17 10:30     ` Eugenio Perez Martin
  2022-07-17 11:00       ` Eugenio Perez Martin
  0 siblings, 1 reply; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-17 10:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Tue, Jul 12, 2022 at 9:26 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2022/7/7 02:40, Eugenio Pérez 写道:
> > As a first step, we only enable CVQ before the others. Future patches add
> > state restore.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   net/vhost-vdpa.c | 19 +++++++++++++++++++
> >   1 file changed, 19 insertions(+)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index e415cc8de5..77d013833f 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -370,6 +370,24 @@ static CVQElement *vhost_vdpa_cvq_alloc_elem(VhostVDPAState *s,
> >       return g_steal_pointer(&cvq_elem);
> >   }
> >
> > +static int vhost_vdpa_start_control_svq(VhostShadowVirtqueue *svq,
> > +                                        void *opaque)
> > +{
> > +    struct vhost_vring_state state = {
> > +        .index = virtio_get_queue_index(svq->vq),
> > +        .num = 1,
> > +    };
> > +    VhostVDPAState *s = opaque;
> > +    struct vhost_dev *dev = s->vhost_vdpa.dev;
> > +    struct vhost_vdpa *v = dev->opaque;
> > +    int r;
> > +
> > +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> > +
> > +    r = ioctl(v->device_fd, VHOST_VDPA_SET_VRING_ENABLE, &state);
> > +    return r < 0 ? -errno : r;
> > +}
> > +
> >   /**
> >    * iov_size with an upper limit. It's assumed UINT64_MAX is an invalid
> >    * iov_size.
> > @@ -554,6 +572,7 @@ static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
> >       .avail_handler = vhost_vdpa_net_handle_ctrl_avail,
> >       .used_handler = vhost_vdpa_net_handle_ctrl_used,
> >       .detach_handler = vhost_vdpa_net_handle_ctrl_detach,
> > +    .start = vhost_vdpa_start_control_svq,
> >   };
>
>
> I wonder if vhost_net_start() is something better than here. It knows
> all virtqueues and it can do whatever it wants, we just need to make
> shadow virtqueue visible there?
>

But this needs to be called after setting DRIVER_OK and before
VHOST_VRING_ENABLE.

I also think vhost_net_start is a better place, but to achieve it we
need to split vhost_vdpa_dev_start to call VHOST_VRING_ENABLE after
it. Maybe through .vhost_set_vring_enable? Why wasn't it done that way
from the beginning?

After that, we need to modify the vhost_net_start sequence. Currently,
vhost_net calls VHOST_VRING_ENABLE right after each vhost_dev's
vhost_dev_start. Vdpa would need to call vhost_dev_start for each
device, then call .vhost_set_vring_enable for each device again,
and add the vdpa_cvq_start in the middle.
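As a minimal sketch of that proposed sequence (function names and the event
log below are hypothetical stand-ins, not the real QEMU API):

```c
#include <assert.h>

/* Sketch of the reordered vhost_net_start(): vhost_dev_start() for
 * every device first, then the CVQ start in the middle, then the
 * vring enable for every device. */
#define NDEVS 2

static char seq[2 * NDEVS + 2];
static int seq_n;

static void dev_start(int i)     { (void)i; seq[seq_n++] = 'S'; } /* incl. DRIVER_OK */
static void vdpa_cvq_start(void) { seq[seq_n++] = 'C'; }          /* restore via CVQ */
static void vring_enable(int i)  { (void)i; seq[seq_n++] = 'E'; } /* VHOST_VRING_ENABLE */

static void vhost_net_start_sketch(void)
{
    for (int i = 0; i < NDEVS; i++) {
        dev_start(i);            /* start each vhost_dev, no vring enable yet */
    }
    vdpa_cvq_start();            /* CVQ commands land before data queues run */
    for (int i = 0; i < NDEVS; i++) {
        vring_enable(i);         /* now enable the data virtqueues */
    }
}
```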

It's not a lot of code change but I think we're safer self containing
it in vdpa at the moment, and then we can move to vhost_net
immediately for the development cycle. If the vhost-user backend
should support this other sequence immediately, I'm ok with sending a
new version before Tuesday.

Thanks!




* Re: [RFC PATCH v9 21/23] vdpa: Add vhost_vdpa_start_control_svq
  2022-07-17 10:30     ` Eugenio Perez Martin
@ 2022-07-17 11:00       ` Eugenio Perez Martin
  0 siblings, 0 replies; 65+ messages in thread
From: Eugenio Perez Martin @ 2022-07-17 11:00 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Liuxiangdong, Markus Armbruster,
	Harpreet Singh Anand, Eric Blake, Laurent Vivier, Parav Pandit,
	Cornelia Huck, Paolo Bonzini, Gautam Dawar, Eli Cohen,
	Gonglei (Arei),
	Zhu Lingshan, Michael S. Tsirkin, Cindy Lu

On Sun, Jul 17, 2022 at 12:30 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Jul 12, 2022 at 9:26 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2022/7/7 02:40, Eugenio Pérez 写道:
> > > As a first step, we only enable CVQ before the others. Future patches add
> > > state restore.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >   net/vhost-vdpa.c | 19 +++++++++++++++++++
> > >   1 file changed, 19 insertions(+)
> > >
> > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > index e415cc8de5..77d013833f 100644
> > > --- a/net/vhost-vdpa.c
> > > +++ b/net/vhost-vdpa.c
> > > @@ -370,6 +370,24 @@ static CVQElement *vhost_vdpa_cvq_alloc_elem(VhostVDPAState *s,
> > >       return g_steal_pointer(&cvq_elem);
> > >   }
> > >
> > > +static int vhost_vdpa_start_control_svq(VhostShadowVirtqueue *svq,
> > > +                                        void *opaque)
> > > +{
> > > +    struct vhost_vring_state state = {
> > > +        .index = virtio_get_queue_index(svq->vq),
> > > +        .num = 1,
> > > +    };
> > > +    VhostVDPAState *s = opaque;
> > > +    struct vhost_dev *dev = s->vhost_vdpa.dev;
> > > +    struct vhost_vdpa *v = dev->opaque;
> > > +    int r;
> > > +
> > > +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> > > +
> > > +    r = ioctl(v->device_fd, VHOST_VDPA_SET_VRING_ENABLE, &state);
> > > +    return r < 0 ? -errno : r;
> > > +}
> > > +
> > >   /**
> > >    * iov_size with an upper limit. It's assumed UINT64_MAX is an invalid
> > >    * iov_size.
> > > @@ -554,6 +572,7 @@ static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
> > >       .avail_handler = vhost_vdpa_net_handle_ctrl_avail,
> > >       .used_handler = vhost_vdpa_net_handle_ctrl_used,
> > >       .detach_handler = vhost_vdpa_net_handle_ctrl_detach,
> > > +    .start = vhost_vdpa_start_control_svq,
> > >   };
> >
> >
> > I wonder if vhost_net_start() is something better than here. It knows
> > all virtqueues and it can do whatever it wants, we just need to make
> > shadow virtqueue visible there?
> >
>
> But this needs to be called after setting DRIVER_OK and before
> VHOST_VRING_ENABLE.
>
> I also think vhost_net_start is a better place, but to achieve it we
> need to split vhost_vdpa_dev_start to call VHOST_VRING_ENABLE after
> it. Maybe through .vhost_set_vring_enable? Why wasn't it done that way
> from the beginning?
>
> After that, we need to modify the vhost_net_start sequence. Currently,
> vhost_net is calling VHOST_VRING_ENABLE right after each vhost_dev
> vhost_dev_start. Vdpa would need to call vhost_dev_start for each
> device, and then call .vhost_set_vring_enable for each device again.
> And to add the vdpa_cvq_start in the middle.
>
> It's not a lot of code change but I think we're safer self containing
> it in vdpa at the moment, and then we can move to vhost_net
> immediately for the development cycle. If the vhost-user backend
> should support this other sequence immediately, I'm ok with sending a
> new version before Tuesday.
>

Moved to vhost_vdpa in the already-sent RFC [1].

[1] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg02856.html




end of thread, other threads:[~2022-07-17 11:02 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-06 18:39 [RFC PATCH v9 00/23] Net Control VQ support in SVQ Eugenio Pérez
2022-07-06 18:39 ` [RFC PATCH v9 01/23] vhost: Return earlier if used buffers overrun Eugenio Pérez
2022-07-08  8:52   ` Jason Wang
2022-07-06 18:39 ` [RFC PATCH v9 02/23] vhost: move descriptor translation to vhost_svq_vring_write_descs Eugenio Pérez
2022-07-06 18:39 ` [RFC PATCH v9 03/23] vdpa: delay set_vring_ready after DRIVER_OK Eugenio Pérez
2022-07-08  9:06   ` Jason Wang
2022-07-08  9:56     ` Eugenio Perez Martin
2022-07-08  9:59     ` Eugenio Perez Martin
2022-07-13  5:51   ` Michael S. Tsirkin
2022-07-13  6:18     ` Eugenio Perez Martin
2022-07-06 18:39 ` [RFC PATCH v9 04/23] vhost: Get vring base from vq, not svq Eugenio Pérez
2022-07-08  9:12   ` Jason Wang
2022-07-08 10:10     ` Eugenio Perez Martin
2022-07-12  7:42       ` Jason Wang
2022-07-12  9:42         ` Eugenio Perez Martin
2022-07-06 18:39 ` [RFC PATCH v9 05/23] vhost: Add ShadowVirtQueueStart operation Eugenio Pérez
2022-07-06 18:39 ` [RFC PATCH v9 06/23] virtio-net: Expose ctrl virtqueue logic Eugenio Pérez
2022-07-06 18:39 ` [RFC PATCH v9 07/23] vhost: add vhost_svq_push_elem Eugenio Pérez
2022-07-06 18:39 ` [RFC PATCH v9 08/23] vhost: Decouple vhost_svq_add_split from VirtQueueElement Eugenio Pérez
2022-07-11  8:00   ` Jason Wang
2022-07-11  8:27     ` Eugenio Perez Martin
2022-07-12  7:43       ` Jason Wang
2022-07-06 18:39 ` [RFC PATCH v9 09/23] vhost: Add SVQElement Eugenio Pérez
2022-07-11  9:00   ` Jason Wang
2022-07-11  9:33     ` Eugenio Perez Martin
2022-07-12  7:49       ` Jason Wang
2022-07-06 18:39 ` [RFC PATCH v9 10/23] vhost: Reorder vhost_svq_last_desc_of_chain Eugenio Pérez
2022-07-06 18:39 ` [RFC PATCH v9 11/23] vhost: Move last chain id to SVQ element Eugenio Pérez
2022-07-11  9:02   ` Jason Wang
2022-07-06 18:39 ` [RFC PATCH v9 12/23] vhost: Add opaque member to SVQElement Eugenio Pérez
2022-07-11  9:05   ` Jason Wang
2022-07-11  9:56     ` Eugenio Perez Martin
2022-07-12  7:53       ` Jason Wang
2022-07-12  8:32         ` Eugenio Perez Martin
2022-07-12  8:43           ` Jason Wang
2022-07-06 18:39 ` [RFC PATCH v9 13/23] vhost: Add vhost_svq_inject Eugenio Pérez
2022-07-11  9:14   ` Jason Wang
2022-07-11  9:43     ` Eugenio Perez Martin
2022-07-12  7:58       ` Jason Wang
2022-07-06 18:39 ` [RFC PATCH v9 14/23] vhost: add vhost_svq_poll Eugenio Pérez
2022-07-11  9:19   ` Jason Wang
2022-07-11 17:52     ` Eugenio Perez Martin
2022-07-06 18:40 ` [RFC PATCH v9 15/23] vhost: Add custom used buffer callback Eugenio Pérez
2022-07-06 18:40 ` [RFC PATCH v9 16/23] vhost: Add svq avail_handler callback Eugenio Pérez
2022-07-06 18:40 ` [RFC PATCH v9 17/23] vhost: add detach SVQ operation Eugenio Pérez
2022-07-06 18:40 ` [RFC PATCH v9 18/23] vdpa: Export vhost_vdpa_dma_map and unmap calls Eugenio Pérez
2022-07-11  9:22   ` Jason Wang
2022-07-06 18:40 ` [RFC PATCH v9 19/23] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs Eugenio Pérez
2022-07-12  4:11   ` Jason Wang
2022-07-06 18:40 ` [RFC PATCH v9 20/23] vdpa: Buffer CVQ support on shadow virtqueue Eugenio Pérez
2022-07-12  7:17   ` Jason Wang
2022-07-12  9:47     ` Eugenio Perez Martin
2022-07-14  6:54       ` Eugenio Perez Martin
2022-07-14  7:04         ` Jason Wang
2022-07-14 17:37           ` Eugenio Perez Martin
2022-07-06 18:40 ` [RFC PATCH v9 21/23] vdpa: Add vhost_vdpa_start_control_svq Eugenio Pérez
2022-07-12  7:26   ` Jason Wang
2022-07-17 10:30     ` Eugenio Perez Martin
2022-07-17 11:00       ` Eugenio Perez Martin
2022-07-06 18:40 ` [RFC PATCH v9 22/23] vdpa: Inject virtio-net mac address via CVQ at start Eugenio Pérez
2022-07-06 18:40 ` [RFC PATCH v9 23/23] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
2022-07-07  6:23   ` Markus Armbruster
2022-07-08 10:53     ` Eugenio Perez Martin
2022-07-08 12:51       ` Markus Armbruster
2022-07-11  7:14         ` Eugenio Perez Martin
