* [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ
@ 2022-08-02 17:57 Eugenio Pérez
  2022-08-02 17:57 ` [PATCH v5 01/10] vhost: stop transfer elem ownership in vhost_handle_guest_kick Eugenio Pérez
                   ` (10 more replies)
  0 siblings, 11 replies; 26+ messages in thread
From: Eugenio Pérez @ 2022-08-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eli Cohen, Stefano Garzarella, Parav Pandit, Markus Armbruster,
	Gautam Dawar, Stefan Hajnoczi, Harpreet Singh Anand,
	Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Jason Wang, Liuxiangdong, Zhu Lingshan

CVQ of net vhost-vdpa devices can be intercepted since the work of [1], and
the virtio-net device model is updated accordingly. Migration was still
blocked because, although the state can be migrated between VMMs, it was not
possible to restore it on the destination NIC.

This series adds support for SVQ to inject external messages without the
guest's knowledge, so that all the guest-visible state is restored before the
guest is resumed. It is done using standard CVQ messages, so the vhost-vdpa
device does not need to learn how to restore it: as long as the device has
the feature, it knows how to handle them.
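For illustration, this is roughly the shape of one of the commands the series
injects at start, based on the MAC restore of patch 09 (a sketch only, not
literal code from the series):

    /* Out buffer: control header + payload; the device answers with a
     * single virtio_net_ctrl_ack byte placed in the in buffer. */
    const struct virtio_net_ctrl_hdr ctrl = {
        .class = VIRTIO_NET_CTRL_MAC,
        .cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET,
    };
    uint8_t mac[6];                      /* filled from the device model */
    const struct iovec out[] = {
        { .iov_base = (void *)&ctrl, .iov_len = sizeof(ctrl) },
        { .iov_base = mac,           .iov_len = sizeof(mac)  },
    };
    virtio_net_ctrl_ack status;          /* written back by the device */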

This series needs fixes [1], [2] and [3] to be applied to achieve full live
migration.

Thanks!

[1] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg02984.html
[2] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg03993.html
[3] https://lists.nongnu.org/archive/html/qemu-devel/2022-08/msg00325.html

v5:
- Rename s/start/load/
- Use independent NetClientInfo to only add load callback on cvq.
- Accept out sg instead of dev_buffers[] at vhost_vdpa_net_cvq_map_elem
- Use only the out size instead of the iovec dev_buffers to know if the
  descriptor is effectively available, allowing the deletion of the artificial
  !NULL VirtQueueElement in the vhost_svq_add call.

v4:
- Actually use NetClientInfo callback.

v3:
- Route vhost-vdpa start code through NetClientInfo callback.
- Delete extra vhost_net_stop_one() call.

v2:
- Fix SIGSEGV dereferencing SVQ when not in svq mode

v1 from RFC:
- Do not reorder DRIVER_OK & enable patches.
- Delete leftovers

Eugenio Pérez (10):
  vhost: stop transfer elem ownership in vhost_handle_guest_kick
  vhost: use SVQ element ndescs instead of opaque data for desc
    validation
  vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush
  vdpa: Get buffers from VhostVDPAState on vhost_vdpa_net_cvq_map_elem
  vdpa: Extract vhost_vdpa_net_cvq_add from
    vhost_vdpa_net_handle_ctrl_avail
  vdpa: Make vhost_vdpa_net_cvq_map_elem accept any out sg
  vdpa: add NetClientState->load() callback
  vdpa: add net_vhost_vdpa_cvq_info NetClientInfo
  vdpa: Add virtio-net mac address via CVQ at start
  vdpa: Delete CVQ migration blocker

 include/hw/virtio/vhost-vdpa.h     |   1 -
 include/net/net.h                  |   2 +
 hw/net/vhost_net.c                 |   7 ++
 hw/virtio/vhost-shadow-virtqueue.c |  31 +++---
 hw/virtio/vhost-vdpa.c             |  14 ---
 net/vhost-vdpa.c                   | 163 +++++++++++++++++++++--------
 6 files changed, 145 insertions(+), 73 deletions(-)

-- 
2.31.1





* [PATCH v5 01/10] vhost: stop transfer elem ownership in vhost_handle_guest_kick
  2022-08-02 17:57 [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ Eugenio Pérez
@ 2022-08-02 17:57 ` Eugenio Pérez
  2022-08-02 17:57 ` [PATCH v5 02/10] vhost: use SVQ element ndescs instead of opaque data for desc validation Eugenio Pérez
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Eugenio Pérez @ 2022-08-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eli Cohen, Stefano Garzarella, Parav Pandit, Markus Armbruster,
	Gautam Dawar, Stefan Hajnoczi, Harpreet Singh Anand,
	Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Jason Wang, Liuxiangdong, Zhu Lingshan

It was easier to let vhost_svq_add handle the element's memory. Now that we
will allow qemu to add elements to a SVQ without the guest's knowledge, it's
better to handle the memory in the caller.
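In other words, the caller keeps ownership unless the add succeeds. A minimal
sketch of the resulting pattern (simplified from the diff below; the -ENOSPC
requeue case is omitted, and the exact call site may differ):

    g_autofree VirtQueueElement *elem = virtqueue_pop(vq, sizeof(*elem));
    int r = vhost_svq_add(svq, elem->out_sg, elem->out_num,
                          elem->in_sg, elem->in_num, elem);
    if (r != 0) {
        /* Failure: elem is still owned here, g_autofree releases it */
        return;
    }
    /* Success: the SVQ owns elem now, so stop auto-freeing it */
    elem = NULL;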

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index e4956728dd..ffd2b2c972 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -233,9 +233,6 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
 /**
  * Add an element to a SVQ.
  *
- * The caller must check that there is enough slots for the new element. It
- * takes ownership of the element: In case of failure not ENOSPC, it is free.
- *
  * Return -EINVAL if element is invalid, -ENOSPC if dev queue is full
  */
 int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
@@ -252,7 +249,6 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
 
     ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, &qemu_head);
     if (unlikely(!ok)) {
-        g_free(elem);
         return -EINVAL;
     }
 
@@ -293,7 +289,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
         virtio_queue_set_notification(svq->vq, false);
 
         while (true) {
-            VirtQueueElement *elem;
+            g_autofree VirtQueueElement *elem;
             int r;
 
             if (svq->next_guest_avail_elem) {
@@ -324,12 +320,14 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
                      * queue the current guest descriptor and ignore kicks
                      * until some elements are used.
                      */
-                    svq->next_guest_avail_elem = elem;
+                    svq->next_guest_avail_elem = g_steal_pointer(&elem);
                 }
 
                 /* VQ is full or broken, just return and ignore kicks */
                 return;
             }
+            /* elem belongs to SVQ or external caller now */
+            elem = NULL;
         }
 
         virtio_queue_set_notification(svq->vq, true);
-- 
2.31.1




* [PATCH v5 02/10] vhost: use SVQ element ndescs instead of opaque data for desc validation
  2022-08-02 17:57 [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ Eugenio Pérez
  2022-08-02 17:57 ` [PATCH v5 01/10] vhost: stop transfer elem ownership in vhost_handle_guest_kick Eugenio Pérez
@ 2022-08-02 17:57 ` Eugenio Pérez
  2022-08-04  3:01   ` Jason Wang
  2022-08-02 17:57 ` [PATCH v5 03/10] vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush Eugenio Pérez
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Eugenio Pérez @ 2022-08-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eli Cohen, Stefano Garzarella, Parav Pandit, Markus Armbruster,
	Gautam Dawar, Stefan Hajnoczi, Harpreet Singh Anand,
	Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Jason Wang, Liuxiangdong, Zhu Lingshan

Since we're going to allow SVQ to add elements without the guest's
knowledge and without its own VirtQueueElement, it's easier to check whether
an element is a valid head by checking something other than the
VirtQueueElement pointer.
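A sketch of the idea (field names follow the SVQ descriptor state used in the
diff below; the struct layout here is illustrative only):

    typedef struct SVQDescState {
        VirtQueueElement *elem;  /* may be NULL for buffers injected by QEMU */
        unsigned int ndescs;     /* non-zero while the chain is in flight */
    } SVQDescState;

    /* On a used descriptor id, validity is now tracked by ndescs: */
    if (unlikely(!svq->desc_state[used_elem.id].ndescs)) {
        /* the device reported an id that was never made available */
    }
    svq->desc_state[used_elem.id].ndescs = 0;   /* chain is free again */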

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index ffd2b2c972..e6eebd0e8d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -414,7 +414,7 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
         return NULL;
     }
 
-    if (unlikely(!svq->desc_state[used_elem.id].elem)) {
+    if (unlikely(!svq->desc_state[used_elem.id].ndescs)) {
         qemu_log_mask(LOG_GUEST_ERROR,
             "Device %s says index %u is used, but it was not available",
             svq->vdev->name, used_elem.id);
@@ -422,6 +422,7 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
     }
 
     num = svq->desc_state[used_elem.id].ndescs;
+    svq->desc_state[used_elem.id].ndescs = 0;
     last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
     svq->desc_next[last_used_chain] = svq->free_head;
     svq->free_head = used_elem.id;
-- 
2.31.1




* [PATCH v5 03/10] vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush
  2022-08-02 17:57 [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ Eugenio Pérez
  2022-08-02 17:57 ` [PATCH v5 01/10] vhost: stop transfer elem ownership in vhost_handle_guest_kick Eugenio Pérez
  2022-08-02 17:57 ` [PATCH v5 02/10] vhost: use SVQ element ndescs instead of opaque data for desc validation Eugenio Pérez
@ 2022-08-02 17:57 ` Eugenio Pérez
  2022-08-04  3:14   ` Jason Wang
  2022-08-02 17:57 ` [PATCH v5 04/10] vdpa: Get buffers from VhostVDPAState on vhost_vdpa_net_cvq_map_elem Eugenio Pérez
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Eugenio Pérez @ 2022-08-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eli Cohen, Stefano Garzarella, Parav Pandit, Markus Armbruster,
	Gautam Dawar, Stefan Hajnoczi, Harpreet Singh Anand,
	Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Jason Wang, Liuxiangdong, Zhu Lingshan

Since QEMU will be able to inject new elements on CVQ to restore the state,
we must not depend on a VirtQueueElement to know whether a new element has
been used by the device. Instead, check for new elements using only the used
idx in vhost_svq_flush.
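A sketch of how a caller is expected to use the polling path after this change
(see patch 05 for the real user; the BQL is assumed to be held):

    /* Add buffers with no VirtQueueElement attached, then busy-poll the
     * used index until the device writes the answer or the poll times out. */
    int r = vhost_svq_add(svq, &out, 1, &in, 1, NULL);
    if (r == 0) {
        size_t dev_written = vhost_svq_poll(svq);   /* 0 on timeout */
        /* dev_written is the length the device wrote into the in buffer */
    }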

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index e6eebd0e8d..fdb550c31b 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -491,7 +491,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 /**
  * Poll the SVQ for one device used buffer.
  *
- * This function race with main event loop SVQ polling, so extra
+ * This function races with main event loop SVQ polling, so extra
  * synchronization is needed.
  *
  * Return the length written by the device.
@@ -499,20 +499,20 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 size_t vhost_svq_poll(VhostShadowVirtqueue *svq)
 {
     int64_t start_us = g_get_monotonic_time();
-    do {
+    while (true) {
         uint32_t len;
-        VirtQueueElement *elem = vhost_svq_get_buf(svq, &len);
-        if (elem) {
-            return len;
-        }
 
         if (unlikely(g_get_monotonic_time() - start_us > 10e6)) {
             return 0;
         }
 
-        /* Make sure we read new used_idx */
-        smp_rmb();
-    } while (true);
+        if (!vhost_svq_more_used(svq)) {
+            continue;
+        }
+
+        vhost_svq_get_buf(svq, &len);
+        return len;
+    }
 }
 
 /**
-- 
2.31.1




* [PATCH v5 04/10] vdpa: Get buffers from VhostVDPAState on vhost_vdpa_net_cvq_map_elem
  2022-08-02 17:57 [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ Eugenio Pérez
                   ` (2 preceding siblings ...)
  2022-08-02 17:57 ` [PATCH v5 03/10] vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush Eugenio Pérez
@ 2022-08-02 17:57 ` Eugenio Pérez
  2022-08-04  4:01   ` Jason Wang
  2022-08-02 17:57 ` [PATCH v5 05/10] vdpa: Extract vhost_vdpa_net_cvq_add from vhost_vdpa_net_handle_ctrl_avail Eugenio Pérez
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Eugenio Pérez @ 2022-08-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eli Cohen, Stefano Garzarella, Parav Pandit, Markus Armbruster,
	Gautam Dawar, Stefan Hajnoczi, Harpreet Singh Anand,
	Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Jason Wang, Liuxiangdong, Zhu Lingshan

There is no need to get the buffers by parameter, since they're contained in
VhostVDPAState. The only useful piece of information was the written length of
the out buffer.

Simplify the function by removing the rest.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 net/vhost-vdpa.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index ac1810723c..c6699edfbc 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -303,34 +303,29 @@ dma_map_err:
 
 /**
  * Copy the guest element into a dedicated buffer suitable to be sent to NIC
- *
- * @iov: [0] is the out buffer, [1] is the in one
  */
 static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
                                         VirtQueueElement *elem,
-                                        struct iovec *iov)
+                                        size_t *out_len)
 {
     size_t in_copied;
     bool ok;
 
-    iov[0].iov_base = s->cvq_cmd_out_buffer;
     ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, elem->out_sg, elem->out_num,
-                                vhost_vdpa_net_cvq_cmd_len(), iov[0].iov_base,
-                                &iov[0].iov_len, false);
+                                vhost_vdpa_net_cvq_cmd_len(),
+                                s->cvq_cmd_out_buffer, out_len, false);
     if (unlikely(!ok)) {
         return false;
     }
 
-    iov[1].iov_base = s->cvq_cmd_in_buffer;
     ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, NULL, 0,
-                                sizeof(virtio_net_ctrl_ack), iov[1].iov_base,
-                                &in_copied, true);
+                                sizeof(virtio_net_ctrl_ack),
+                                s->cvq_cmd_in_buffer, &in_copied, true);
     if (unlikely(!ok)) {
         vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
         return false;
     }
 
-    iov[1].iov_len = sizeof(virtio_net_ctrl_ack);
     return true;
 }
 
@@ -395,7 +390,7 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
     int r = -EINVAL;
     bool ok;
 
-    ok = vhost_vdpa_net_cvq_map_elem(s, elem, dev_buffers);
+    ok = vhost_vdpa_net_cvq_map_elem(s, elem, &dev_buffers[0].iov_len);
     if (unlikely(!ok)) {
         goto out;
     }
-- 
2.31.1




* [PATCH v5 05/10] vdpa: Extract vhost_vdpa_net_cvq_add from vhost_vdpa_net_handle_ctrl_avail
  2022-08-02 17:57 [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ Eugenio Pérez
                   ` (3 preceding siblings ...)
  2022-08-02 17:57 ` [PATCH v5 04/10] vdpa: Get buffers from VhostVDPAState on vhost_vdpa_net_cvq_map_elem Eugenio Pérez
@ 2022-08-02 17:57 ` Eugenio Pérez
  2022-08-02 17:57 ` [PATCH v5 06/10] vdpa: Make vhost_vdpa_net_cvq_map_elem accept any out sg Eugenio Pérez
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Eugenio Pérez @ 2022-08-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eli Cohen, Stefano Garzarella, Parav Pandit, Markus Armbruster,
	Gautam Dawar, Stefan Hajnoczi, Harpreet Singh Anand,
	Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Jason Wang, Liuxiangdong, Zhu Lingshan

So we can reuse it to inject state messages.
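A sketch of the intended reuse (buffers are assumed to be already mapped by
the caller; illustrative only):

    /* Both the guest CVQ forwarding path and the state-restore path can
     * now send a prepared out buffer and read the device ack this way. */
    virtio_net_ctrl_ack state = vhost_vdpa_net_cvq_add(s, out_len);
    if (state != VIRTIO_NET_OK) {
        /* the device rejected or failed the command */
    }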

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
--
v5:
* Do not use an artificial !NULL VirtQueueElement
* Use only out size instead of iovec dev_buffers for these functions.
---
 net/vhost-vdpa.c | 73 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 49 insertions(+), 24 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index c6699edfbc..33bf3d6409 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -329,6 +329,52 @@ static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
     return true;
 }
 
+static virtio_net_ctrl_ack vhost_vdpa_net_cvq_add(VhostVDPAState *s,
+                                                  size_t out_len)
+{
+    /* Buffers for the device */
+    const struct iovec out = {
+        .iov_base = s->cvq_cmd_out_buffer,
+        .iov_len = out_len,
+    };
+    const struct iovec in = {
+        .iov_base = s->cvq_cmd_in_buffer,
+        .iov_len = sizeof(virtio_net_ctrl_ack),
+    };
+    VhostShadowVirtqueue *svq = g_ptr_array_index(s->vhost_vdpa.shadow_vqs, 0);
+    /* in buffer used for device model */
+    virtio_net_ctrl_ack status;
+    size_t dev_written;
+    int r;
+
+    r = vhost_svq_add(svq, &out, 1, &in, 1, NULL);
+    if (unlikely(r != 0)) {
+        if (unlikely(r == -ENOSPC)) {
+            qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n",
+                          __func__);
+        }
+        return VIRTIO_NET_ERR;
+    }
+
+    /*
+     * We can poll here since we've had BQL from the time we sent the
+     * descriptor. Also, we need to take the answer before SVQ pulls by itself,
+     * when BQL is released
+     */
+    dev_written = vhost_svq_poll(svq);
+    if (unlikely(dev_written < sizeof(status))) {
+        error_report("Insufficient written data (%zu)", dev_written);
+        return VIRTIO_NET_ERR;
+    }
+
+    memcpy(&status, s->cvq_cmd_in_buffer, sizeof(status));
+    if (status != VIRTIO_NET_OK) {
+        return VIRTIO_NET_ERR;
+    }
+
+    return VIRTIO_NET_OK;
+}
+
 /**
  * Do not forward commands not supported by SVQ. Otherwise, the device could
  * accept it and qemu would not know how to update the device model.
@@ -375,7 +421,7 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
                                             void *opaque)
 {
     VhostVDPAState *s = opaque;
-    size_t in_len, dev_written;
+    size_t in_len;
     virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
     /* out and in buffers sent to the device */
     struct iovec dev_buffers[2] = {
@@ -387,7 +433,6 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
         .iov_base = &status,
         .iov_len = sizeof(status),
     };
-    int r = -EINVAL;
     bool ok;
 
     ok = vhost_vdpa_net_cvq_map_elem(s, elem, &dev_buffers[0].iov_len);
@@ -400,27 +445,7 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
         goto out;
     }
 
-    r = vhost_svq_add(svq, &dev_buffers[0], 1, &dev_buffers[1], 1, elem);
-    if (unlikely(r != 0)) {
-        if (unlikely(r == -ENOSPC)) {
-            qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n",
-                          __func__);
-        }
-        goto out;
-    }
-
-    /*
-     * We can poll here since we've had BQL from the time we sent the
-     * descriptor. Also, we need to take the answer before SVQ pulls by itself,
-     * when BQL is released
-     */
-    dev_written = vhost_svq_poll(svq);
-    if (unlikely(dev_written < sizeof(status))) {
-        error_report("Insufficient written data (%zu)", dev_written);
-        goto out;
-    }
-
-    memcpy(&status, dev_buffers[1].iov_base, sizeof(status));
+    status = vhost_vdpa_net_cvq_add(s, dev_buffers[0].iov_len);
     if (status != VIRTIO_NET_OK) {
         goto out;
     }
@@ -445,7 +470,7 @@ out:
     if (dev_buffers[1].iov_base) {
         vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, dev_buffers[1].iov_base);
     }
-    return r;
+    return 0;
 }
 
 static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
-- 
2.31.1




* [PATCH v5 06/10] vdpa: Make vhost_vdpa_net_cvq_map_elem accept any out sg
  2022-08-02 17:57 [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ Eugenio Pérez
                   ` (4 preceding siblings ...)
  2022-08-02 17:57 ` [PATCH v5 05/10] vdpa: Extract vhost_vdpa_net_cvq_add from vhost_vdpa_net_handle_ctrl_avail Eugenio Pérez
@ 2022-08-02 17:57 ` Eugenio Pérez
  2022-08-04  4:16   ` Jason Wang
  2022-08-02 17:57 ` [PATCH v5 07/10] vdpa: add NetClientState->load() callback Eugenio Pérez
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Eugenio Pérez @ 2022-08-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eli Cohen, Stefano Garzarella, Parav Pandit, Markus Armbruster,
	Gautam Dawar, Stefan Hajnoczi, Harpreet Singh Anand,
	Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Jason Wang, Liuxiangdong, Zhu Lingshan

Make it generic enough to accept any out sg buffer, so we can inject NIC
state messages.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v5: Accept out sg instead of dev_buffers[]
---
 net/vhost-vdpa.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 33bf3d6409..2421bca347 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -302,16 +302,16 @@ dma_map_err:
 }
 
 /**
- * Copy the guest element into a dedicated buffer suitable to be sent to NIC
+ * Maps out sg and in buffer into dedicated buffers suitable to be sent to NIC
  */
-static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
-                                        VirtQueueElement *elem,
-                                        size_t *out_len)
+static bool vhost_vdpa_net_cvq_map_sg(VhostVDPAState *s,
+                                      const struct iovec *out, size_t out_num,
+                                      size_t *out_len)
 {
     size_t in_copied;
     bool ok;
 
-    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, elem->out_sg, elem->out_num,
+    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, out, out_num,
                                 vhost_vdpa_net_cvq_cmd_len(),
                                 s->cvq_cmd_out_buffer, out_len, false);
     if (unlikely(!ok)) {
@@ -435,7 +435,8 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
     };
     bool ok;
 
-    ok = vhost_vdpa_net_cvq_map_elem(s, elem, &dev_buffers[0].iov_len);
+    ok = vhost_vdpa_net_cvq_map_sg(s, elem->out_sg, elem->out_num,
+                                   &dev_buffers[0].iov_len);
     if (unlikely(!ok)) {
         goto out;
     }
-- 
2.31.1




* [PATCH v5 07/10] vdpa: add NetClientState->load() callback
  2022-08-02 17:57 [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ Eugenio Pérez
                   ` (5 preceding siblings ...)
  2022-08-02 17:57 ` [PATCH v5 06/10] vdpa: Make vhost_vdpa_net_cvq_map_elem accept any out sg Eugenio Pérez
@ 2022-08-02 17:57 ` Eugenio Pérez
  2022-08-02 17:57 ` [PATCH v5 08/10] vdpa: add net_vhost_vdpa_cvq_info NetClientInfo Eugenio Pérez
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Eugenio Pérez @ 2022-08-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eli Cohen, Stefano Garzarella, Parav Pandit, Markus Armbruster,
	Gautam Dawar, Stefan Hajnoczi, Harpreet Singh Anand,
	Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Jason Wang, Liuxiangdong, Zhu Lingshan

It allows per-net-client operations right after the device's successful
start; in particular, loading the device state.

Vhost-vdpa net will use it to add the CVQ buffers needed to restore the device
state.
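A minimal sketch of how a net client hooks into it (the names here are
hypothetical; only the .load field and its invocation from
vhost_net_start_one() come from this patch):

    static int my_client_load(NetClientState *nc)
    {
        /* Send whatever is needed to restore the device state.
         * Returning < 0 makes vhost_net_start_one() fail and roll back. */
        return 0;
    }

    static NetClientInfo my_client_info = {
        .type = NET_CLIENT_DRIVER_VHOST_VDPA,
        .size = sizeof(VhostVDPAState),
        .load = my_client_load,
        /* ... other callbacks ... */
    };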

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v5: Rename start / load, naming it more specifically.
---
 include/net/net.h  | 2 ++
 hw/net/vhost_net.c | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 523136c7ac..a8d47309cd 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -44,6 +44,7 @@ typedef struct NICConf {
 
 typedef void (NetPoll)(NetClientState *, bool enable);
 typedef bool (NetCanReceive)(NetClientState *);
+typedef int (NetLoad)(NetClientState *);
 typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
 typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
 typedef void (NetCleanup) (NetClientState *);
@@ -71,6 +72,7 @@ typedef struct NetClientInfo {
     NetReceive *receive_raw;
     NetReceiveIOV *receive_iov;
     NetCanReceive *can_receive;
+    NetLoad *load;
     NetCleanup *cleanup;
     LinkStatusChanged *link_status_changed;
     QueryRxFilter *query_rx_filter;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index ccac5b7a64..a9bf72dcda 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -274,6 +274,13 @@ static int vhost_net_start_one(struct vhost_net *net,
             }
         }
     }
+
+    if (net->nc->info->load) {
+        r = net->nc->info->load(net->nc);
+        if (r < 0) {
+            goto fail;
+        }
+    }
     return 0;
 fail:
     file.fd = -1;
-- 
2.31.1




* [PATCH v5 08/10] vdpa: add net_vhost_vdpa_cvq_info NetClientInfo
  2022-08-02 17:57 [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ Eugenio Pérez
                   ` (6 preceding siblings ...)
  2022-08-02 17:57 ` [PATCH v5 07/10] vdpa: add NetClientState->load() callback Eugenio Pérez
@ 2022-08-02 17:57 ` Eugenio Pérez
  2022-08-02 17:57 ` [PATCH v5 09/10] vdpa: Add virtio-net mac address via CVQ at start Eugenio Pérez
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Eugenio Pérez @ 2022-08-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eli Cohen, Stefano Garzarella, Parav Pandit, Markus Armbruster,
	Gautam Dawar, Stefan Hajnoczi, Harpreet Singh Anand,
	Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Jason Wang, Liuxiangdong, Zhu Lingshan

The next patches will add a new info callback to restore the NIC state through
CVQ. Since only the CVQ vhost device needs it, create that device with a new
NetClientInfo.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v5: Create a new NetClientInfo instead of reusing the dataplane one.
---
 net/vhost-vdpa.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 2421bca347..8d400f2dff 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -375,6 +375,16 @@ static virtio_net_ctrl_ack vhost_vdpa_net_cvq_add(VhostVDPAState *s,
     return VIRTIO_NET_OK;
 }
 
+static NetClientInfo net_vhost_vdpa_cvq_info = {
+    .type = NET_CLIENT_DRIVER_VHOST_VDPA,
+    .size = sizeof(VhostVDPAState),
+    .receive = vhost_vdpa_receive,
+    .cleanup = vhost_vdpa_cleanup,
+    .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
+    .has_ufo = vhost_vdpa_has_ufo,
+    .check_peer_type = vhost_vdpa_check_peer_type,
+};
+
 /**
  * Do not forward commands not supported by SVQ. Otherwise, the device could
  * accept it and qemu would not know how to update the device model.
@@ -496,7 +506,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
         nc = qemu_new_net_client(&net_vhost_vdpa_info, peer, device,
                                  name);
     } else {
-        nc = qemu_new_net_control_client(&net_vhost_vdpa_info, peer,
+        nc = qemu_new_net_control_client(&net_vhost_vdpa_cvq_info, peer,
                                          device, name);
     }
     snprintf(nc->info_str, sizeof(nc->info_str), TYPE_VHOST_VDPA);
-- 
2.31.1




* [PATCH v5 09/10] vdpa: Add virtio-net mac address via CVQ at start
  2022-08-02 17:57 [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ Eugenio Pérez
                   ` (7 preceding siblings ...)
  2022-08-02 17:57 ` [PATCH v5 08/10] vdpa: add net_vhost_vdpa_cvq_info NetClientInfo Eugenio Pérez
@ 2022-08-02 17:57 ` Eugenio Pérez
  2022-08-02 17:57 ` [PATCH v5 10/10] vdpa: Delete CVQ migration blocker Eugenio Pérez
  2022-08-04  4:21 ` [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ Jason Wang
  10 siblings, 0 replies; 26+ messages in thread
From: Eugenio Pérez @ 2022-08-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eli Cohen, Stefano Garzarella, Parav Pandit, Markus Armbruster,
	Gautam Dawar, Stefan Hajnoczi, Harpreet Singh Anand,
	Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Jason Wang, Liuxiangdong, Zhu Lingshan

This is needed so the destination vdpa device sees the same state as the
guest set in the source.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v5:
* Rename s/start/load/
* Use independent NetClientInfo to only add load callback on cvq.
---
 net/vhost-vdpa.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 8d400f2dff..d489fcd91e 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -375,10 +375,60 @@ static virtio_net_ctrl_ack vhost_vdpa_net_cvq_add(VhostVDPAState *s,
     return VIRTIO_NET_OK;
 }
 
+static int vhost_vdpa_net_load(NetClientState *nc)
+{
+    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    struct vhost_vdpa *v = &s->vhost_vdpa;
+    VirtIONet *n;
+    uint64_t features;
+
+    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+    if (!v->shadow_vqs_enabled) {
+        return 0;
+    }
+
+    n = VIRTIO_NET(v->dev->vdev);
+    features = v->dev->vdev->host_features;
+    if (features & BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR)) {
+        const struct virtio_net_ctrl_hdr ctrl = {
+            .class = VIRTIO_NET_CTRL_MAC,
+            .cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET,
+        };
+        uint8_t mac[6];
+        const struct iovec out[] = {
+            {
+                .iov_base = (void *)&ctrl,
+                .iov_len = sizeof(ctrl),
+            },{
+                .iov_base = mac,
+                .iov_len = sizeof(mac),
+            },
+        };
+        size_t out_len;
+        bool ok;
+        virtio_net_ctrl_ack state;
+
+        ok = vhost_vdpa_net_cvq_map_sg(s, out, ARRAY_SIZE(out), &out_len);
+        if (unlikely(!ok)) {
+            return -1;
+        }
+
+        memcpy(mac, n->mac, sizeof(mac));
+        state = vhost_vdpa_net_cvq_add(s, out_len);
+        vhost_vdpa_cvq_unmap_buf(v, s->cvq_cmd_out_buffer);
+        vhost_vdpa_cvq_unmap_buf(v, s->cvq_cmd_in_buffer);
+        return state == VIRTIO_NET_OK ? 0 : 1;
+    }
+
+    return 0;
+}
+
 static NetClientInfo net_vhost_vdpa_cvq_info = {
     .type = NET_CLIENT_DRIVER_VHOST_VDPA,
     .size = sizeof(VhostVDPAState),
     .receive = vhost_vdpa_receive,
+    .load = vhost_vdpa_net_load,
     .cleanup = vhost_vdpa_cleanup,
     .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
     .has_ufo = vhost_vdpa_has_ufo,
-- 
2.31.1




* [PATCH v5 10/10] vdpa: Delete CVQ migration blocker
  2022-08-02 17:57 [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ Eugenio Pérez
                   ` (8 preceding siblings ...)
  2022-08-02 17:57 ` [PATCH v5 09/10] vdpa: Add virtio-net mac address via CVQ at start Eugenio Pérez
@ 2022-08-02 17:57 ` Eugenio Pérez
  2022-08-04  4:21 ` [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ Jason Wang
  10 siblings, 0 replies; 26+ messages in thread
From: Eugenio Pérez @ 2022-08-02 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eli Cohen, Stefano Garzarella, Parav Pandit, Markus Armbruster,
	Gautam Dawar, Stefan Hajnoczi, Harpreet Singh Anand,
	Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Jason Wang, Liuxiangdong, Zhu Lingshan

We can restore the device state in the destination via CVQ now. Remove
the migration blocker.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/vhost-vdpa.h |  1 -
 hw/virtio/vhost-vdpa.c         | 14 --------------
 net/vhost-vdpa.c               |  2 --
 3 files changed, 17 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index d10a89303e..1111d85643 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -35,7 +35,6 @@ typedef struct vhost_vdpa {
     bool shadow_vqs_enabled;
     /* IOVA mapping used by the Shadow Virtqueue */
     VhostIOVATree *iova_tree;
-    Error *migration_blocker;
     GPtrArray *shadow_vqs;
     const VhostShadowVirtqueueOps *shadow_vq_ops;
     void *shadow_vq_ops_opaque;
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index e44c23dce5..8882077955 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1029,13 +1029,6 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
         return true;
     }
 
-    if (v->migration_blocker) {
-        int r = migrate_add_blocker(v->migration_blocker, &err);
-        if (unlikely(r < 0)) {
-            return false;
-        }
-    }
-
     for (i = 0; i < v->shadow_vqs->len; ++i) {
         VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
         VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
@@ -1078,10 +1071,6 @@ err:
         vhost_svq_stop(svq);
     }
 
-    if (v->migration_blocker) {
-        migrate_del_blocker(v->migration_blocker);
-    }
-
     return false;
 }
 
@@ -1101,9 +1090,6 @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
         }
     }
 
-    if (v->migration_blocker) {
-        migrate_del_blocker(v->migration_blocker);
-    }
     return true;
 }
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index d489fcd91e..f933ba53a3 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -576,8 +576,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
         s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
         s->vhost_vdpa.shadow_vq_ops_opaque = s;
-        error_setg(&s->vhost_vdpa.migration_blocker,
-                   "Migration disabled: vhost-vdpa uses CVQ.");
     }
     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
     if (ret) {
-- 
2.31.1




* Re: [PATCH v5 02/10] vhost: use SVQ element ndescs instead of opaque data for desc validation
  2022-08-02 17:57 ` [PATCH v5 02/10] vhost: use SVQ element ndescs instead of opaque data for desc validation Eugenio Pérez
@ 2022-08-04  3:01   ` Jason Wang
  2022-08-04  5:56     ` Eugenio Perez Martin
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2022-08-04  3:01 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Eli Cohen, Stefano Garzarella, Parav Pandit, Markus Armbruster,
	Gautam Dawar, Stefan Hajnoczi, Harpreet Singh Anand,
	Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Liuxiangdong, Zhu Lingshan


On 2022/8/3 01:57, Eugenio Pérez wrote:
> Since we're going to allow SVQ to add elements without the guest's
> knowledge and without its own VirtQueueElement, it's easier to check if
> an element is a valid head checking a different thing than the
> VirtQueueElement.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---


Patch looks good to me. But I spot several other issues:

1) vhost_svq_add() uses size_t for in_num and out_num, is this intended?
2) do we need to fail vhost_svq_add() if in_num + out_num == 0?

Thanks


>   hw/virtio/vhost-shadow-virtqueue.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index ffd2b2c972..e6eebd0e8d 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -414,7 +414,7 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
>           return NULL;
>       }
>   
> -    if (unlikely(!svq->desc_state[used_elem.id].elem)) {
> +    if (unlikely(!svq->desc_state[used_elem.id].ndescs)) {
>           qemu_log_mask(LOG_GUEST_ERROR,
>               "Device %s says index %u is used, but it was not available",
>               svq->vdev->name, used_elem.id);
> @@ -422,6 +422,7 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
>       }
>   
>       num = svq->desc_state[used_elem.id].ndescs;
> +    svq->desc_state[used_elem.id].ndescs = 0;
>       last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
>       svq->desc_next[last_used_chain] = svq->free_head;
>       svq->free_head = used_elem.id;




* Re: [PATCH v5 03/10] vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush
  2022-08-02 17:57 ` [PATCH v5 03/10] vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush Eugenio Pérez
@ 2022-08-04  3:14   ` Jason Wang
  2022-08-04  6:21     ` Eugenio Perez Martin
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2022-08-04  3:14 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Eli Cohen, Stefano Garzarella, Parav Pandit, Markus Armbruster,
	Gautam Dawar, Stefan Hajnoczi, Harpreet Singh Anand,
	Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Liuxiangdong, Zhu Lingshan


On 2022/8/3 01:57, Eugenio Pérez wrote:
> Since QEMU will be able to inject new elements on CVQ to restore the
> state, we need not to depend on a VirtQueueElement to know if a new
> element has been used by the device or not. Instead of check that, check
> if there are new elements only using used idx on vhost_svq_flush.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.c | 18 +++++++++---------
>   1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index e6eebd0e8d..fdb550c31b 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -491,7 +491,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>   /**
>    * Poll the SVQ for one device used buffer.
>    *
> - * This function race with main event loop SVQ polling, so extra
> + * This function races with main event loop SVQ polling, so extra
>    * synchronization is needed.
>    *
>    * Return the length written by the device.
> @@ -499,20 +499,20 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>   size_t vhost_svq_poll(VhostShadowVirtqueue *svq)
>   {
>       int64_t start_us = g_get_monotonic_time();
> -    do {
> +    while (true) {
>           uint32_t len;
> -        VirtQueueElement *elem = vhost_svq_get_buf(svq, &len);
> -        if (elem) {
> -            return len;
> -        }
>   
>           if (unlikely(g_get_monotonic_time() - start_us > 10e6)) {
>               return 0;
>           }
>   
> -        /* Make sure we read new used_idx */
> -        smp_rmb();
> -    } while (true);
> +        if (!vhost_svq_more_used(svq)) {
> +            continue;
> +        }
> +
> +        vhost_svq_get_buf(svq, &len);


I wonder if this means we won't worry about the infinite wait?

Thanks


> +        return len;
> +    }
>   }
>   
>   /**




* Re: [PATCH v5 04/10] vdpa: Get buffers from VhostVDPAState on vhost_vdpa_net_cvq_map_elem
  2022-08-02 17:57 ` [PATCH v5 04/10] vdpa: Get buffers from VhostVDPAState on vhost_vdpa_net_cvq_map_elem Eugenio Pérez
@ 2022-08-04  4:01   ` Jason Wang
  0 siblings, 0 replies; 26+ messages in thread
From: Jason Wang @ 2022-08-04  4:01 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Eli Cohen, Stefano Garzarella, Parav Pandit, Markus Armbruster,
	Gautam Dawar, Stefan Hajnoczi, Harpreet Singh Anand,
	Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Liuxiangdong, Zhu Lingshan


On 2022/8/3 01:57, Eugenio Pérez wrote:
> There is no need to get them by parameter, since they're contained in
> VhostVDPAState. The only useful information was the written length in
> out.
>
> Simplify the function removing those.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   net/vhost-vdpa.c | 17 ++++++-----------
>   1 file changed, 6 insertions(+), 11 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index ac1810723c..c6699edfbc 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -303,34 +303,29 @@ dma_map_err:
>   
>   /**
>    * Copy the guest element into a dedicated buffer suitable to be sent to NIC
> - *
> - * @iov: [0] is the out buffer, [1] is the in one
>    */
>   static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
>                                           VirtQueueElement *elem,
> -                                        struct iovec *iov)
> +                                        size_t *out_len)
>   {
>       size_t in_copied;
>       bool ok;
>   
> -    iov[0].iov_base = s->cvq_cmd_out_buffer;
>       ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, elem->out_sg, elem->out_num,
> -                                vhost_vdpa_net_cvq_cmd_len(), iov[0].iov_base,
> -                                &iov[0].iov_len, false);
> +                                vhost_vdpa_net_cvq_cmd_len(),
> +                                s->cvq_cmd_out_buffer, out_len, false);
>       if (unlikely(!ok)) {
>           return false;
>       }
>   
> -    iov[1].iov_base = s->cvq_cmd_in_buffer;
>       ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, NULL, 0,
> -                                sizeof(virtio_net_ctrl_ack), iov[1].iov_base,
> -                                &in_copied, true);
> +                                sizeof(virtio_net_ctrl_ack),
> +                                s->cvq_cmd_in_buffer, &in_copied, true);


I'd suggest some tweaks to make it easier for the reviewers:

- let vhost_vdpa_cvq_map_buf() and vhost_vdpa_net_cvq_map_elem() return 
ssize_t and drop the confusing written/out_len parameter of those 
functions.
- rename vhost_vdpa_net_cvq_map_elem() to 
vhost_vdpa_net_cvq_bounce_map() since it uses a bounce buffer actually

Thanks


>       if (unlikely(!ok)) {
>           vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
>           return false;
>       }
>   
> -    iov[1].iov_len = sizeof(virtio_net_ctrl_ack);
>       return true;
>   }
>   
> @@ -395,7 +390,7 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
>       int r = -EINVAL;
>       bool ok;
>   
> -    ok = vhost_vdpa_net_cvq_map_elem(s, elem, dev_buffers);
> +    ok = vhost_vdpa_net_cvq_map_elem(s, elem, &dev_buffers[0].iov_len);
>       if (unlikely(!ok)) {
>           goto out;
>       }




* Re: [PATCH v5 06/10] vdpa: Make vhost_vdpa_net_cvq_map_elem accept any out sg
  2022-08-02 17:57 ` [PATCH v5 06/10] vdpa: Make vhost_vdpa_net_cvq_map_elem accept any out sg Eugenio Pérez
@ 2022-08-04  4:16   ` Jason Wang
  2022-08-04  7:38     ` Eugenio Perez Martin
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2022-08-04  4:16 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Eli Cohen, Stefano Garzarella, Parav Pandit, Markus Armbruster,
	Gautam Dawar, Stefan Hajnoczi, Harpreet Singh Anand,
	Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Liuxiangdong, Zhu Lingshan


On 2022/8/3 01:57, Eugenio Pérez wrote:
> So its generic enough to accept any out sg buffer and we can inject
> NIC state messages.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v5: Accept out sg instead of dev_buffers[]
> ---
>   net/vhost-vdpa.c | 13 +++++++------
>   1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 33bf3d6409..2421bca347 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -302,16 +302,16 @@ dma_map_err:
>   }
>   
>   /**
> - * Copy the guest element into a dedicated buffer suitable to be sent to NIC
> + * Maps out sg and in buffer into dedicated buffers suitable to be sent to NIC
>    */
> -static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
> -                                        VirtQueueElement *elem,
> -                                        size_t *out_len)
> +static bool vhost_vdpa_net_cvq_map_sg(VhostVDPAState *s,
> +                                      const struct iovec *out, size_t out_num,
> +                                      size_t *out_len)


This still looks not general enough, as there's no guarantee that we won't
have command-specific in data. One example is the virtio-net statistics
fetching from the control virtqueue that Ali is working on.

So it looks to me we'd better have a general bounce_map here that accepts:

1) out_sg and out_num
2) in_sg and in_num

In this level, we'd better not have any special care about the in as the 
ack. And we need do bouncing:

1) for out buffer, during map
2) for in buffer during unmap

Thanks


>   {
>       size_t in_copied;
>       bool ok;
>   
> -    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, elem->out_sg, elem->out_num,
> +    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, out, out_num,
>                                   vhost_vdpa_net_cvq_cmd_len(),
>                                   s->cvq_cmd_out_buffer, out_len, false);
>       if (unlikely(!ok)) {
> @@ -435,7 +435,8 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
>       };
>       bool ok;
>   
> -    ok = vhost_vdpa_net_cvq_map_elem(s, elem, &dev_buffers[0].iov_len);
> +    ok = vhost_vdpa_net_cvq_map_sg(s, elem->out_sg, elem->out_num,
> +                                   &dev_buffers[0].iov_len);
>       if (unlikely(!ok)) {
>           goto out;
>       }




* Re: [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ
  2022-08-02 17:57 [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ Eugenio Pérez
                   ` (9 preceding siblings ...)
  2022-08-02 17:57 ` [PATCH v5 10/10] vdpa: Delete CVQ migration blocker Eugenio Pérez
@ 2022-08-04  4:21 ` Jason Wang
  2022-08-04 18:17   ` Eugenio Perez Martin
  10 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2022-08-04  4:21 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Eli Cohen, Stefano Garzarella, Parav Pandit, Markus Armbruster,
	Gautam Dawar, Stefan Hajnoczi, Harpreet Singh Anand,
	Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Liuxiangdong, Zhu Lingshan


On 2022/8/3 01:57, Eugenio Pérez wrote:
> CVQ of net vhost-vdpa devices can be intercepted since the work of [1]. The
> virtio-net device model is updated. The migration was blocked because although
> the state can be megrated between VMM it was not possible to restore on the
> destination NIC.
>
> This series add support for SVQ to inject external messages without the guest's
> knowledge, so before the guest is resumed all the guest visible state is
> restored. It is done using standard CVQ messages, so the vhost-vdpa device does
> not need to learn how to restore it: As long as they have the feature, they
> know how to handle it.
>
> This series needs fixes [1], [2] and [3] to be applied to achieve full live
> migration.
>
> Thanks!
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg02984.html
> [2] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg03993.html


Note that the above has been merged into master.

And the series looks good overall, just some comments to make the code
easier to read and maintain in the future.

Thanks


> [3] https://lists.nongnu.org/archive/html/qemu-devel/2022-08/msg00325.html
>
> v5:
> - Rename s/start/load/
> - Use independent NetClientInfo to only add load callback on cvq.
> - Accept out sg instead of dev_buffers[] at vhost_vdpa_net_cvq_map_elem
> - Use only out size instead of iovec dev_buffers to know if the descriptor is
>    effectively available, allowing to delete artificial !NULL VirtQueueElement
>    on vhost_svq_add call.
>
> v4:
> - Actually use NetClientInfo callback.
>
> v3:
> - Route vhost-vdpa start code through NetClientInfo callback.
> - Delete extra vhost_net_stop_one() call.
>
> v2:
> - Fix SIGSEGV dereferencing SVQ when not in svq mode
>
> v1 from RFC:
> - Do not reorder DRIVER_OK & enable patches.
> - Delete leftovers
>
> Eugenio Pérez (10):
>    vhost: stop transfer elem ownership in vhost_handle_guest_kick
>    vhost: use SVQ element ndescs instead of opaque data for desc
>      validation
>    vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush
>    vdpa: Get buffers from VhostVDPAState on vhost_vdpa_net_cvq_map_elem
>    vdpa: Extract vhost_vdpa_net_cvq_add from
>      vhost_vdpa_net_handle_ctrl_avail
>    vdpa: Make vhost_vdpa_net_cvq_map_elem accept any out sg
>    vdpa: add NetClientState->load() callback
>    vdpa: add net_vhost_vdpa_cvq_info NetClientInfo
>    vdpa: Add virtio-net mac address via CVQ at start
>    vdpa: Delete CVQ migration blocker
>
>   include/hw/virtio/vhost-vdpa.h     |   1 -
>   include/net/net.h                  |   2 +
>   hw/net/vhost_net.c                 |   7 ++
>   hw/virtio/vhost-shadow-virtqueue.c |  31 +++---
>   hw/virtio/vhost-vdpa.c             |  14 ---
>   net/vhost-vdpa.c                   | 163 +++++++++++++++++++++--------
>   6 files changed, 145 insertions(+), 73 deletions(-)
>




* Re: [PATCH v5 02/10] vhost: use SVQ element ndescs instead of opaque data for desc validation
  2022-08-04  3:01   ` Jason Wang
@ 2022-08-04  5:56     ` Eugenio Perez Martin
  2022-08-04  7:28       ` Jason Wang
  0 siblings, 1 reply; 26+ messages in thread
From: Eugenio Perez Martin @ 2022-08-04  5:56 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Eli Cohen, Stefano Garzarella, Parav Pandit,
	Markus Armbruster, Gautam Dawar, Stefan Hajnoczi,
	Harpreet Singh Anand, Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Liuxiangdong, Zhu Lingshan

On Thu, Aug 4, 2022 at 5:01 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/8/3 01:57, Eugenio Pérez wrote:
> > Since we're going to allow SVQ to add elements without the guest's
> > knowledge and without its own VirtQueueElement, it's easier to check if
> > an element is a valid head checking a different thing than the
> > VirtQueueElement.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
>
>
> Patch looks good to me. But I spot several other issues:
>
> 1) vhost_svq_add() use size_t for in_num and out_num, is this intended?

Would you prefer them to be unsigned? To me size_t fits better, but
VirtQueueElement uses unsigned anyway.

> 2) do we need to fail vhost_svq_add() if in_num + out_num == 0?
>

We can recover from it, but it's a failure of qemu code.
- In the case of loading the state to the destination device, we
already know the layout (it's always 1 out, 1 in).
- In the case of forwarding buffers, there is no way to get a
VirtQueueElement with 0 out and 0 in descriptors, due to the way the
virtqueue works.

Would you prefer to return success in this case?

Thanks!




* Re: [PATCH v5 03/10] vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush
  2022-08-04  3:14   ` Jason Wang
@ 2022-08-04  6:21     ` Eugenio Perez Martin
  2022-08-04  7:30       ` Jason Wang
  0 siblings, 1 reply; 26+ messages in thread
From: Eugenio Perez Martin @ 2022-08-04  6:21 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Eli Cohen, Stefano Garzarella, Parav Pandit,
	Markus Armbruster, Gautam Dawar, Stefan Hajnoczi,
	Harpreet Singh Anand, Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Liuxiangdong, Zhu Lingshan

On Thu, Aug 4, 2022 at 5:14 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/8/3 01:57, Eugenio Pérez wrote:
> > Since QEMU will be able to inject new elements on CVQ to restore the
> > state, we need not to depend on a VirtQueueElement to know if a new
> > element has been used by the device or not. Instead of check that, check
> > if there are new elements only using used idx on vhost_svq_flush.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.c | 18 +++++++++---------
> >   1 file changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index e6eebd0e8d..fdb550c31b 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -491,7 +491,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> >   /**
> >    * Poll the SVQ for one device used buffer.
> >    *
> > - * This function race with main event loop SVQ polling, so extra
> > + * This function races with main event loop SVQ polling, so extra
> >    * synchronization is needed.
> >    *
> >    * Return the length written by the device.
> > @@ -499,20 +499,20 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> >   size_t vhost_svq_poll(VhostShadowVirtqueue *svq)
> >   {
> >       int64_t start_us = g_get_monotonic_time();
> > -    do {
> > +    while (true) {
> >           uint32_t len;
> > -        VirtQueueElement *elem = vhost_svq_get_buf(svq, &len);
> > -        if (elem) {
> > -            return len;
> > -        }
> >
> >           if (unlikely(g_get_monotonic_time() - start_us > 10e6)) {
> >               return 0;
> >           }
> >
> > -        /* Make sure we read new used_idx */
> > -        smp_rmb();
> > -    } while (true);
> > +        if (!vhost_svq_more_used(svq)) {
> > +            continue;
> > +        }
> > +
> > +        vhost_svq_get_buf(svq, &len);
>
>
> I wonder if this means we won't worry about the infinite wait?
>

The vhost_svq_get_buf call doesn't block, and the timeout check is
immediately above the check for new descriptors. Am I missing
something?

> Thanks
>
>
> > +        return len;
> > +    }
> >   }
> >
> >   /**
>




* Re: [PATCH v5 02/10] vhost: use SVQ element ndescs instead of opaque data for desc validation
  2022-08-04  5:56     ` Eugenio Perez Martin
@ 2022-08-04  7:28       ` Jason Wang
  0 siblings, 0 replies; 26+ messages in thread
From: Jason Wang @ 2022-08-04  7:28 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Eli Cohen, Stefano Garzarella, Parav Pandit,
	Markus Armbruster, Gautam Dawar, Stefan Hajnoczi,
	Harpreet Singh Anand, Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Liuxiangdong, Zhu Lingshan

On Thu, Aug 4, 2022 at 1:57 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Thu, Aug 4, 2022 at 5:01 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2022/8/3 01:57, Eugenio Pérez wrote:
> > > Since we're going to allow SVQ to add elements without the guest's
> > > knowledge and without its own VirtQueueElement, it's easier to check if
> > > an element is a valid head checking a different thing than the
> > > VirtQueueElement.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> >
> >
> > Patch looks good to me. But I spot several other issues:
> >
> > 1) vhost_svq_add() use size_t for in_num and out_num, is this intended?
>
> Would you prefer them to be unsigned? To me size_t fits better, but
> VirtQueueElement uses unsigned anyway.

Yes, the problem I'm pointing out is that in_num comes from the
VirtQueueElement, where it is unsigned, gets cast to size_t and then cast
back to unsigned.

>
> > 2) do we need to fail vhost_svq_add() if in_num + out_num == 0?
> >
>
> We can recover from it, but it's a failure of qemu code.
> - In the case of loading the state to the destination device, we
> already know the layout (it's always 1 out, 1 in).
> - In the case of forwarding buffers, there is no way to get a
> VirtQueueElement with 0 out and 0 in descriptors, due to the virtqueue
> way of working.

So I think it's better to fail the svq_add in this case.

Thanks

>
> Would you prefer to return success in this case?
>
> Thanks!
>




* Re: [PATCH v5 03/10] vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush
  2022-08-04  6:21     ` Eugenio Perez Martin
@ 2022-08-04  7:30       ` Jason Wang
  0 siblings, 0 replies; 26+ messages in thread
From: Jason Wang @ 2022-08-04  7:30 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Eli Cohen, Stefano Garzarella, Parav Pandit,
	Markus Armbruster, Gautam Dawar, Stefan Hajnoczi,
	Harpreet Singh Anand, Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Liuxiangdong, Zhu Lingshan

On Thu, Aug 4, 2022 at 2:21 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Thu, Aug 4, 2022 at 5:14 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2022/8/3 01:57, Eugenio Pérez 写道:
> > > Since QEMU will be able to inject new elements on CVQ to restore the
> > > state, we should not depend on a VirtQueueElement to know whether a new
> > > element has been used by the device. Instead of checking that, check
> > > whether there are new elements using only the used idx on vhost_svq_flush.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >   hw/virtio/vhost-shadow-virtqueue.c | 18 +++++++++---------
> > >   1 file changed, 9 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > index e6eebd0e8d..fdb550c31b 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > @@ -491,7 +491,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> > >   /**
> > >    * Poll the SVQ for one device used buffer.
> > >    *
> > > - * This function race with main event loop SVQ polling, so extra
> > > + * This function races with main event loop SVQ polling, so extra
> > >    * synchronization is needed.
> > >    *
> > >    * Return the length written by the device.
> > > @@ -499,20 +499,20 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> > >   size_t vhost_svq_poll(VhostShadowVirtqueue *svq)
> > >   {
> > >       int64_t start_us = g_get_monotonic_time();
> > > -    do {
> > > +    while (true) {
> > >           uint32_t len;
> > > -        VirtQueueElement *elem = vhost_svq_get_buf(svq, &len);
> > > -        if (elem) {
> > > -            return len;
> > > -        }
> > >
> > >           if (unlikely(g_get_monotonic_time() - start_us > 10e6)) {
> > >               return 0;
> > >           }
> > >
> > > -        /* Make sure we read new used_idx */
> > > -        smp_rmb();
> > > -    } while (true);
> > > +        if (!vhost_svq_more_used(svq)) {
> > > +            continue;
> > > +        }
> > > +
> > > +        vhost_svq_get_buf(svq, &len);
> >
> >
> > I wonder if this means we won't worry about the infinite wait?
> >
>
> vhost_svq_get_buf call doesn't block, and the check for the timeout is
> immediately above the check for new descriptors. Am I missing
> something?

No, I misread the code.

Sorry.

Thanks

>
> > Thanks
> >
> >
> > > +        return len;
> > > +    }
> > >   }
> > >
> > >   /**
> >
>




* Re: [PATCH v5 06/10] vdpa: Make vhost_vdpa_net_cvq_map_elem accept any out sg
  2022-08-04  4:16   ` Jason Wang
@ 2022-08-04  7:38     ` Eugenio Perez Martin
  2022-08-04  7:51       ` Jason Wang
  0 siblings, 1 reply; 26+ messages in thread
From: Eugenio Perez Martin @ 2022-08-04  7:38 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Eli Cohen, Stefano Garzarella, Parav Pandit,
	Markus Armbruster, Gautam Dawar, Stefan Hajnoczi,
	Harpreet Singh Anand, Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Liuxiangdong, Zhu Lingshan

On Thu, Aug 4, 2022 at 6:17 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2022/8/3 01:57, Eugenio Pérez 写道:
> > So it's generic enough to accept any out sg buffer and we can inject
> > NIC state messages.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v5: Accept out sg instead of dev_buffers[]
> > ---
> >   net/vhost-vdpa.c | 13 +++++++------
> >   1 file changed, 7 insertions(+), 6 deletions(-)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index 33bf3d6409..2421bca347 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -302,16 +302,16 @@ dma_map_err:
> >   }
> >
> >   /**
> > - * Copy the guest element into a dedicated buffer suitable to be sent to NIC
> > + * Maps out sg and in buffer into dedicated buffers suitable to be sent to NIC
> >    */
> > -static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
> > -                                        VirtQueueElement *elem,
> > -                                        size_t *out_len)
> > +static bool vhost_vdpa_net_cvq_map_sg(VhostVDPAState *s,
> > +                                      const struct iovec *out, size_t out_num,
> > +                                      size_t *out_len)
>
>
> This still does not look general, as there's no guarantee that we won't have
> command-in-specific-data. One example is that Ali is working on the
> virtio-net statistics fetching from the control virtqueue.
>
> So it looks to me we'd better have a general bounce_map here that accepts:
>
> 1) out_sg and out_num
> 2) in_sg and in_num
>

We don't need to pass in_sg for that: The only useful information is
its size. Since the exposed buffer is an in one, it's enough to expose
the s->cvq_cmd_in_buffer buffer in full. The caller already knows the
device will write to it, so the only missing piece is to return the
written length at vhost_vdpa_net_cvq_add.
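
To illustrate, a rough sketch of the shape I have in mind (not code from
the series; the signature and names are illustrative):

    /*
     * Sketch: expose the whole page-sized s->cvq_cmd_in_buffer as the in
     * descriptor and return how much the device wrote into it.
     */
    static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
                                          size_t in_len);

    /* Caller side, for a command that only expects the ack: */
    ssize_t written = vhost_vdpa_net_cvq_add(s, out_len,
                                             sizeof(virtio_net_ctrl_ack));
    if (written != sizeof(virtio_net_ctrl_ack)) {
        /* treat it as VIRTIO_NET_ERR */
    }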

Is one page the right buffer size for the in buffer? Is it worth
worrying about it before implementing the stat control command in qemu
virtio-net?

> At this level, we'd better not give the in buffer any special treatment
> as the ack.

We need to care about it. If a property has not been updated in the
vdpa device (it returned VIRTIO_NET_ERR), we must not update the
device model.

We can move the processing from vhost_vdpa_net_cvq_add to
vhost_vdpa_net_load and vhost_vdpa_net_handle_ctrl_avail, but the code
gets duplicated then.

> And we need to do bouncing:
>
> 1) for out buffer, during map
> 2) for in buffer during unmap
>

We can move the copy of the in_buffer to the unmap for sure.

Thanks!

> Thanks
>
>
> >   {
> >       size_t in_copied;
> >       bool ok;
> >
> > -    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, elem->out_sg, elem->out_num,
> > +    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, out, out_num,
> >                                   vhost_vdpa_net_cvq_cmd_len(),
> >                                   s->cvq_cmd_out_buffer, out_len, false);
> >       if (unlikely(!ok)) {
> > @@ -435,7 +435,8 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
> >       };
> >       bool ok;
> >
> > -    ok = vhost_vdpa_net_cvq_map_elem(s, elem, &dev_buffers[0].iov_len);
> > +    ok = vhost_vdpa_net_cvq_map_sg(s, elem->out_sg, elem->out_num,
> > +                                   &dev_buffers[0].iov_len);
> >       if (unlikely(!ok)) {
> >           goto out;
> >       }
>




* Re: [PATCH v5 06/10] vdpa: Make vhost_vdpa_net_cvq_map_elem accept any out sg
  2022-08-04  7:38     ` Eugenio Perez Martin
@ 2022-08-04  7:51       ` Jason Wang
  2022-08-04  8:19         ` Eugenio Perez Martin
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2022-08-04  7:51 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Eli Cohen, Stefano Garzarella, Parav Pandit,
	Markus Armbruster, Gautam Dawar, Stefan Hajnoczi,
	Harpreet Singh Anand, Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Liuxiangdong, Zhu Lingshan

On Thu, Aug 4, 2022 at 3:39 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Thu, Aug 4, 2022 at 6:17 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2022/8/3 01:57, Eugenio Pérez 写道:
> > > So it's generic enough to accept any out sg buffer and we can inject
> > > NIC state messages.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > > v5: Accept out sg instead of dev_buffers[]
> > > ---
> > >   net/vhost-vdpa.c | 13 +++++++------
> > >   1 file changed, 7 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > index 33bf3d6409..2421bca347 100644
> > > --- a/net/vhost-vdpa.c
> > > +++ b/net/vhost-vdpa.c
> > > @@ -302,16 +302,16 @@ dma_map_err:
> > >   }
> > >
> > >   /**
> > > - * Copy the guest element into a dedicated buffer suitable to be sent to NIC
> > > + * Maps out sg and in buffer into dedicated buffers suitable to be sent to NIC
> > >    */
> > > -static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
> > > -                                        VirtQueueElement *elem,
> > > -                                        size_t *out_len)
> > > +static bool vhost_vdpa_net_cvq_map_sg(VhostVDPAState *s,
> > > +                                      const struct iovec *out, size_t out_num,
> > > +                                      size_t *out_len)
> >
> >
> > This still does not look general, as there's no guarantee that we won't have
> > command-in-specific-data. One example is that Ali is working on the
> > virtio-net statistics fetching from the control virtqueue.
> >
> > So it looks to me we'd better have a general bounce_map here that accepts:
> >
> > 1) out_sg and out_num
> > 2) in_sg and in_num
> >
>
> We don't need to pass in_sg for that: The only useful information is
> its size.

What if we support stats in the future where it extends the ctrl command:

         u8 command;
         u8 command-specific-data[];
         u8 ack;
+        u8 command-specific-data-reply[];

in

https://lists.oasis-open.org/archives/virtio-dev/202203/msg00000.html

> Since the exposed buffer is an in one, it's enough to expose
> the s->cvq_cmd_in_buffer buffer in full. The caller already knows the
> device will write to it, so the only missing piece is to return the
> written length at vhost_vdpa_net_cvq_add.
>
> Is one page the right buffer size for the in buffer?

We can start from this.

> Is it worth
> worrying about it before implementing the stat control command in qemu
> virtio-net?

If it's not complex, it's better to do that from the beginning,
otherwise the user may be surprised and we need extra work. Anyhow, we
should support at least ack which is an in_sg.

>
> > At this level, we'd better not give the in buffer any special treatment
> > as the ack.
>
> We need to care about it. If a property has not been updated in the
> vdpa device (it returned VIRTIO_NET_ERR), we must not update the
> device model.

Yes, but what I meant is at the level of bouncing itself. If we met
VIRTIO_NET_ERR, we should propagate it to the guest as well?

Thanks

>
> We can move the processing from vhost_vdpa_net_cvq_add to
> vhost_vdpa_net_load and vhost_vdpa_net_handle_ctrl_avail, but the code
> gets duplicated then.
>
> > And we need to do bouncing:
> >
> > 1) for out buffer, during map
> > 2) for in buffer during unmap
> >
>
> We can move the copy of the in_buffer to the unmap for sure.
>
> Thanks!
>
> > Thanks
> >
> >
> > >   {
> > >       size_t in_copied;
> > >       bool ok;
> > >
> > > -    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, elem->out_sg, elem->out_num,
> > > +    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, out, out_num,
> > >                                   vhost_vdpa_net_cvq_cmd_len(),
> > >                                   s->cvq_cmd_out_buffer, out_len, false);
> > >       if (unlikely(!ok)) {
> > > @@ -435,7 +435,8 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
> > >       };
> > >       bool ok;
> > >
> > > -    ok = vhost_vdpa_net_cvq_map_elem(s, elem, &dev_buffers[0].iov_len);
> > > +    ok = vhost_vdpa_net_cvq_map_sg(s, elem->out_sg, elem->out_num,
> > > +                                   &dev_buffers[0].iov_len);
> > >       if (unlikely(!ok)) {
> > >           goto out;
> > >       }
> >
>




* Re: [PATCH v5 06/10] vdpa: Make vhost_vdpa_net_cvq_map_elem accept any out sg
  2022-08-04  7:51       ` Jason Wang
@ 2022-08-04  8:19         ` Eugenio Perez Martin
  2022-08-04  8:51           ` Jason Wang
  0 siblings, 1 reply; 26+ messages in thread
From: Eugenio Perez Martin @ 2022-08-04  8:19 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Eli Cohen, Stefano Garzarella, Parav Pandit,
	Markus Armbruster, Gautam Dawar, Stefan Hajnoczi,
	Harpreet Singh Anand, Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Liuxiangdong, Zhu Lingshan

On Thu, Aug 4, 2022 at 9:51 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Aug 4, 2022 at 3:39 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Thu, Aug 4, 2022 at 6:17 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > 在 2022/8/3 01:57, Eugenio Pérez 写道:
> > > > So it's generic enough to accept any out sg buffer and we can inject
> > > > NIC state messages.
> > > >
> > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > ---
> > > > v5: Accept out sg instead of dev_buffers[]
> > > > ---
> > > >   net/vhost-vdpa.c | 13 +++++++------
> > > >   1 file changed, 7 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > > index 33bf3d6409..2421bca347 100644
> > > > --- a/net/vhost-vdpa.c
> > > > +++ b/net/vhost-vdpa.c
> > > > @@ -302,16 +302,16 @@ dma_map_err:
> > > >   }
> > > >
> > > >   /**
> > > > - * Copy the guest element into a dedicated buffer suitable to be sent to NIC
> > > > + * Maps out sg and in buffer into dedicated buffers suitable to be sent to NIC
> > > >    */
> > > > -static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
> > > > -                                        VirtQueueElement *elem,
> > > > -                                        size_t *out_len)
> > > > +static bool vhost_vdpa_net_cvq_map_sg(VhostVDPAState *s,
> > > > +                                      const struct iovec *out, size_t out_num,
> > > > +                                      size_t *out_len)
> > >
> > >
> > > This still does not look general, as there's no guarantee that we won't have
> > > command-in-specific-data. One example is that Ali is working on the
> > > virtio-net statistics fetching from the control virtqueue.
> > >
> > > So it looks to me we'd better have a general bounce_map here that accepts:
> > >
> > > 1) out_sg and out_num
> > > 2) in_sg and in_num
> > >
> >
> > We don't need to pass in_sg for that: The only useful information is
> > its size.
>
> What if we support stats in the future where it extends the ctrl command:
>
>          u8 command;
>          u8 command-specific-data[];
>          u8 ack;
> +        u8 command-specific-data-reply[];
>
> in
>
> https://lists.oasis-open.org/archives/virtio-dev/202203/msg00000.html
>

The guest will expose an in descriptor in whatever layout it wants.
QEMU reads its layout into a VirtQueueElement in_num and in_sg
members, in qemu's (and SVQ) address space.

Since we don't want the guest to be able to modify the in buffer
maliciously, we offer the device a bounce in buffer. Since the in
buffer still contains no information, we only need its size, so we can
"allocate and map" an equivalent one for the device and memset to 0.
For simplicity, we allocate one page, so no need for iovec
complexities.

After the device has written in it, we get the written len and verify
the information. If VIRTIO_NET_OK, we copy that to the guest's in
buffer, using the iov_from_buf right after out: tag at
vhost_vdpa_net_handle_ctrl_avail. Instead of copy from the stack
"(status, sizeof(status))" variable, we copy from
(s->cvq_cmd_in_buffer, written_len).
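
In code it would be something like this at the end of
vhost_vdpa_net_handle_ctrl_avail (sketch only, error paths elided;
"written" stands for the length returned by the SVQ poll):

    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;

    if (written >= sizeof(status)) {
        /* the first byte of the bounce buffer is the device's ack */
        memcpy(&status, s->cvq_cmd_in_buffer, sizeof(status));
    }
    if (status == VIRTIO_NET_OK) {
        /* only now let the virtio-net device model process the command */
    }

    /* at the out: label, copy whatever the device wrote, not a local status */
    iov_from_buf(elem->in_sg, elem->in_num, 0,
                 s->cvq_cmd_in_buffer, written);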

Note that this is still not enough for stats. We also need a way to:
* Update the virtio-net device model stats. virtio_net_handle_ctrl_iov
would try to write the virtio-net stats to in buffer, not to update
the device model stat.
* Update the stats on the destination. Another ctrl command? Intercept
them via svq and simply sum source stats + current device stats? I'd
say the second is better as it saves effort to the device, but maybe
it's not.

That's why I think this command should be left out for the moment:
making the modifications is not hard, but we should agree on how to do
them first.

> > Since the exposed buffer is an in one, it's enough to expose
> > the s->cvq_cmd_in_buffer buffer in full. The caller already knows the
> > device will write to it, so the only missing piece is to return the
> > written length at vhost_vdpa_net_cvq_add.
> >
> > Is one page the right buffer size for the in buffer?
>
> We can start from this.
>
> > Is it worth
> > worrying about it before implementing the stat control command in qemu
> > virtio-net?
>
> If it's not complex, it's better to do that from the beginning,
> otherwise the user may be surprised and we need extra work. Anyhow, we
> should support at least ack which is an in_sg.
>
> >
> > > At this level, we'd better not give the in buffer any special treatment
> > > as the ack.
> >
> > We need to care about it. If a property has not been updated in the
> > vdpa device (it returned VIRTIO_NET_ERR), we must not update the
> > device model.
>
> Yes, but what I meant is at the level of bouncing itself. If we met
> VIRTIO_NET_ERR, we should propagate it to the guest as well?
>

Yes, we have to; if not, the guest thinks the command succeeded, doesn't it?

Thanks!

> Thanks
>
> >
> > We can move the processing from vhost_vdpa_net_cvq_add to
> > vhost_vdpa_net_load and vhost_vdpa_net_handle_ctrl_avail, but the code
> > gets duplicated then.
> >
> > > And we need to do bouncing:
> > >
> > > 1) for out buffer, during map
> > > 2) for in buffer during unmap
> > >
> >
> > We can move the copy of the in_buffer to the unmap for sure.
> >
> > Thanks!
> >
> > > Thanks
> > >
> > >
> > > >   {
> > > >       size_t in_copied;
> > > >       bool ok;
> > > >
> > > > -    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, elem->out_sg, elem->out_num,
> > > > +    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, out, out_num,
> > > >                                   vhost_vdpa_net_cvq_cmd_len(),
> > > >                                   s->cvq_cmd_out_buffer, out_len, false);
> > > >       if (unlikely(!ok)) {
> > > > @@ -435,7 +435,8 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
> > > >       };
> > > >       bool ok;
> > > >
> > > > -    ok = vhost_vdpa_net_cvq_map_elem(s, elem, &dev_buffers[0].iov_len);
> > > > +    ok = vhost_vdpa_net_cvq_map_sg(s, elem->out_sg, elem->out_num,
> > > > +                                   &dev_buffers[0].iov_len);
> > > >       if (unlikely(!ok)) {
> > > >           goto out;
> > > >       }
> > >
> >
>




* Re: [PATCH v5 06/10] vdpa: Make vhost_vdpa_net_cvq_map_elem accept any out sg
  2022-08-04  8:19         ` Eugenio Perez Martin
@ 2022-08-04  8:51           ` Jason Wang
  2022-08-04 11:21             ` Eugenio Perez Martin
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2022-08-04  8:51 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-level, Eli Cohen, Stefano Garzarella, Parav Pandit,
	Markus Armbruster, Gautam Dawar, Stefan Hajnoczi,
	Harpreet Singh Anand, Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Liuxiangdong, Zhu Lingshan

On Thu, Aug 4, 2022 at 4:19 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Thu, Aug 4, 2022 at 9:51 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Thu, Aug 4, 2022 at 3:39 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > >
> > > On Thu, Aug 4, 2022 at 6:17 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > >
> > > > 在 2022/8/3 01:57, Eugenio Pérez 写道:
> > > > > So it's generic enough to accept any out sg buffer and we can inject
> > > > > NIC state messages.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > ---
> > > > > v5: Accept out sg instead of dev_buffers[]
> > > > > ---
> > > > >   net/vhost-vdpa.c | 13 +++++++------
> > > > >   1 file changed, 7 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > > > index 33bf3d6409..2421bca347 100644
> > > > > --- a/net/vhost-vdpa.c
> > > > > +++ b/net/vhost-vdpa.c
> > > > > @@ -302,16 +302,16 @@ dma_map_err:
> > > > >   }
> > > > >
> > > > >   /**
> > > > > - * Copy the guest element into a dedicated buffer suitable to be sent to NIC
> > > > > + * Maps out sg and in buffer into dedicated buffers suitable to be sent to NIC
> > > > >    */
> > > > > -static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
> > > > > -                                        VirtQueueElement *elem,
> > > > > -                                        size_t *out_len)
> > > > > +static bool vhost_vdpa_net_cvq_map_sg(VhostVDPAState *s,
> > > > > +                                      const struct iovec *out, size_t out_num,
> > > > > +                                      size_t *out_len)
> > > >
> > > >
> > > > This still does not look general, as there's no guarantee that we won't have
> > > > command-in-specific-data. One example is that Ali is working on the
> > > > virtio-net statistics fetching from the control virtqueue.
> > > >
> > > > So it looks to me we'd better have a general bounce_map here that accepts:
> > > >
> > > > 1) out_sg and out_num
> > > > 2) in_sg and in_num
> > > >
> > >
> > > We don't need to pass in_sg for that: The only useful information is
> > > its size.
> >
> > What if we support stats in the future where it extends the ctrl command:
> >
> >          u8 command;
> >          u8 command-specific-data[];
> >          u8 ack;
> > +        u8 command-specific-data-reply[];
> >
> > in
> >
> > https://lists.oasis-open.org/archives/virtio-dev/202203/msg00000.html
> >
>
> The guest will expose an in descriptor in whatever layout it wants.
> QEMU reads its layout into a VirtQueueElement in_num and in_sg
> members, in qemu's (and SVQ) address space.

Yes, but the current code does:

1) map seems to be based on sg but unmap is not, resulting in an asymmetric API
2) NULL is passed to vhost_vdpa_cvq_map_buf()

    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, NULL, 0,
                                sizeof(virtio_net_ctrl_ack),
                                s->cvq_cmd_in_buffer, &in_copied, true);
    if (unlikely(!ok)) {
        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
        return false;
    }

So NULL ends up being used for iov_to_buf(), which is tricky (I'm not even sure it can work).

And this won't work for commands that have in-data in the future.

>
> Since we don't want the guest to be able to modify the in buffer
> maliciously, we offer the device a bounce in buffer. Since the in
> buffer still contains no information, we only need its size, so we can
> "allocate and map" an equivalent one for the device and memset to 0.
> For simplicity, we allocate one page, so no need for iovec
> complexities.
>
> After the device has written in it, we get the written len and verify
> the information. If VIRTIO_NET_OK, we copy that to the guest's in
> buffer, using the iov_from_buf right after out: tag at
> vhost_vdpa_net_handle_ctrl_avail. Instead of copying from the stack
> "(status, sizeof(status))" variable, we copy from
> (s->cvq_cmd_in_buffer, written_len).

Another asymmetry, the iov_to_buf() was hidden in map() but the
iov_from_buf() was exposed to the vhost_vdpa_net_handle_ctrl_avail.
I'd suggest tweaking the code otherwise it's not easy to read and
maintain:

map_sg()
validate_cmd()
add_to_svq()
handle_ctrl()
unmap_sg()

Or since we know the location of the bounce buffer, we can simply do
bouncing/iov_from_buf() explicitly before map_sg().
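
I.e., roughly (flow only, using the step names above; not actual code):

    map_sg();       /* bounce guest out sg into cvq_cmd_out_buffer */
    validate_cmd();
    add_to_svq();   /* expose both bounce buffers to the device and poll */
    handle_ctrl();  /* update the device model only on VIRTIO_NET_OK */
    unmap_sg();     /* bounce cvq_cmd_in_buffer back into the guest in sg */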

>
> Note that this is still not enough for stats. We also need a way to:
> * Update the virtio-net device model stats. virtio_net_handle_ctrl_iov
> would try to write the virtio-net stats to in buffer, not to update
> the device model stat.
> * Update the stats on the destination. Another ctrl command? Intercept
> them via svq and simply sum source stats + current device stats? I'd
> say the second is better as it saves effort to the device, but maybe
> it's not.

It should be sufficient to maintain a delta for each counter?
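
(Just to illustrate what I mean, the field names are made up:)

    /* counters migrated from the source are kept as a base, and whatever
     * the destination device reports afterwards is added on top */
    guest_stats.rx_packets = migrated_base.rx_packets + dev_stats.rx_packets;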

Thanks

>
> That's why I think this command should be left out for the moment:
> making the modifications is not hard, but we should agree on how to do
> them first.
>
> > > Since the exposed buffer is an in one, it's enough to expose
> > > the s->cvq_cmd_in_buffer buffer in full. The caller already knows the
> > > device will write to it, so the only missing piece is to return the
> > > written length at vhost_vdpa_net_cvq_add.
> > >
> > > Is one page the right buffer size for the in buffer?
> >
> > We can start from this.
> >
> > > Is it worth
> > > worrying about it before implementing the stat control command in qemu
> > > virtio-net?
> >
> > If it's not complex, it's better to do that from the beginning,
> > otherwise the user may be surprised and we need extra work. Anyhow, we
> > should support at least ack which is an in_sg.
> >
> > >
> > > > At this level, we'd better not give the in buffer any special treatment
> > > > as the ack.
> > >
> > > We need to care about it. If a property has not been updated in the
> > > vdpa device (it returned VIRTIO_NET_ERR), we must not update the
> > > device model.
> >
> > Yes, but what I meant is at the level of bouncing itself. If we met
> > VIRTIO_NET_ERR, we should propagate it to the guest as well?
> >
>
> Yes, we have to; if not, the guest thinks the command succeeded, doesn't it?
>
> Thanks!
>
> > Thanks
> >
> > >
> > > We can move the processing from vhost_vdpa_net_cvq_add to
> > > vhost_vdpa_net_load and vhost_vdpa_net_handle_ctrl_avail, but the code
> > > gets duplicated then.
> > >
> > > > And we need to do bouncing:
> > > >
> > > > 1) for out buffer, during map
> > > > 2) for in buffer during unmap
> > > >
> > >
> > > We can move the copy of the in_buffer to the unmap for sure.
> > >
> > > Thanks!
> > >
> > > > Thanks
> > > >
> > > >
> > > > >   {
> > > > >       size_t in_copied;
> > > > >       bool ok;
> > > > >
> > > > > -    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, elem->out_sg, elem->out_num,
> > > > > +    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, out, out_num,
> > > > >                                   vhost_vdpa_net_cvq_cmd_len(),
> > > > >                                   s->cvq_cmd_out_buffer, out_len, false);
> > > > >       if (unlikely(!ok)) {
> > > > > @@ -435,7 +435,8 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
> > > > >       };
> > > > >       bool ok;
> > > > >
> > > > > -    ok = vhost_vdpa_net_cvq_map_elem(s, elem, &dev_buffers[0].iov_len);
> > > > > +    ok = vhost_vdpa_net_cvq_map_sg(s, elem->out_sg, elem->out_num,
> > > > > +                                   &dev_buffers[0].iov_len);
> > > > >       if (unlikely(!ok)) {
> > > > >           goto out;
> > > > >       }
> > > >
> > >
> >
>




* Re: [PATCH v5 06/10] vdpa: Make vhost_vdpa_net_cvq_map_elem accept any out sg
  2022-08-04  8:51           ` Jason Wang
@ 2022-08-04 11:21             ` Eugenio Perez Martin
  0 siblings, 0 replies; 26+ messages in thread
From: Eugenio Perez Martin @ 2022-08-04 11:21 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Eli Cohen, Stefano Garzarella, Parav Pandit,
	Markus Armbruster, Gautam Dawar, Stefan Hajnoczi,
	Harpreet Singh Anand, Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Liuxiangdong, Zhu Lingshan

On Thu, Aug 4, 2022 at 10:52 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Aug 4, 2022 at 4:19 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Thu, Aug 4, 2022 at 9:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Thu, Aug 4, 2022 at 3:39 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > >
> > > > On Thu, Aug 4, 2022 at 6:17 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > >
> > > > > 在 2022/8/3 01:57, Eugenio Pérez 写道:
> > > > > > So it's generic enough to accept any out sg buffer and we can inject
> > > > > > NIC state messages.
> > > > > >
> > > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > > ---
> > > > > > v5: Accept out sg instead of dev_buffers[]
> > > > > > ---
> > > > > >   net/vhost-vdpa.c | 13 +++++++------
> > > > > >   1 file changed, 7 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > > > > index 33bf3d6409..2421bca347 100644
> > > > > > --- a/net/vhost-vdpa.c
> > > > > > +++ b/net/vhost-vdpa.c
> > > > > > @@ -302,16 +302,16 @@ dma_map_err:
> > > > > >   }
> > > > > >
> > > > > >   /**
> > > > > > - * Copy the guest element into a dedicated buffer suitable to be sent to NIC
> > > > > > + * Maps out sg and in buffer into dedicated buffers suitable to be sent to NIC
> > > > > >    */
> > > > > > -static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
> > > > > > -                                        VirtQueueElement *elem,
> > > > > > -                                        size_t *out_len)
> > > > > > +static bool vhost_vdpa_net_cvq_map_sg(VhostVDPAState *s,
> > > > > > +                                      const struct iovec *out, size_t out_num,
> > > > > > +                                      size_t *out_len)
> > > > >
> > > > >
> > > > > This still does not look general, as there's no guarantee that we won't have
> > > > > command-in-specific-data. One example is that Ali is working on the
> > > > > virtio-net statistics fetching from the control virtqueue.
> > > > >
> > > > > So it looks to me we'd better have a general bounce_map here that accepts:
> > > > >
> > > > > 1) out_sg and out_num
> > > > > 2) in_sg and in_num
> > > > >
> > > >
> > > > We don't need to pass in_sg for that: The only useful information is
> > > > its size.
> > >
> > > What if we support stats in the future where it extends the ctrl command:
> > >
> > >          u8 command;
> > >          u8 command-specific-data[];
> > >          u8 ack;
> > > +        u8 command-specific-data-reply[];
> > >
> > > in
> > >
> > > https://lists.oasis-open.org/archives/virtio-dev/202203/msg00000.html
> > >
> >
> > The guest will expose an in descriptor in whatever layout it wants.
> > QEMU reads its layout into a VirtQueueElement in_num and in_sg
> > members, in qemu's (and SVQ) address space.
>
> Yes, but the current code does:
>
> 1) map seems to be based on sg but unmap is not, resulting in an asymmetric API

Ok I get this part now. More on this below.

> 2) NULL is passed to vhost_vdpa_cvq_map_buf()
>
>     ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, NULL, 0,
>                                 sizeof(virtio_net_ctrl_ack),
>                                 s->cvq_cmd_in_buffer, &in_copied, true);
>     if (unlikely(!ok)) {
>         vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
>         return false;
>     }
>
> So NULL ends up being used for iov_to_buf(), which is tricky (I'm not even sure it can work).
>

We can add a conditional to be sure.
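
Something like this inside vhost_vdpa_cvq_map_buf (sketch only; the
parameter names follow the call above but are illustrative):

    /* make the "nothing to bounce" case explicit instead of relying on
     * iov_to_buf(NULL, 0, ...) being a nop */
    if (iov_cnt) {
        *written = iov_to_buf(iov, iov_cnt, 0, buf, size);
    } else {
        *written = 0;
        memset(buf, 0, size);
    }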

> And this won't work for commands that have in-data in the future.
>
> >
> > Since we don't want the guest to be able to modify the in buffer
> > maliciously, we offer the device a bounce in buffer. Since the in
> > buffer still contains no information, we only need its size, so we can
> > "allocate and map" an equivalent one for the device and memset to 0.
> > For simplicity, we allocate one page, so no need for iovec
> > complexities.
> >
> > After the device has written in it, we get the written len and verify
> > the information. If VIRTIO_NET_OK, we copy that to the guest's in
> > buffer, using the iov_from_buf right after out: tag at
> > vhost_vdpa_net_handle_ctrl_avail. Instead of copying from the stack
> > "(status, sizeof(status))" variable, we copy from
> > (s->cvq_cmd_in_buffer, written_len).
>
> Another asymmetry, the iov_to_buf() was hidden in map() but the
> iov_from_buf() was exposed to the vhost_vdpa_net_handle_ctrl_avail.
> I'd suggest tweaking the code otherwise it's not easy to read and
> maintain:
>
> map_sg()
> validate_cmd()
> add_to_svq()
> handle_ctrl()
> unmap_sg()
>
> Or since we know the location of the bounce buffer, we can simply do
> bouncing/iov_from_buf() explicitly before map_sg().
>

The main target of creating this helper is to avoid duplicating the
code (especially the fallback on error paths) between the code that
bounces the guest's buffers and the code that sends the commands on
the net device load.

While what you propose is possible, maybe it's better to simply map
the buffers once at the beginning of the device cycle and unmap them at
the end. To do so, I'll bring in the prepare() callback from the ASID
series and create a new stop() callback, called right after
vhost_dev_stop.
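
Roughly (sketch only; ->load is what this series adds, while ->prepare and
->stop do not exist yet and their names are placeholders):

    static NetClientInfo net_vhost_vdpa_cvq_info = {
        .type = NET_CLIENT_DRIVER_VHOST_VDPA,
        .size = sizeof(VhostVDPAState),
        .receive = vhost_vdpa_receive,
        .prepare = vhost_vdpa_net_cvq_prepare, /* map both bounce buffers */
        .load = vhost_vdpa_net_load,           /* inject state through CVQ */
        .stop = vhost_vdpa_net_cvq_stop,       /* unmap after vhost_dev_stop */
        .cleanup = vhost_vdpa_cleanup,
    };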

> >
> > Note that this is still not enough for stats. We also need a way to:
> > * Update the virtio-net device model stats. virtio_net_handle_ctrl_iov
> > would try to write the virtio-net stats to in buffer, not to update
> > the device model stat.
> > * Update the stats on the destination. Another ctrl command? Intercept
> > them via svq and simply sum source stats + current device stats? I'd
> > say the second is better as it saves effort to the device, but maybe
> > it's not.
>
> It should be sufficient to maintain a delta for each counter?
>

I think it's the best approach for the second point, but we still have to develop something for the first one.

Thanks!

> Thanks
>
> >
> > That's why I think this command should be left out for the moment:
> > making the modifications is not hard, but we should agree on how to do
> > them first.
> >
> > > > Since the exposed buffer is an in one, it's enough to expose
> > > > the s->cvq_cmd_in_buffer buffer in full. The caller already knows the
> > > > device will write to it, so the only missing piece is to return the
> > > > written length at vhost_vdpa_net_cvq_add.
> > > >
> > > > Is one page the right buffer size for the in buffer?
> > >
> > > We can start from this.
> > >
> > > > Is it worth
> > > > worrying about it before implementing the stat control command in qemu
> > > > virtio-net?
> > >
> > > If it's not complex, it's better to do that from the beginning,
> > > otherwise the user may be surprised and we need extra work. Anyhow, we
> > > should support at least ack which is an in_sg.
> > >
> > > >
> > > > > At this level, we'd better not give the in buffer any special treatment
> > > > > as the ack.
> > > >
> > > > We need to care about it. If a property has not been updated in the
> > > > vdpa device (it returned VIRTIO_NET_ERR), we must not update the
> > > > device model.
> > >
> > > Yes, but what I meant is at the level of bouncing itself. If we met
> > > VIRTIO_NET_ERR, we should propagate it to the guest as well?
> > >
> >
> > Yes, we have to; if not, the guest thinks the command succeeded, doesn't it?
> >
> > Thanks!
> >
> > > Thanks
> > >
> > > >
> > > > We can move the processing from vhost_vdpa_net_cvq_add to
> > > > vhost_vdpa_net_load and vhost_vdpa_net_handle_ctrl_avail, but the code
> > > > gets duplicated then.
> > > >
> > > > > And we need to do bouncing:
> > > > >
> > > > > 1) for out buffer, during map
> > > > > 2) for in buffer during unmap
> > > > >
> > > >
> > > > We can move the copy of the in_buffer to the unmap for sure.
> > > >
> > > > Thanks!
> > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > >   {
> > > > > >       size_t in_copied;
> > > > > >       bool ok;
> > > > > >
> > > > > > -    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, elem->out_sg, elem->out_num,
> > > > > > +    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, out, out_num,
> > > > > >                                   vhost_vdpa_net_cvq_cmd_len(),
> > > > > >                                   s->cvq_cmd_out_buffer, out_len, false);
> > > > > >       if (unlikely(!ok)) {
> > > > > > @@ -435,7 +435,8 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
> > > > > >       };
> > > > > >       bool ok;
> > > > > >
> > > > > > -    ok = vhost_vdpa_net_cvq_map_elem(s, elem, &dev_buffers[0].iov_len);
> > > > > > +    ok = vhost_vdpa_net_cvq_map_sg(s, elem->out_sg, elem->out_num,
> > > > > > +                                   &dev_buffers[0].iov_len);
> > > > > >       if (unlikely(!ok)) {
> > > > > >           goto out;
> > > > > >       }
> > > > >
> > > >
> > >
> >
>




* Re: [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ
  2022-08-04  4:21 ` [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ Jason Wang
@ 2022-08-04 18:17   ` Eugenio Perez Martin
  0 siblings, 0 replies; 26+ messages in thread
From: Eugenio Perez Martin @ 2022-08-04 18:17 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-level, Eli Cohen, Stefano Garzarella, Parav Pandit,
	Markus Armbruster, Gautam Dawar, Stefan Hajnoczi,
	Harpreet Singh Anand, Gonglei (Arei),
	Paolo Bonzini, Eric Blake, Michael S. Tsirkin, Laurent Vivier,
	Cornelia Huck, Cindy Lu, Liuxiangdong, Zhu Lingshan

On Thu, Aug 4, 2022 at 6:21 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2022/8/3 01:57, Eugenio Pérez 写道:
> > CVQ of net vhost-vdpa devices can be intercepted since the work of [1]. The
> > virtio-net device model is updated. The migration was blocked because although
> > the state can be migrated between VMMs, it was not possible to restore it on the
> > destination NIC.
> >
> > This series adds support for SVQ to inject external messages without the guest's
> > knowledge, so before the guest is resumed all the guest visible state is
> > restored. It is done using standard CVQ messages, so the vhost-vdpa device does
> > not need to learn how to restore it: As long as they have the feature, they
> > know how to handle it.
> >
> > This series needs fixes [1], [2] and [3] to be applied to achieve full live
> > migration.
> >
> > Thanks!
> >
> > [1] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg02984.html
> > [2] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg03993.html
>
>
> Note that the above has been merged into master.
>
> And the series looks good overall, just some comments to make the code
> easier to read and maintain in the future.
>

I think I addressed all of them, plus some others that we decided to
leave for later. We can revert them if that's not fine.

Sending a new version.

Thanks!




end of thread

Thread overview: 26+ messages
2022-08-02 17:57 [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ Eugenio Pérez
2022-08-02 17:57 ` [PATCH v5 01/10] vhost: stop transfer elem ownership in vhost_handle_guest_kick Eugenio Pérez
2022-08-02 17:57 ` [PATCH v5 02/10] vhost: use SVQ element ndescs instead of opaque data for desc validation Eugenio Pérez
2022-08-04  3:01   ` Jason Wang
2022-08-04  5:56     ` Eugenio Perez Martin
2022-08-04  7:28       ` Jason Wang
2022-08-02 17:57 ` [PATCH v5 03/10] vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush Eugenio Pérez
2022-08-04  3:14   ` Jason Wang
2022-08-04  6:21     ` Eugenio Perez Martin
2022-08-04  7:30       ` Jason Wang
2022-08-02 17:57 ` [PATCH v5 04/10] vdpa: Get buffers from VhostVDPAState on vhost_vdpa_net_cvq_map_elem Eugenio Pérez
2022-08-04  4:01   ` Jason Wang
2022-08-02 17:57 ` [PATCH v5 05/10] vdpa: Extract vhost_vdpa_net_cvq_add from vhost_vdpa_net_handle_ctrl_avail Eugenio Pérez
2022-08-02 17:57 ` [PATCH v5 06/10] vdpa: Make vhost_vdpa_net_cvq_map_elem accept any out sg Eugenio Pérez
2022-08-04  4:16   ` Jason Wang
2022-08-04  7:38     ` Eugenio Perez Martin
2022-08-04  7:51       ` Jason Wang
2022-08-04  8:19         ` Eugenio Perez Martin
2022-08-04  8:51           ` Jason Wang
2022-08-04 11:21             ` Eugenio Perez Martin
2022-08-02 17:57 ` [PATCH v5 07/10] vdpa: add NetClientState->load() callback Eugenio Pérez
2022-08-02 17:57 ` [PATCH v5 08/10] vdpa: add net_vhost_vdpa_cvq_info NetClientInfo Eugenio Pérez
2022-08-02 17:57 ` [PATCH v5 09/10] vdpa: Add virtio-net mac address via CVQ at start Eugenio Pérez
2022-08-02 17:57 ` [PATCH v5 10/10] vdpa: Delete CVQ migration blocker Eugenio Pérez
2022-08-04  4:21 ` [PATCH v5 00/10] NIC vhost-vdpa state restore via Shadow CVQ Jason Wang
2022-08-04 18:17   ` Eugenio Perez Martin
