* [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ
@ 2022-12-05 17:04 Eugenio Pérez
  2022-12-05 17:04 ` [RFC PATCH for 8.0 01/13] vhost: add available descriptor list in SVQ Eugenio Pérez
                   ` (13 more replies)
  0 siblings, 14 replies; 30+ messages in thread
From: Eugenio Pérez @ 2022-12-05 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Stefan Hajnoczi, Jason Wang, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

The state of the descriptors (avail or used) may not be recoverable just by
looking at the guest memory.  An out-of-order used descriptor may overwrite
previous avail ones in the descriptor table or the avail vring.

Currently we're not migrating this state in net devices because virtio-net,
vhost-kernel, etc. use the descriptors in order, so the information is always
recoverable from the guest's memory.  However, vDPA devices may use them out
of order, and other kinds of devices like block need this support.
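
As an illustration of why in-order devices do not need this (this is roughly
the loop that the last patch of the series removes from vhost-vdpa): when
descriptors are used in order, everything between the device's used index and
the driver's avail index is inflight, so it can simply be made available
again and retransmitted:

    /* in-order case: re-expose every not-yet-used element to the device */
    while (virtqueue_rewind(vq, 1)) {
        continue;
    }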

The shadow virtqueue is able to track these descriptors and resend them at the
destination.  Add them to the virtio-net migration description so they are not
lost in the process.
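
To make the intended flow concrete, here is a rough sketch (not part of the
patches; error handling and the per-queue bookkeeping in virtio-net omitted)
of how the pieces added in this series fit together:

    /* Source, at vhost shutdown: collect the still-inflight elements */
    uint32_t num;
    VirtQueueElement **inflight = vhost_svq_save_inflight(svq, &num);

    /* ... each element is flattened into a VirtQueueElementOld and sent as
     * part of the virtio-net device vmstate ... */

    /* Destination, at vhost start: re-expose the elements to the device */
    for (uint32_t i = 0; i < num; i++) {
        vhost_svq_add_element(svq, inflight[i]);
    }
    g_free(inflight);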

This is a very early RFC just to validate the first draft, so expect leftovers.
Fetching and requesting the descriptors from a device without SVQ needs to be
implemented on top.  Some other notable pending items are:
* Do not send the descriptors actually recoverable from the guest memory.
* Properly version the migrate data.
* Properly abstract the descriptors access from virtio-net to SVQ.
* Do not use VirtQueueElementOld but migrate VirtQueueElement directly.
* Replace lots of assertions with runtime conditionals.
* Other TODOs in the patch message or code changes.

Thanks.

Eugenio Pérez (13):
  vhost: add available descriptor list in SVQ
  vhost: iterate only available descriptors at SVQ stop
  vhost: merge avail list and next avail descriptors detach
  vhost: add vhost_svq_save_inflight
  virtio: Specify uint32_t as VirtQueueElementOld members type
  virtio: refactor qemu_get_virtqueue_element
  virtio: refactor qemu_put_virtqueue_element
  virtio: expose VirtQueueElementOld
  virtio: add vmstate_virtqueue_element_old
  virtio-net: Migrate vhost inflight descriptors
  virtio-net: save inflight descriptors at vhost shutdown
  vhost: expose vhost_svq_add_element
  vdpa: Recover inflight descriptors

 hw/virtio/vhost-shadow-virtqueue.h |   9 ++
 include/hw/virtio/virtio-net.h     |   2 +
 include/hw/virtio/virtio.h         |  32 ++++++
 include/migration/vmstate.h        |  22 ++++
 hw/net/vhost_net.c                 |  56 ++++++++++
 hw/net/virtio-net.c                | 129 +++++++++++++++++++++++
 hw/virtio/vhost-shadow-virtqueue.c |  52 +++++++--
 hw/virtio/vhost-vdpa.c             |  11 --
 hw/virtio/virtio.c                 | 162 ++++++++++++++++++-----------
 9 files changed, 392 insertions(+), 83 deletions(-)

-- 
2.31.1





* [RFC PATCH for 8.0 01/13] vhost: add available descriptor list in SVQ
  2022-12-05 17:04 [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Eugenio Pérez
@ 2022-12-05 17:04 ` Eugenio Pérez
  2022-12-05 17:04 ` [RFC PATCH for 8.0 02/13] vhost: iterate only available descriptors at SVQ stop Eugenio Pérez
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Eugenio Pérez @ 2022-12-05 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Stefan Hajnoczi, Jason Wang, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

This helps to track the inflight buffers, makes it easier to traverse all
of them, and allows returning them in order.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h | 6 ++++++
 hw/virtio/vhost-shadow-virtqueue.c | 5 +++++
 2 files changed, 11 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index d04c34a589..a01a7d4a18 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -23,6 +23,9 @@ typedef struct SVQDescState {
      * guest's
      */
     unsigned int ndescs;
+
+    /* List to save or free inflight descriptors */
+    QTAILQ_ENTRY(SVQDescState) entry;
 } SVQDescState;
 
 typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
@@ -81,6 +84,9 @@ typedef struct VhostShadowVirtqueue {
     /* SVQ vring descriptors state */
     SVQDescState *desc_state;
 
+    /* Linked list to follow avail descriptors */
+    QTAILQ_HEAD(, SVQDescState) desc_state_avail;
+
     /* Next VirtQueue element that guest made available */
     VirtQueueElement *next_guest_avail_elem;
 
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 5bd14cad96..0da72cb0ec 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -265,6 +265,8 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
 
     svq->desc_state[qemu_head].elem = elem;
     svq->desc_state[qemu_head].ndescs = ndescs;
+    QTAILQ_INSERT_TAIL(&svq->desc_state_avail, &svq->desc_state[qemu_head],
+                       entry);
     vhost_svq_kick(svq);
     return 0;
 }
@@ -451,6 +453,8 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
     svq->free_head = used_elem.id;
 
     *len = used_elem.len;
+    QTAILQ_REMOVE(&svq->desc_state_avail, &svq->desc_state[used_elem.id],
+                  entry);
     return g_steal_pointer(&svq->desc_state[used_elem.id].elem);
 }
 
@@ -665,6 +669,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
     svq->vring.used = qemu_memalign(qemu_real_host_page_size(), device_size);
     memset(svq->vring.used, 0, device_size);
     svq->desc_state = g_new0(SVQDescState, svq->vring.num);
+    QTAILQ_INIT(&svq->desc_state_avail);
     svq->desc_next = g_new0(uint16_t, svq->vring.num);
     for (unsigned i = 0; i < svq->vring.num - 1; i++) {
         svq->desc_next[i] = cpu_to_le16(i + 1);
-- 
2.31.1




* [RFC PATCH for 8.0 02/13] vhost: iterate only available descriptors at SVQ stop
  2022-12-05 17:04 [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Eugenio Pérez
  2022-12-05 17:04 ` [RFC PATCH for 8.0 01/13] vhost: add available descriptor list in SVQ Eugenio Pérez
@ 2022-12-05 17:04 ` Eugenio Pérez
  2022-12-05 17:04 ` [RFC PATCH for 8.0 03/13] vhost: merge avail list and next avail descriptors detach Eugenio Pérez
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Eugenio Pérez @ 2022-12-05 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Stefan Hajnoczi, Jason Wang, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

While we're at it, simplify the free path by just traversing the list
instead of all the descriptor entries.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 0da72cb0ec..1bda8ca4bf 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -692,12 +692,13 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
     /* Send all pending used descriptors to guest */
     vhost_svq_flush(svq, false);
 
-    for (unsigned i = 0; i < svq->vring.num; ++i) {
+    while (!QTAILQ_EMPTY(&svq->desc_state_avail)) {
+        SVQDescState *s = QTAILQ_FIRST(&svq->desc_state_avail);
         g_autofree VirtQueueElement *elem = NULL;
-        elem = g_steal_pointer(&svq->desc_state[i].elem);
-        if (elem) {
-            virtqueue_detach_element(svq->vq, elem, 0);
-        }
+
+        elem = g_steal_pointer(&s->elem);
+        virtqueue_detach_element(svq->vq, elem, 0);
+        QTAILQ_REMOVE(&svq->desc_state_avail, s, entry);
     }
 
     next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
-- 
2.31.1




* [RFC PATCH for 8.0 03/13] vhost: merge avail list and next avail descriptors detach
  2022-12-05 17:04 [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Eugenio Pérez
  2022-12-05 17:04 ` [RFC PATCH for 8.0 01/13] vhost: add available descriptor list in SVQ Eugenio Pérez
  2022-12-05 17:04 ` [RFC PATCH for 8.0 02/13] vhost: iterate only available descriptors at SVQ stop Eugenio Pérez
@ 2022-12-05 17:04 ` Eugenio Pérez
  2022-12-05 17:04 ` [RFC PATCH for 8.0 04/13] vhost: add vhost_svq_save_inflight Eugenio Pérez
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Eugenio Pérez @ 2022-12-05 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Stefan Hajnoczi, Jason Wang, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

We need to perform the same actions for both of them.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 1bda8ca4bf..e50bfba6dc 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -692,18 +692,18 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
     /* Send all pending used descriptors to guest */
     vhost_svq_flush(svq, false);
 
-    while (!QTAILQ_EMPTY(&svq->desc_state_avail)) {
+    while (!QTAILQ_EMPTY(&svq->desc_state_avail)
+           || svq->next_guest_avail_elem) {
         SVQDescState *s = QTAILQ_FIRST(&svq->desc_state_avail);
         g_autofree VirtQueueElement *elem = NULL;
 
-        elem = g_steal_pointer(&s->elem);
-        virtqueue_detach_element(svq->vq, elem, 0);
-        QTAILQ_REMOVE(&svq->desc_state_avail, s, entry);
-    }
-
-    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
-    if (next_avail_elem) {
-        virtqueue_detach_element(svq->vq, next_avail_elem, 0);
+        elem = g_steal_pointer(s ? &s->elem : &svq->next_guest_avail_elem);
+        if (elem) {
+            virtqueue_detach_element(svq->vq, elem, 0);
+        }
+        if (s) {
+            QTAILQ_REMOVE(&svq->desc_state_avail, s, entry);
+        }
     }
     svq->vq = NULL;
     g_free(svq->desc_next);
-- 
2.31.1




* [RFC PATCH for 8.0 04/13] vhost: add vhost_svq_save_inflight
  2022-12-05 17:04 [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Eugenio Pérez
                   ` (2 preceding siblings ...)
  2022-12-05 17:04 ` [RFC PATCH for 8.0 03/13] vhost: merge avail list and next avail descriptors detach Eugenio Pérez
@ 2022-12-05 17:04 ` Eugenio Pérez
  2022-12-05 17:04 ` [RFC PATCH for 8.0 05/13] virtio: Specify uint32_t as VirtQueueElementOld members type Eugenio Pérez
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Eugenio Pérez @ 2022-12-05 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Stefan Hajnoczi, Jason Wang, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

This allows SVQ to return all the inflight descriptors in order.

TODO: Rewind descriptors that can be fetched from the avail ring again.
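
A short usage sketch (not part of the patch; it mirrors how the later
vhost_net patch consumes it).  The caller receives a plain C array and owns
both the array and the elements:

    uint32_t num;
    VirtQueueElement **inflight = vhost_svq_save_inflight(svq, &num);

    /* ... keep the elements around for migration or re-injection ... */

    g_free(inflight);   /* frees only the array, not the elements */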

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  2 ++
 hw/virtio/vhost-shadow-virtqueue.c | 20 ++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index a01a7d4a18..e82da4b55c 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -133,6 +133,8 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
 
 void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
                      VirtQueue *vq);
+VirtQueueElement **vhost_svq_save_inflight(VhostShadowVirtqueue *svq,
+                                           uint32_t *num);
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
 VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index e50bfba6dc..029ccee957 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -676,6 +676,26 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
     }
 }
 
+VirtQueueElement **vhost_svq_save_inflight(VhostShadowVirtqueue *svq,
+                                           uint32_t *num)
+{
+    GPtrArray *aret = g_ptr_array_new();
+    SVQDescState *s, *tmp;
+
+    QTAILQ_FOREACH_SAFE(s, &svq->desc_state_avail, entry, tmp) {
+        VirtQueueElement *elem = s->elem;
+
+        virtqueue_detach_element(svq->vq, elem, 0);
+        g_ptr_array_add(aret, elem);
+        QTAILQ_REMOVE(&svq->desc_state_avail, s, entry);
+    }
+
+    *num = aret->len;
+
+    /* TODO: return g_ptr_array_steal(aret) when min glib >= 2.64 */
+    return (void *)g_ptr_array_free(aret, false);
+}
+
 /**
  * Stop the shadow virtqueue operation.
  * @svq: Shadow Virtqueue
-- 
2.31.1




* [RFC PATCH for 8.0 05/13] virtio: Specify uint32_t as VirtQueueElementOld members type
  2022-12-05 17:04 [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Eugenio Pérez
                   ` (3 preceding siblings ...)
  2022-12-05 17:04 ` [RFC PATCH for 8.0 04/13] vhost: add vhost_svq_save_inflight Eugenio Pérez
@ 2022-12-05 17:04 ` Eugenio Pérez
  2022-12-05 17:04 ` [RFC PATCH for 8.0 06/13] virtio: refactor qemu_get_virtqueue_element Eugenio Pérez
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Eugenio Pérez @ 2022-12-05 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Stefan Hajnoczi, Jason Wang, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

This struct will be the one used as a base to serialize and deserialize
the virtqueue elements within this RFC.  The VMState fields are much
simpler using this struct.
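
For illustration only (a sketch; the real description is added later in the
series): with fixed-width members, the stock VMState macros can describe the
scalar fields directly, e.g.:

    static const VMStateDescription vmstate_vq_element_old_sketch = {
        .name = "virtqueue_element_old_sketch",
        .fields = (VMStateField[]) {
            VMSTATE_UINT32(index, VirtQueueElementOld),
            VMSTATE_UINT32(in_num, VirtQueueElementOld),
            VMSTATE_UINT32(out_num, VirtQueueElementOld),
            VMSTATE_END_OF_LIST()
        },
    };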

TODO: Use VirtQueueElement to serialize and deserialize inflight
elements.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/virtio.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index eb6347ab5d..6efff3d441 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2288,9 +2288,9 @@ unsigned int virtqueue_drop_all(VirtQueue *vq)
  * used before the change.
  */
 typedef struct VirtQueueElementOld {
-    unsigned int index;
-    unsigned int out_num;
-    unsigned int in_num;
+    uint32_t index;
+    uint32_t out_num;
+    uint32_t in_num;
     hwaddr in_addr[VIRTQUEUE_MAX_SIZE];
     hwaddr out_addr[VIRTQUEUE_MAX_SIZE];
     struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
-- 
2.31.1




* [RFC PATCH for 8.0 06/13] virtio: refactor qemu_get_virtqueue_element
  2022-12-05 17:04 [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Eugenio Pérez
                   ` (4 preceding siblings ...)
  2022-12-05 17:04 ` [RFC PATCH for 8.0 05/13] virtio: Specify uint32_t as VirtQueueElementOld members type Eugenio Pérez
@ 2022-12-05 17:04 ` Eugenio Pérez
  2022-12-05 17:04 ` [RFC PATCH for 8.0 07/13] virtio: refactor qemu_put_virtqueue_element Eugenio Pérez
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Eugenio Pérez @ 2022-12-05 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Stefan Hajnoczi, Jason Wang, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

The core of the function is useful to transform a VirtQueueElementOld
into a VirtQueueElement.  Extract it from qemu_get_virtqueue_element,
leaving the QEMUFile handling there.

No functional change intended.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/virtio.c | 68 ++++++++++++++++++++++++++++------------------
 1 file changed, 42 insertions(+), 26 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 6efff3d441..b0245ce4e8 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2297,50 +2297,66 @@ typedef struct VirtQueueElementOld {
     struct iovec out_sg[VIRTQUEUE_MAX_SIZE];
 } VirtQueueElementOld;
 
-void *qemu_get_virtqueue_element(VirtIODevice *vdev, QEMUFile *f, size_t sz)
+/* Convert VirtQueueElementOld to VirtQueueElement */
+static void *qemu_get_virtqueue_element_from_old(VirtIODevice *vdev,
+                                               const VirtQueueElementOld *data,
+                                               size_t sz)
 {
-    VirtQueueElement *elem;
-    VirtQueueElementOld data;
-    int i;
-
-    qemu_get_buffer(f, (uint8_t *)&data, sizeof(VirtQueueElementOld));
-
-    /* TODO: teach all callers that this can fail, and return failure instead
-     * of asserting here.
-     * This is just one thing (there are probably more) that must be
-     * fixed before we can allow NDEBUG compilation.
-     */
-    assert(ARRAY_SIZE(data.in_addr) >= data.in_num);
-    assert(ARRAY_SIZE(data.out_addr) >= data.out_num);
-
-    elem = virtqueue_alloc_element(sz, data.out_num, data.in_num);
-    elem->index = data.index;
+    VirtQueueElement *elem = virtqueue_alloc_element(sz, data->out_num,
+                                                     data->in_num);
+    elem->index = data->index;
 
-    for (i = 0; i < elem->in_num; i++) {
-        elem->in_addr[i] = data.in_addr[i];
+    for (uint16_t i = 0; i < elem->in_num; i++) {
+        elem->in_addr[i] = data->in_addr[i];
     }
 
-    for (i = 0; i < elem->out_num; i++) {
-        elem->out_addr[i] = data.out_addr[i];
+    for (uint16_t i = 0; i < elem->out_num; i++) {
+        elem->out_addr[i] = data->out_addr[i];
     }
 
-    for (i = 0; i < elem->in_num; i++) {
+    for (uint16_t i = 0; i < elem->in_num; i++) {
         /* Base is overwritten by virtqueue_map.  */
         elem->in_sg[i].iov_base = 0;
-        elem->in_sg[i].iov_len = data.in_sg[i].iov_len;
+        elem->in_sg[i].iov_len = data->in_sg[i].iov_len;
     }
 
-    for (i = 0; i < elem->out_num; i++) {
+    for (uint16_t i = 0; i < elem->out_num; i++) {
         /* Base is overwritten by virtqueue_map.  */
         elem->out_sg[i].iov_base = 0;
-        elem->out_sg[i].iov_len = data.out_sg[i].iov_len;
+        elem->out_sg[i].iov_len = data->out_sg[i].iov_len;
     }
 
+    virtqueue_map(vdev, elem);
+    return elem;
+}
+
+static bool vq_element_in_range(void *opaque, int version_id)
+{
+    VirtQueueElementOld *data = opaque;
+
+    return ARRAY_SIZE(data->in_addr) >= data->in_num &&
+           ARRAY_SIZE(data->out_addr) >= data->out_num;
+}
+
+void *qemu_get_virtqueue_element(VirtIODevice *vdev, QEMUFile *f, size_t sz)
+{
+    VirtQueueElement *elem;
+    VirtQueueElementOld data;
+
+    qemu_get_buffer(f, (uint8_t *)&data, sizeof(VirtQueueElementOld));
+
+    /* TODO: teach all callers that this can fail, and return failure instead
+     * of asserting here.
+     * This is just one thing (there are probably more) that must be
+     * fixed before we can allow NDEBUG compilation.
+     */
+    assert(vq_element_in_range(&data, 0));
+
+    elem = qemu_get_virtqueue_element_from_old(vdev, &data, sz);
     if (virtio_host_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
         qemu_get_be32s(f, &elem->ndescs);
     }
 
-    virtqueue_map(vdev, elem);
     return elem;
 }
 
-- 
2.31.1




* [RFC PATCH for 8.0 07/13] virtio: refactor qemu_put_virtqueue_element
  2022-12-05 17:04 [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Eugenio Pérez
                   ` (5 preceding siblings ...)
  2022-12-05 17:04 ` [RFC PATCH for 8.0 06/13] virtio: refactor qemu_get_virtqueue_element Eugenio Pérez
@ 2022-12-05 17:04 ` Eugenio Pérez
  2022-12-05 17:04 ` [RFC PATCH for 8.0 08/13] virtio: expose VirtQueueElementOld Eugenio Pérez
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Eugenio Pérez @ 2022-12-05 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Stefan Hajnoczi, Jason Wang, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

The core of the function is useful to transform a VirtQueueElement
into a VirtQueueElementOld.  Extract it from qemu_put_virtqueue_element,
leaving the QEMUFile handling there.

No functional change intended.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/virtio.c | 45 ++++++++++++++++++++++++++-------------------
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index b0245ce4e8..bc3b474065 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2360,36 +2360,43 @@ void *qemu_get_virtqueue_element(VirtIODevice *vdev, QEMUFile *f, size_t sz)
     return elem;
 }
 
-void qemu_put_virtqueue_element(VirtIODevice *vdev, QEMUFile *f,
-                                VirtQueueElement *elem)
+/* Convert VirtQueueElement to VirtQueueElementOld */
+static void qemu_put_virtqueue_element_old(const VirtQueueElement *elem,
+                                           VirtQueueElementOld *data)
 {
-    VirtQueueElementOld data;
-    int i;
+    memset(data, 0, sizeof(*data));
+    data->index = elem->index;
+    data->in_num = elem->in_num;
+    data->out_num = elem->out_num;
 
-    memset(&data, 0, sizeof(data));
-    data.index = elem->index;
-    data.in_num = elem->in_num;
-    data.out_num = elem->out_num;
-
-    for (i = 0; i < elem->in_num; i++) {
-        data.in_addr[i] = elem->in_addr[i];
+    for (int i = 0; i < elem->in_num; i++) {
+        data->in_addr[i] = elem->in_addr[i];
     }
 
-    for (i = 0; i < elem->out_num; i++) {
-        data.out_addr[i] = elem->out_addr[i];
+    for (int i = 0; i < elem->out_num; i++) {
+        data->out_addr[i] = elem->out_addr[i];
     }
 
-    for (i = 0; i < elem->in_num; i++) {
-        /* Base is overwritten by virtqueue_map when loading.  Do not
-         * save it, as it would leak the QEMU address space layout.  */
-        data.in_sg[i].iov_len = elem->in_sg[i].iov_len;
+    for (int i = 0; i < elem->in_num; i++) {
+        /*
+         * Base is overwritten by virtqueue_map when loading.  Do not
+         * save it, as it would leak the QEMU address space layout.
+         */
+        data->in_sg[i].iov_len = elem->in_sg[i].iov_len;
     }
 
-    for (i = 0; i < elem->out_num; i++) {
+    for (int i = 0; i < elem->out_num; i++) {
         /* Do not save iov_base as above.  */
-        data.out_sg[i].iov_len = elem->out_sg[i].iov_len;
+        data->out_sg[i].iov_len = elem->out_sg[i].iov_len;
     }
+}
+
+void qemu_put_virtqueue_element(VirtIODevice *vdev, QEMUFile *f,
+                                VirtQueueElement *elem)
+{
+    VirtQueueElementOld data;
 
+    qemu_put_virtqueue_element_old(elem, &data);
     if (virtio_host_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
         qemu_put_be32s(f, &elem->ndescs);
     }
-- 
2.31.1




* [RFC PATCH for 8.0 08/13] virtio: expose VirtQueueElementOld
  2022-12-05 17:04 [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Eugenio Pérez
                   ` (6 preceding siblings ...)
  2022-12-05 17:04 ` [RFC PATCH for 8.0 07/13] virtio: refactor qemu_put_virtqueue_element Eugenio Pérez
@ 2022-12-05 17:04 ` Eugenio Pérez
  2022-12-05 17:04 ` [RFC PATCH for 8.0 09/13] virtio: add vmstate_virtqueue_element_old Eugenio Pérez
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Eugenio Pérez @ 2022-12-05 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Stefan Hajnoczi, Jason Wang, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

We will convert between VirtQueueElement and VirtQueueElementOld to
migrate the inflight descriptors, so virtio-net needs to see these.

This will not be exported in the final version, but working with
VirtQueueElementOld is much easier than with VirtQueueElement and the
VMState macros.
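
A rough round-trip sketch with the two exported helpers (error handling
omitted; vdev and elem are assumed to exist):

    VirtQueueElementOld old;

    /* save side: flatten the live element into the old, fixed-size layout */
    qemu_put_virtqueue_element_old(elem, &old);

    /* load side: rebuild a VirtQueueElement; the iov bases are re-mapped */
    VirtQueueElement *restored =
        qemu_get_virtqueue_element_from_old(vdev, &old, sizeof(VirtQueueElement));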

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/virtio.h | 31 +++++++++++++++++++++++++++++++
 hw/virtio/virtio.c         | 27 +++++----------------------
 2 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index acfd4df125..b4c5163fb0 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -75,6 +75,37 @@ typedef struct VirtQueueElement
     struct iovec *out_sg;
 } VirtQueueElement;
 
+/*
+ * Reading and writing a structure directly to QEMUFile is *awful*, but
+ * it is what QEMU has always done by mistake.  We can change it sooner
+ * or later by bumping the version number of the affected vm states.
+ * In the meanwhile, since the in-memory layout of VirtQueueElement
+ * has changed, we need to marshal to and from the layout that was
+ * used before the change.
+ */
+typedef struct VirtQueueElementOld {
+    uint32_t index;
+    uint32_t out_num;
+    uint32_t in_num;
+    hwaddr in_addr[VIRTQUEUE_MAX_SIZE];
+    hwaddr out_addr[VIRTQUEUE_MAX_SIZE];
+    /* Unions help to serialize the descriptor using VMStateDescription */
+    union {
+        struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
+        uint64_t in_sg_64[VIRTQUEUE_MAX_SIZE * 2];
+    };
+    union {
+        struct iovec out_sg[VIRTQUEUE_MAX_SIZE];
+        uint64_t out_sg_64[VIRTQUEUE_MAX_SIZE * 2];
+    };
+} VirtQueueElementOld;
+
+void *qemu_get_virtqueue_element_from_old(VirtIODevice *vdev,
+                                          const VirtQueueElementOld *data,
+                                          size_t sz);
+void qemu_put_virtqueue_element_old(const VirtQueueElement *elem,
+                                    VirtQueueElementOld *data);
+
 #define VIRTIO_QUEUE_MAX 1024
 
 #define VIRTIO_NO_VECTOR 0xffff
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index bc3b474065..5ddc49610c 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2280,27 +2280,10 @@ unsigned int virtqueue_drop_all(VirtQueue *vq)
     }
 }
 
-/* Reading and writing a structure directly to QEMUFile is *awful*, but
- * it is what QEMU has always done by mistake.  We can change it sooner
- * or later by bumping the version number of the affected vm states.
- * In the meanwhile, since the in-memory layout of VirtQueueElement
- * has changed, we need to marshal to and from the layout that was
- * used before the change.
- */
-typedef struct VirtQueueElementOld {
-    uint32_t index;
-    uint32_t out_num;
-    uint32_t in_num;
-    hwaddr in_addr[VIRTQUEUE_MAX_SIZE];
-    hwaddr out_addr[VIRTQUEUE_MAX_SIZE];
-    struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
-    struct iovec out_sg[VIRTQUEUE_MAX_SIZE];
-} VirtQueueElementOld;
-
 /* Convert VirtQueueElementOld to VirtQueueElement */
-static void *qemu_get_virtqueue_element_from_old(VirtIODevice *vdev,
-                                               const VirtQueueElementOld *data,
-                                               size_t sz)
+void *qemu_get_virtqueue_element_from_old(VirtIODevice *vdev,
+                                          const VirtQueueElementOld *data,
+                                          size_t sz)
 {
     VirtQueueElement *elem = virtqueue_alloc_element(sz, data->out_num,
                                                      data->in_num);
@@ -2361,8 +2344,8 @@ void *qemu_get_virtqueue_element(VirtIODevice *vdev, QEMUFile *f, size_t sz)
 }
 
 /* Convert VirtQueueElement to VirtQueueElementOld */
-static void qemu_put_virtqueue_element_old(const VirtQueueElement *elem,
-                                           VirtQueueElementOld *data)
+void qemu_put_virtqueue_element_old(const VirtQueueElement *elem,
+                                    VirtQueueElementOld *data)
 {
     memset(data, 0, sizeof(*data));
     data->index = elem->index;
-- 
2.31.1




* [RFC PATCH for 8.0 09/13] virtio: add vmstate_virtqueue_element_old
  2022-12-05 17:04 [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Eugenio Pérez
                   ` (7 preceding siblings ...)
  2022-12-05 17:04 ` [RFC PATCH for 8.0 08/13] virtio: expose VirtQueueElementOld Eugenio Pérez
@ 2022-12-05 17:04 ` Eugenio Pérez
  2022-12-05 17:04 ` [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors Eugenio Pérez
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Eugenio Pérez @ 2022-12-05 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Stefan Hajnoczi, Jason Wang, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

VMStateDescription of an inflight descriptor, represented as a
VirtQueueElementOld.

TODO: Convert to VirtQueueElement.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/virtio.h  |  1 +
 include/migration/vmstate.h | 11 +++++++++++
 hw/virtio/virtio.c          | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 44 insertions(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b4c5163fb0..f3485e5748 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -105,6 +105,7 @@ void *qemu_get_virtqueue_element_from_old(VirtIODevice *vdev,
                                           size_t sz);
 void qemu_put_virtqueue_element_old(const VirtQueueElement *elem,
                                     VirtQueueElementOld *data);
+extern const VMStateDescription vmstate_virtqueue_element_old;
 
 #define VIRTIO_QUEUE_MAX 1024
 
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index ad24aa1934..9726d2d09e 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -448,6 +448,17 @@ extern const VMStateInfo vmstate_info_qlist;
     .offset     = vmstate_offset_varray(_state, _field, _type),      \
 }
 
+#define VMSTATE_VARRAY_UINT32_UNSAFE(_field, _state, _field_num, _version, \
+                                     _info, _type) {\
+    .name       = (stringify(_field)),                               \
+    .version_id = (_version),                                        \
+    .num_offset = vmstate_offset_value(_state, _field_num, uint32_t),\
+    .info       = &(_info),                                          \
+    .size       = sizeof(_type),                                     \
+    .flags      = VMS_VARRAY_UINT32,                                 \
+    .offset     = vmstate_offset_varray(_state, _field, _type),      \
+}
+
 #define VMSTATE_VSTRUCT_TEST(_field, _state, _test, _version, _vmsd, _type, _struct_version) { \
     .name         = (stringify(_field)),                             \
     .version_id   = (_version),                                      \
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 5ddc49610c..7936fcfec2 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3165,6 +3165,38 @@ static bool virtio_disabled_needed(void *opaque)
     return vdev->disabled;
 }
 
+const VMStateDescription vmstate_virtqueue_element_old = {
+    .name = "virtqueue_element",
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(index, VirtQueueElementOld),
+        VMSTATE_UINT32(in_num, VirtQueueElementOld),
+        VMSTATE_UINT32(out_num, VirtQueueElementOld),
+        /*
+         * TODO: Needed for packed
+         * VMSTATE_UINT16(ndescs, VirtQueueElement),
+         */
+
+        VMSTATE_VALIDATE("fit", vq_element_in_range),
+        VMSTATE_VARRAY_UINT32_UNSAFE(in_addr, VirtQueueElementOld, in_num, 0,
+                                     vmstate_info_uint64, hwaddr),
+        VMSTATE_VARRAY_UINT32_UNSAFE(out_addr, VirtQueueElementOld, out_num, 0,
+                                     vmstate_info_uint64, hwaddr),
+
+        /*
+         * Assume iovec[n] == uint64_t[n*2]
+         * TODO: Probably need to send each field individually because of
+         * endianness.
+         */
+        VMSTATE_VARRAY_MULTIPLY(in_sg_64, VirtQueueElementOld, in_num,
+                                sizeof(struct iovec) / sizeof(uint64_t),
+                                vmstate_info_uint64, uint64_t),
+        VMSTATE_VARRAY_MULTIPLY(out_sg_64, VirtQueueElementOld, out_num,
+                                sizeof(struct iovec) / sizeof(uint64_t),
+                                vmstate_info_uint64, uint64_t),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_virtqueue = {
     .name = "virtqueue_state",
     .version_id = 1,
-- 
2.31.1




* [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors
  2022-12-05 17:04 [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Eugenio Pérez
                   ` (8 preceding siblings ...)
  2022-12-05 17:04 ` [RFC PATCH for 8.0 09/13] virtio: add vmstate_virtqueue_element_old Eugenio Pérez
@ 2022-12-05 17:04 ` Eugenio Pérez
  2022-12-05 20:52   ` Parav Pandit
  2022-12-06  3:24   ` Jason Wang
  2022-12-05 17:04 ` [RFC PATCH for 8.0 11/13] virtio-net: save inflight descriptors at vhost shutdown Eugenio Pérez
                   ` (3 subsequent siblings)
  13 siblings, 2 replies; 30+ messages in thread
From: Eugenio Pérez @ 2022-12-05 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Stefan Hajnoczi, Jason Wang, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

There is currently no data to be migrated, since nothing populates or
reads these fields in virtio-net.

The migration of in-flight descriptors is modelled after the migration
of requests in virtio-blk, with some differences:
* virtio-blk migrates the queue number on each request.  Here we only add
  a vq if it has descriptors to migrate, and then we put all of its
  descriptors in an array.
* Use of QTAILQ, since it signals the end of the inflight descriptors in a
  similar way: 1 for more data, 0 for the end.  But do it for each vq
  instead of for each descriptor (see the stream sketch after this list).
* Usage of VMState macros.
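
As a sketch of the resulting stream (assuming the generic QTAILQ vmstate
helper, which emits one marker byte per entry and a zero terminator; the
element encoding itself comes from vmstate_virtqueue_element_old):

    for each vq that has inflight descriptors:
        0x01, idx (uint16), num (uint16), num x VirtQueueElementOld
    0x00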

The fields of the descriptors would be way more complicated if we used
VirtQueueElement directly, since there would be a few levels of
indirection (see the excerpt below).  Use VirtQueueElementOld for the
moment, and migrate to VirtQueueElement for the final version.
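
For reference, this is the indirection the paragraph refers to:
VirtQueueElement (include/hw/virtio/virtio.h) keeps runtime-sized pointer
members, while VirtQueueElementOld keeps fixed-size arrays that the VMState
varray macros can walk directly:

    typedef struct VirtQueueElement {
        unsigned int index;
        unsigned int len;
        unsigned int ndescs;
        unsigned int out_num;
        unsigned int in_num;
        hwaddr *in_addr;
        hwaddr *out_addr;
        struct iovec *in_sg;
        struct iovec *out_sg;
    } VirtQueueElement;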

TODO: Proper migration versioning
TODO: Do not embed vhost-vdpa structs
TODO: Migrate the VirtQueueElement, not VirtQueueElementOld.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/virtio-net.h |   2 +
 include/migration/vmstate.h    |  11 +++
 hw/net/virtio-net.c            | 129 +++++++++++++++++++++++++++++++++
 3 files changed, 142 insertions(+)

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index ef234ffe7e..ae7c017ef0 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -151,9 +151,11 @@ typedef struct VirtIONetQueue {
     QEMUTimer *tx_timer;
     QEMUBH *tx_bh;
     uint32_t tx_waiting;
+    uint32_t tx_inflight_num, rx_inflight_num;
     struct {
         VirtQueueElement *elem;
     } async_tx;
+    VirtQueueElement **tx_inflight, **rx_inflight;
     struct VirtIONet *n;
 } VirtIONetQueue;
 
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 9726d2d09e..9e0dfef9ee 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -626,6 +626,17 @@ extern const VMStateInfo vmstate_info_qlist;
     .offset     = vmstate_offset_varray(_state, _field, _type),      \
 }
 
+#define VMSTATE_STRUCT_VARRAY_ALLOC_UINT16(_field, _state, _field_num,        \
+                                           _version, _vmsd, _type) {          \
+    .name       = (stringify(_field)),                                        \
+    .version_id = (_version),                                                 \
+    .vmsd       = &(_vmsd),                                                   \
+    .num_offset = vmstate_offset_value(_state, _field_num, uint16_t),         \
+    .size       = sizeof(_type),                                              \
+    .flags      = VMS_STRUCT | VMS_VARRAY_UINT16 | VMS_ALLOC | VMS_POINTER,   \
+    .offset     = vmstate_offset_pointer(_state, _field, _type),              \
+}
+
 #define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, _vmsd, _type) {\
     .name       = (stringify(_field)),                               \
     .version_id = (_version),                                        \
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index aba12759d5..ffd7bf1fc7 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3077,6 +3077,13 @@ static bool mac_table_doesnt_fit(void *opaque, int version_id)
     return !mac_table_fits(opaque, version_id);
 }
 
+typedef struct VirtIONetInflightQueue {
+    uint16_t idx;
+    uint16_t num;
+    QTAILQ_ENTRY(VirtIONetInflightQueue) entry;
+    VirtQueueElementOld *elems;
+} VirtIONetInflightQueue;
+
 /* This temporary type is shared by all the WITH_TMP methods
  * although only some fields are used by each.
  */
@@ -3086,6 +3093,7 @@ struct VirtIONetMigTmp {
     uint16_t        curr_queue_pairs_1;
     uint8_t         has_ufo;
     uint32_t        has_vnet_hdr;
+    QTAILQ_HEAD(, VirtIONetInflightQueue) queues_inflight;
 };
 
 /* The 2nd and subsequent tx_waiting flags are loaded later than
@@ -3231,6 +3239,124 @@ static const VMStateDescription vmstate_virtio_net_rss = {
     },
 };
 
+static const VMStateDescription vmstate_virtio_net_inflight_queue = {
+    .name      = "virtio-net-device/inflight/queue",
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT16(idx, VirtIONetInflightQueue),
+        VMSTATE_UINT16(num, VirtIONetInflightQueue),
+
+        VMSTATE_STRUCT_VARRAY_ALLOC_UINT16(elems, VirtIONetInflightQueue, num,
+                                           0, vmstate_virtqueue_element_old,
+                                           VirtQueueElementOld),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static int virtio_net_inflight_init(void *opaque)
+{
+    struct VirtIONetMigTmp *tmp = opaque;
+
+    QTAILQ_INIT(&tmp->queues_inflight);
+    return 0;
+}
+
+static int virtio_net_inflight_pre_save(void *opaque)
+{
+    struct VirtIONetMigTmp *tmp = opaque;
+    VirtIONet *net = tmp->parent;
+    uint16_t curr_queue_pairs = net->multiqueue ? net->curr_queue_pairs : 1;
+    VirtIONetInflightQueue *qi = g_new0(VirtIONetInflightQueue,
+                                        curr_queue_pairs * 2);
+
+    virtio_net_inflight_init(opaque);
+    for (uint16_t i = 0; i < curr_queue_pairs * 2; ++i) {
+        VirtIONetQueue *q = &net->vqs[vq2q(i)];
+        size_t n = i % 2 ? q->tx_inflight_num : q->rx_inflight_num;
+        VirtQueueElement **inflight = i % 2 ? q->tx_inflight : q->rx_inflight;
+
+        if (n == 0) {
+            continue;
+        }
+
+        qi[i].idx = i;
+        qi[i].num = n;
+        qi[i].elems = g_new0(VirtQueueElementOld, n);
+        for (uint16_t j = 0; j < n; ++j) {
+            qemu_put_virtqueue_element_old(inflight[j], &qi[i].elems[j]);
+        }
+        QTAILQ_INSERT_TAIL(&tmp->queues_inflight, &qi[i], entry);
+    }
+
+    return 0;
+}
+
+static int virtio_net_inflight_post_save(void *opaque)
+{
+    struct VirtIONetMigTmp *tmp = opaque;
+    VirtIONetInflightQueue *qi;
+
+    while ((qi = QTAILQ_FIRST(&tmp->queues_inflight))) {
+        QTAILQ_REMOVE(&tmp->queues_inflight, qi, entry);
+        g_free(qi->elems);
+        g_free(qi);
+    }
+
+    return 0;
+}
+
+static int virtio_net_inflight_post_load(void *opaque, int version_id)
+{
+    struct VirtIONetMigTmp *tmp = opaque;
+    VirtIONet *net = tmp->parent;
+    uint16_t curr_queue_pairs = net->multiqueue ? net->curr_queue_pairs : 1;
+    VirtIONetInflightQueue *qi;
+
+    while ((qi = QTAILQ_FIRST(&tmp->queues_inflight))) {
+        VirtIONetQueue *q = &net->vqs[vq2q(qi->idx)];
+        uint32_t *n = qi->idx % 2 ? &q->tx_inflight_num : &q->rx_inflight_num;
+        VirtQueueElement ***inflight = qi->idx % 2 ?
+                                       &q->tx_inflight : &q->rx_inflight;
+        if (unlikely(qi->num == 0)) {
+            /* TODO: error message */
+            return -1;
+        }
+
+        if (unlikely(qi->idx > curr_queue_pairs * 2)) {
+            /* TODO: error message */
+            return -1;
+        }
+
+        *n = qi->num;
+        *inflight = g_new(VirtQueueElement *, *n);
+        for (uint16_t j = 0; j < *n; ++j) {
+            (*inflight)[j] = qemu_get_virtqueue_element_from_old(
+                &net->parent_obj, &qi->elems[j],
+                sizeof(VirtQueueElement));
+        }
+
+        QTAILQ_REMOVE(&tmp->queues_inflight, qi, entry);
+        g_free(qi->elems);
+        g_free(qi);
+    }
+
+    return 0;
+}
+
+/* TODO: Allocate a temporary per queue / queue element, not all of them! */
+static const VMStateDescription vmstate_virtio_net_inflight = {
+    .name      = "virtio-net-device/inflight",
+    .pre_save = virtio_net_inflight_pre_save,
+    .post_save = virtio_net_inflight_post_save,
+    .pre_load = virtio_net_inflight_init,
+    .post_load = virtio_net_inflight_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_QTAILQ_V(queues_inflight, struct VirtIONetMigTmp, 0,
+                         vmstate_virtio_net_inflight_queue,
+                         VirtIONetInflightQueue, entry),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_virtio_net_device = {
     .name = "virtio-net-device",
     .version_id = VIRTIO_NET_VM_VERSION,
@@ -3279,6 +3405,9 @@ static const VMStateDescription vmstate_virtio_net_device = {
                          vmstate_virtio_net_tx_waiting),
         VMSTATE_UINT64_TEST(curr_guest_offloads, VirtIONet,
                             has_ctrl_guest_offloads),
+        /* TODO: Move to subsection */
+        VMSTATE_WITH_TMP(VirtIONet, struct VirtIONetMigTmp,
+                         vmstate_virtio_net_inflight),
         VMSTATE_END_OF_LIST()
    },
     .subsections = (const VMStateDescription * []) {
-- 
2.31.1




* [RFC PATCH for 8.0 11/13] virtio-net: save inflight descriptors at vhost shutdown
  2022-12-05 17:04 [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Eugenio Pérez
                   ` (9 preceding siblings ...)
  2022-12-05 17:04 ` [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors Eugenio Pérez
@ 2022-12-05 17:04 ` Eugenio Pérez
  2022-12-05 17:04 ` [RFC PATCH for 8.0 12/13] vhost: expose vhost_svq_add_element Eugenio Pérez
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Eugenio Pérez @ 2022-12-05 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Stefan Hajnoczi, Jason Wang, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

So they can be migrated in virtio-net.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/net/vhost_net.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 043058ff43..480f4ac0a1 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -21,6 +21,7 @@
 
 #include "standard-headers/linux/vhost_types.h"
 #include "hw/virtio/virtio-net.h"
+#include "hw/virtio/vhost-vdpa.h" /* TODO remove me */
 #include "net/vhost_net.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
@@ -319,6 +320,7 @@ static void vhost_net_stop_one(struct vhost_net *net,
                                VirtIODevice *dev)
 {
     struct vhost_vring_file file = { .fd = -1 };
+    VirtIONet *n = VIRTIO_NET(dev);
 
     if (net->nc->info->type == NET_CLIENT_DRIVER_TAP) {
         for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
@@ -329,6 +331,26 @@ static void vhost_net_stop_one(struct vhost_net *net,
     if (net->nc->info->poll) {
         net->nc->info->poll(net->nc, true);
     }
+
+    for (size_t i = 0; i < net->dev.nvqs; ++i) {
+        struct vhost_vdpa *v = net->dev.opaque;
+
+        if (net->dev.nvqs != 2) {
+            continue;
+        }
+
+        if (!v->shadow_vqs_enabled) {
+            continue;
+        }
+
+        n->vqs[i].rx_inflight = vhost_svq_save_inflight(
+            g_ptr_array_index(v->shadow_vqs, 0),
+            &n->vqs[i].rx_inflight_num);
+        n->vqs[i].tx_inflight = vhost_svq_save_inflight(
+            g_ptr_array_index(v->shadow_vqs, 1),
+            &n->vqs[i].tx_inflight_num);
+    }
+
     vhost_dev_stop(&net->dev, dev, false);
     if (net->nc->info->stop) {
         net->nc->info->stop(net->nc);
-- 
2.31.1




* [RFC PATCH for 8.0 12/13] vhost: expose vhost_svq_add_element
  2022-12-05 17:04 [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Eugenio Pérez
                   ` (10 preceding siblings ...)
  2022-12-05 17:04 ` [RFC PATCH for 8.0 11/13] virtio-net: save inflight descriptors at vhost shutdown Eugenio Pérez
@ 2022-12-05 17:04 ` Eugenio Pérez
  2022-12-05 17:04 ` [RFC PATCH for 8.0 13/13] vdpa: Recover inflight descriptors Eugenio Pérez
  2022-12-06  7:07 ` [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Jason Wang
  13 siblings, 0 replies; 30+ messages in thread
From: Eugenio Pérez @ 2022-12-05 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Stefan Hajnoczi, Jason Wang, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

Needed to inject new inflight elements.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  1 +
 hw/virtio/vhost-shadow-virtqueue.c | 12 +++++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index e82da4b55c..69910597dd 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -122,6 +122,7 @@ void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
 int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
                   size_t out_num, const struct iovec *in_sg, size_t in_num,
                   VirtQueueElement *elem);
+int vhost_svq_add_element(VhostShadowVirtqueue *svq, VirtQueueElement *elem);
 size_t vhost_svq_poll(VhostShadowVirtqueue *svq);
 
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 029ccee957..0668114897 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -271,9 +271,15 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
     return 0;
 }
 
-/* Convenience wrapper to add a guest's element to SVQ */
-static int vhost_svq_add_element(VhostShadowVirtqueue *svq,
-                                 VirtQueueElement *elem)
+/*
+ * Add a guest's element to SVQ. SVQ returns the element when the device
+ * marks the descriptor as used, so make sure the guest will understand it.
+ *
+ * This function can be used to add the elements after a migration event or
+ * in case the device is reset. To add elements that must not be seen by the
+ * guest, use vhost_svq_add.
+ */
+int vhost_svq_add_element(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
 {
     return vhost_svq_add(svq, elem->out_sg, elem->out_num, elem->in_sg,
                          elem->in_num, elem);
-- 
2.31.1




* [RFC PATCH for 8.0 13/13] vdpa: Recover inflight descriptors
  2022-12-05 17:04 [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Eugenio Pérez
                   ` (11 preceding siblings ...)
  2022-12-05 17:04 ` [RFC PATCH for 8.0 12/13] vhost: expose vhost_svq_add_element Eugenio Pérez
@ 2022-12-05 17:04 ` Eugenio Pérez
  2022-12-06  7:07 ` [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Jason Wang
  13 siblings, 0 replies; 30+ messages in thread
From: Eugenio Pérez @ 2022-12-05 17:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Liuxiangdong, Stefan Hajnoczi, Jason Wang, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

Finally recover the inflight descriptors at vhost_net_start.

TODO: Abstract it properly instead of using SVQ directly.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/net/vhost_net.c     | 34 ++++++++++++++++++++++++++++++++++
 hw/virtio/vhost-vdpa.c | 11 -----------
 2 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 480f4ac0a1..9a49046c53 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -428,6 +428,40 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         }
     }
 
+    for (i = 0; i < data_queue_pairs; i++) {
+        struct vhost_vdpa *v;
+
+        peer = qemu_get_peer(ncs, i);
+        if (peer->info->type != NET_CLIENT_DRIVER_VHOST_VDPA) {
+            continue;
+        }
+        net = get_vhost_net(peer);
+        v = net->dev.opaque;
+
+        if (!v->shadow_vqs_enabled) {
+            return 0;
+        }
+
+        for (size_t i = 0; i < v->dev->nvqs; ++i) {
+            VirtIONetQueue *q = &n->vqs[(i + v->dev->vq_index) / 2];
+            size_t num = i % 2 ? q->tx_inflight_num : q->rx_inflight_num;
+            g_autofree VirtQueueElement **inflight = NULL;
+
+            assert(v->dev->vq_index % 2 == 0);
+            inflight = g_steal_pointer(i % 2 ? &q->tx_inflight
+                                             : &q->rx_inflight);
+            for (size_t j = 0; j < num; ++j) {
+                int r;
+
+                r = vhost_svq_add_element(g_ptr_array_index(v->shadow_vqs, i),
+                                          inflight[j]);
+
+                /* TODO: Proper error handling */
+                assert(r == 0);
+            }
+        }
+    }
+
     return 0;
 
 err_start:
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 7468e44b87..c54cb82cb5 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1167,18 +1167,7 @@ static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
                                        struct vhost_vring_state *ring)
 {
     struct vhost_vdpa *v = dev->opaque;
-    VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
 
-    /*
-     * vhost-vdpa devices does not support in-flight requests. Set all of them
-     * as available.
-     *
-     * TODO: This is ok for networking, but other kinds of devices might
-     * have problems with these retransmissions.
-     */
-    while (virtqueue_rewind(vq, 1)) {
-        continue;
-    }
     if (v->shadow_vqs_enabled) {
         /*
          * Device vring base was set at device start. SVQ base is handled by
-- 
2.31.1




* RE: [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors
  2022-12-05 17:04 ` [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors Eugenio Pérez
@ 2022-12-05 20:52   ` Parav Pandit
  2022-12-07  8:40     ` Eugenio Perez Martin
  2022-12-06  3:24   ` Jason Wang
  1 sibling, 1 reply; 30+ messages in thread
From: Parav Pandit @ 2022-12-05 20:52 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Liuxiangdong, Stefan Hajnoczi, Jason Wang, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela


> From: Eugenio Pérez <eperezma@redhat.com>
> Sent: Monday, December 5, 2022 12:05 PM
> 
> There is currently no data to be migrated, since nothing populates or read
> the fields on virtio-net.
> 
> The migration of in-flight descriptors is modelled after the migration of
> requests in virtio-blk. With some differences:
> * virtio-blk migrates queue number on each request. Here we only add a
>   vq if it has descriptors to migrate, and then we make all descriptors
>   in an array.
> * Use of QTAILQ since it works similar to signal the end of the inflight
>   descriptors: 1 for more data, 0 if end. But do it for each vq instead
>   of for each descriptor.
> * Usage of VMState macros.
> 
> The fields of descriptors would be way more complicated if we use the
> VirtQueueElements directly, since there would be a few levels of
> indirections. Using VirtQueueElementOld for the moment, and migrate to
> VirtQueueElement for the final patch.
> 
> TODO: Proper migration versioning
> TODO: Do not embed vhost-vdpa structs
> TODO: Migrate the VirtQueueElement, not VirtQueueElementOld.
> 
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  include/hw/virtio/virtio-net.h |   2 +
>  include/migration/vmstate.h    |  11 +++
>  hw/net/virtio-net.c            | 129 +++++++++++++++++++++++++++++++++
>  3 files changed, 142 insertions(+)
> 
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index ef234ffe7e..ae7c017ef0 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -151,9 +151,11 @@ typedef struct VirtIONetQueue {
>      QEMUTimer *tx_timer;
>      QEMUBH *tx_bh;
>      uint32_t tx_waiting;
> +    uint32_t tx_inflight_num, rx_inflight_num;
>      struct {
>          VirtQueueElement *elem;
>      } async_tx;
> +    VirtQueueElement **tx_inflight, **rx_inflight;
>      struct VirtIONet *n;
>  } VirtIONetQueue;
> 
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 9726d2d09e..9e0dfef9ee 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -626,6 +626,17 @@ extern const VMStateInfo vmstate_info_qlist;
>      .offset     = vmstate_offset_varray(_state, _field, _type),      \
>  }
> 
> +#define VMSTATE_STRUCT_VARRAY_ALLOC_UINT16(_field, _state,
> _field_num,        \
> +                                           _version, _vmsd, _type) {          \
> +    .name       = (stringify(_field)),                                        \
> +    .version_id = (_version),                                                 \
> +    .vmsd       = &(_vmsd),                                                   \
> +    .num_offset = vmstate_offset_value(_state, _field_num, uint16_t),         \
> +    .size       = sizeof(_type),                                              \
> +    .flags      = VMS_STRUCT | VMS_VARRAY_UINT16 | VMS_ALLOC |
> VMS_POINTER,   \
> +    .offset     = vmstate_offset_pointer(_state, _field, _type),              \
> +}
> +
>  #define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num,
> _version, _vmsd, _type) {\
>      .name       = (stringify(_field)),                               \
>      .version_id = (_version),                                        \
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index
> aba12759d5..ffd7bf1fc7 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -3077,6 +3077,13 @@ static bool mac_table_doesnt_fit(void *opaque,
> int version_id)
>      return !mac_table_fits(opaque, version_id);  }
> 
> +typedef struct VirtIONetInflightQueue {
> +    uint16_t idx;
> +    uint16_t num;
> +    QTAILQ_ENTRY(VirtIONetInflightQueue) entry;
> +    VirtQueueElementOld *elems;
> +} VirtIONetInflightQueue;
> +
>  /* This temporary type is shared by all the WITH_TMP methods
>   * although only some fields are used by each.
>   */
> @@ -3086,6 +3093,7 @@ struct VirtIONetMigTmp {
>      uint16_t        curr_queue_pairs_1;
>      uint8_t         has_ufo;
>      uint32_t        has_vnet_hdr;
> +    QTAILQ_HEAD(, VirtIONetInflightQueue) queues_inflight;
>  };
> 
>  /* The 2nd and subsequent tx_waiting flags are loaded later than @@ -
> 3231,6 +3239,124 @@ static const VMStateDescription
> vmstate_virtio_net_rss = {
>      },
>  };
> 
> +static const VMStateDescription vmstate_virtio_net_inflight_queue = {
> +    .name      = "virtio-net-device/inflight/queue",
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT16(idx, VirtIONetInflightQueue),
> +        VMSTATE_UINT16(num, VirtIONetInflightQueue),
> +
> +        VMSTATE_STRUCT_VARRAY_ALLOC_UINT16(elems,
> VirtIONetInflightQueue, num,
> +                                           0, vmstate_virtqueue_element_old,
> +                                           VirtQueueElementOld),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static int virtio_net_inflight_init(void *opaque) {
> +    struct VirtIONetMigTmp *tmp = opaque;
> +
> +    QTAILQ_INIT(&tmp->queues_inflight);
> +    return 0;
> +}
> +
> +static int virtio_net_inflight_pre_save(void *opaque) {
> +    struct VirtIONetMigTmp *tmp = opaque;
> +    VirtIONet *net = tmp->parent;
> +    uint16_t curr_queue_pairs = net->multiqueue ? net->curr_queue_pairs :
> 1;
> +    VirtIONetInflightQueue *qi = g_new0(VirtIONetInflightQueue,
> +                                        curr_queue_pairs * 2);
> +
> +    virtio_net_inflight_init(opaque);
> +    for (uint16_t i = 0; i < curr_queue_pairs * 2; ++i) {
> +        VirtIONetQueue *q = &net->vqs[vq2q(i)];
> +        size_t n = i % 2 ? q->tx_inflight_num : q->rx_inflight_num;
> +        VirtQueueElement **inflight = i % 2 ? q->tx_inflight :
> + q->rx_inflight;
> +
> +        if (n == 0) {
> +            continue;
> +        }
> +
> +        qi[i].idx = i;
> +        qi[i].num = n;
> +        qi[i].elems = g_new0(VirtQueueElementOld, n);
> +        for (uint16_t j = 0; j < n; ++j) {
> +            qemu_put_virtqueue_element_old(inflight[j], &qi[i].elems[j]);
> +        }
> +        QTAILQ_INSERT_TAIL(&tmp->queues_inflight, &qi[i], entry);
> +    }
> +
> +    return 0;
> +}
> +
> +static int virtio_net_inflight_post_save(void *opaque) {
> +    struct VirtIONetMigTmp *tmp = opaque;
> +    VirtIONetInflightQueue *qi;
> +
> +    while ((qi = QTAILQ_FIRST(&tmp->queues_inflight))) {
> +        QTAILQ_REMOVE(&tmp->queues_inflight, qi, entry);
> +        g_free(qi->elems);
> +        g_free(qi);
> +    }
> +
> +    return 0;
> +}
> +
> +static int virtio_net_inflight_post_load(void *opaque, int version_id)
> +{
> +    struct VirtIONetMigTmp *tmp = opaque;
> +    VirtIONet *net = tmp->parent;
> +    uint16_t curr_queue_pairs = net->multiqueue ? net->curr_queue_pairs :
> 1;
> +    VirtIONetInflightQueue *qi;
> +
> +    while ((qi = QTAILQ_FIRST(&tmp->queues_inflight))) {
> +        VirtIONetQueue *q = &net->vqs[vq2q(qi->idx)];
> +        uint32_t *n = qi->idx % 2 ? &q->tx_inflight_num : &q->rx_inflight_num;
> +        VirtQueueElement ***inflight = qi->idx % 2 ?
> +                                       &q->tx_inflight : &q->rx_inflight;
> +        if (unlikely(qi->num == 0)) {
> +            /* TODO: error message */
> +            return -1;
> +        }
> +
> +        if (unlikely(qi->idx > curr_queue_pairs * 2)) {
> +            /* TODO: error message */
> +            return -1;
> +        }
> +
> +        *n = qi->num;
> +        *inflight = g_new(VirtQueueElement *, *n);
> +        for (uint16_t j = 0; j < *n; ++j) {
> +            (*inflight)[j] = qemu_get_virtqueue_element_from_old(
> +                &net->parent_obj, &qi->elems[j],
> +                sizeof(VirtQueueElement));
> +        }
> +
> +        QTAILQ_REMOVE(&tmp->queues_inflight, qi, entry);
> +        g_free(qi->elems);
> +        g_free(qi);
> +    }
> +
> +    return 0;
> +}
> +
> +/* TODO: Allocate a temporal per queue / queue element, not all of
> +them! */ static const VMStateDescription vmstate_virtio_net_inflight = {
> +    .name      = "virtio-net-device/inflight",
> +    .pre_save = virtio_net_inflight_pre_save,
> +    .post_save = virtio_net_inflight_post_save,
> +    .pre_load = virtio_net_inflight_init,
> +    .post_load = virtio_net_inflight_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_QTAILQ_V(queues_inflight, struct VirtIONetMigTmp, 0,
> +                         vmstate_virtio_net_inflight_queue,
> +                         VirtIONetInflightQueue, entry),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
How is the CVQ-related mac, vlan and rss replay different from these inflight descriptors, such that the inflight descriptors need these callbacks while CVQ is handled differently?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors
  2022-12-05 17:04 ` [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors Eugenio Pérez
  2022-12-05 20:52   ` Parav Pandit
@ 2022-12-06  3:24   ` Jason Wang
  2022-12-07  8:56     ` Eugenio Perez Martin
  2023-01-10  3:02     ` Parav Pandit
  1 sibling, 2 replies; 30+ messages in thread
From: Jason Wang @ 2022-12-06  3:24 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: qemu-devel, Liuxiangdong, Stefan Hajnoczi, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

On Tue, Dec 6, 2022 at 1:05 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> There is currently no data to be migrated, since nothing populates or
> read the fields on virtio-net.
>
> The migration of in-flight descriptors is modelled after the migration
> of requests in virtio-blk. With some differences:
> * virtio-blk migrates queue number on each request. Here we only add a
>   vq if it has descriptors to migrate, and then we make all descriptors
>   in an array.
> * Use of QTAILQ since it works similar to signal the end of the inflight
>   descriptors: 1 for more data, 0 if end. But do it for each vq instead
>   of for each descriptor.
> * Usage of VMState macros.
>
> The fields of descriptors would be way more complicated if we use the
> VirtQueueElements directly, since there would be a few levels of
> indirections. Using VirtQueueElementOld for the moment, and migrate to
> VirtQueueElement for the final patch.
>
> TODO: Proper migration versioning
> TODO: Do not embed vhost-vdpa structs
> TODO: Migrate the VirtQueueElement, not VirtQueueElementOld.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  include/hw/virtio/virtio-net.h |   2 +
>  include/migration/vmstate.h    |  11 +++
>  hw/net/virtio-net.c            | 129 +++++++++++++++++++++++++++++++++
>  3 files changed, 142 insertions(+)
>
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index ef234ffe7e..ae7c017ef0 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -151,9 +151,11 @@ typedef struct VirtIONetQueue {
>      QEMUTimer *tx_timer;
>      QEMUBH *tx_bh;
>      uint32_t tx_waiting;
> +    uint32_t tx_inflight_num, rx_inflight_num;
>      struct {
>          VirtQueueElement *elem;
>      } async_tx;
> +    VirtQueueElement **tx_inflight, **rx_inflight;
>      struct VirtIONet *n;
>  } VirtIONetQueue;
>
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 9726d2d09e..9e0dfef9ee 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -626,6 +626,17 @@ extern const VMStateInfo vmstate_info_qlist;
>      .offset     = vmstate_offset_varray(_state, _field, _type),      \
>  }
>
> +#define VMSTATE_STRUCT_VARRAY_ALLOC_UINT16(_field, _state, _field_num,        \
> +                                           _version, _vmsd, _type) {          \
> +    .name       = (stringify(_field)),                                        \
> +    .version_id = (_version),                                                 \
> +    .vmsd       = &(_vmsd),                                                   \
> +    .num_offset = vmstate_offset_value(_state, _field_num, uint16_t),         \
> +    .size       = sizeof(_type),                                              \
> +    .flags      = VMS_STRUCT | VMS_VARRAY_UINT16 | VMS_ALLOC | VMS_POINTER,   \
> +    .offset     = vmstate_offset_pointer(_state, _field, _type),              \
> +}
> +
>  #define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, _vmsd, _type) {\
>      .name       = (stringify(_field)),                               \
>      .version_id = (_version),                                        \
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index aba12759d5..ffd7bf1fc7 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -3077,6 +3077,13 @@ static bool mac_table_doesnt_fit(void *opaque, int version_id)
>      return !mac_table_fits(opaque, version_id);
>  }
>
> +typedef struct VirtIONetInflightQueue {
> +    uint16_t idx;
> +    uint16_t num;
> +    QTAILQ_ENTRY(VirtIONetInflightQueue) entry;
> +    VirtQueueElementOld *elems;
> +} VirtIONetInflightQueue;
> +
>  /* This temporary type is shared by all the WITH_TMP methods
>   * although only some fields are used by each.
>   */
> @@ -3086,6 +3093,7 @@ struct VirtIONetMigTmp {
>      uint16_t        curr_queue_pairs_1;
>      uint8_t         has_ufo;
>      uint32_t        has_vnet_hdr;
> +    QTAILQ_HEAD(, VirtIONetInflightQueue) queues_inflight;
>  };
>
>  /* The 2nd and subsequent tx_waiting flags are loaded later than
> @@ -3231,6 +3239,124 @@ static const VMStateDescription vmstate_virtio_net_rss = {
>      },
>  };
>
> +static const VMStateDescription vmstate_virtio_net_inflight_queue = {
> +    .name      = "virtio-net-device/inflight/queue",
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT16(idx, VirtIONetInflightQueue),
> +        VMSTATE_UINT16(num, VirtIONetInflightQueue),
> +
> +        VMSTATE_STRUCT_VARRAY_ALLOC_UINT16(elems, VirtIONetInflightQueue, num,
> +                                           0, vmstate_virtqueue_element_old,
> +                                           VirtQueueElementOld),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};

A dumb question, any reason we need to bother with virtio-net? It looks
to me it's not a must and it would complicate migration compatibility.

I guess virtio-blk is the better place.

Thanks

> +
> +static int virtio_net_inflight_init(void *opaque)
> +{
> +    struct VirtIONetMigTmp *tmp = opaque;
> +
> +    QTAILQ_INIT(&tmp->queues_inflight);
> +    return 0;
> +}
> +
> +static int virtio_net_inflight_pre_save(void *opaque)
> +{
> +    struct VirtIONetMigTmp *tmp = opaque;
> +    VirtIONet *net = tmp->parent;
> +    uint16_t curr_queue_pairs = net->multiqueue ? net->curr_queue_pairs : 1;
> +    VirtIONetInflightQueue *qi = g_new0(VirtIONetInflightQueue,
> +                                        curr_queue_pairs * 2);
> +
> +    virtio_net_inflight_init(opaque);
> +    for (uint16_t i = 0; i < curr_queue_pairs * 2; ++i) {
> +        VirtIONetQueue *q = &net->vqs[vq2q(i)];
> +        size_t n = i % 2 ? q->tx_inflight_num : q->rx_inflight_num;
> +        VirtQueueElement **inflight = i % 2 ? q->tx_inflight : q->rx_inflight;
> +
> +        if (n == 0) {
> +            continue;
> +        }
> +
> +        qi[i].idx = i;
> +        qi[i].num = n;
> +        qi[i].elems = g_new0(VirtQueueElementOld, n);
> +        for (uint16_t j = 0; j < n; ++j) {
> +            qemu_put_virtqueue_element_old(inflight[j], &qi[i].elems[j]);
> +        }
> +        QTAILQ_INSERT_TAIL(&tmp->queues_inflight, &qi[i], entry);
> +    }
> +
> +    return 0;
> +}
> +
> +static int virtio_net_inflight_post_save(void *opaque)
> +{
> +    struct VirtIONetMigTmp *tmp = opaque;
> +    VirtIONetInflightQueue *qi;
> +
> +    while ((qi = QTAILQ_FIRST(&tmp->queues_inflight))) {
> +        QTAILQ_REMOVE(&tmp->queues_inflight, qi, entry);
> +        g_free(qi->elems);
> +        g_free(qi);
> +    }
> +
> +    return 0;
> +}
> +
> +static int virtio_net_inflight_post_load(void *opaque, int version_id)
> +{
> +    struct VirtIONetMigTmp *tmp = opaque;
> +    VirtIONet *net = tmp->parent;
> +    uint16_t curr_queue_pairs = net->multiqueue ? net->curr_queue_pairs : 1;
> +    VirtIONetInflightQueue *qi;
> +
> +    while ((qi = QTAILQ_FIRST(&tmp->queues_inflight))) {
> +        VirtIONetQueue *q = &net->vqs[vq2q(qi->idx)];
> +        uint32_t *n = qi->idx % 2 ? &q->tx_inflight_num : &q->rx_inflight_num;
> +        VirtQueueElement ***inflight = qi->idx % 2 ?
> +                                       &q->tx_inflight : &q->rx_inflight;
> +        if (unlikely(qi->num == 0)) {
> +            /* TODO: error message */
> +            return -1;
> +        }
> +
> +        if (unlikely(qi->idx > curr_queue_pairs * 2)) {
> +            /* TODO: error message */
> +            return -1;
> +        }
> +
> +        *n = qi->num;
> +        *inflight = g_new(VirtQueueElement *, *n);
> +        for (uint16_t j = 0; j < *n; ++j) {
> +            (*inflight)[j] = qemu_get_virtqueue_element_from_old(
> +                &net->parent_obj, &qi->elems[j],
> +                sizeof(VirtQueueElement));
> +        }
> +
> +        QTAILQ_REMOVE(&tmp->queues_inflight, qi, entry);
> +        g_free(qi->elems);
> +        g_free(qi);
> +    }
> +
> +    return 0;
> +}
> +
> +/* TODO: Allocate a temporal per queue / queue element, not all of them! */
> +static const VMStateDescription vmstate_virtio_net_inflight = {
> +    .name      = "virtio-net-device/inflight",
> +    .pre_save = virtio_net_inflight_pre_save,
> +    .post_save = virtio_net_inflight_post_save,
> +    .pre_load = virtio_net_inflight_init,
> +    .post_load = virtio_net_inflight_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_QTAILQ_V(queues_inflight, struct VirtIONetMigTmp, 0,
> +                         vmstate_virtio_net_inflight_queue,
> +                         VirtIONetInflightQueue, entry),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static const VMStateDescription vmstate_virtio_net_device = {
>      .name = "virtio-net-device",
>      .version_id = VIRTIO_NET_VM_VERSION,
> @@ -3279,6 +3405,9 @@ static const VMStateDescription vmstate_virtio_net_device = {
>                           vmstate_virtio_net_tx_waiting),
>          VMSTATE_UINT64_TEST(curr_guest_offloads, VirtIONet,
>                              has_ctrl_guest_offloads),
> +        /* TODO: Move to subsection */
> +        VMSTATE_WITH_TMP(VirtIONet, struct VirtIONetMigTmp,
> +                         vmstate_virtio_net_inflight),
>          VMSTATE_END_OF_LIST()
>     },
>      .subsections = (const VMStateDescription * []) {
> --
> 2.31.1
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ
  2022-12-05 17:04 [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Eugenio Pérez
                   ` (12 preceding siblings ...)
  2022-12-05 17:04 ` [RFC PATCH for 8.0 13/13] vdpa: Recover inflight descriptors Eugenio Pérez
@ 2022-12-06  7:07 ` Jason Wang
  2022-12-07  8:59   ` Eugenio Perez Martin
  13 siblings, 1 reply; 30+ messages in thread
From: Jason Wang @ 2022-12-06  7:07 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: qemu-devel, Liuxiangdong, Stefan Hajnoczi, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

On Tue, Dec 6, 2022 at 1:04 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> The state of the descriptors (avail or used) may not be recoverable just
> looking at the guest memory.  Out of order used descriptor may override
> previous avail ones in the descriptor table or avail vring.
>
> Currently we're not migrating this status in net devices because virtio-net,
> vhost-kernel etc use the descriptors in order,

Note that this might not be the truth (when zerocopy is enabled).

> so the information always
> recoverable from guest's memory.  However, vDPA devices may use them out of
> order, and other kind of devices like block need this support.
>
> Shadow virtqueue is able to track these and resend them at the destination.

As discussed, there's a bootstrap issue here:

When SVQ needs to be enabled on demand, do we still need another way
to get inflight ones without the help of SVQ?

Thanks

> Add them to the virtio-net migration description so they are not lose in the
> process.
>
> This is a very early RFC just to validate the first draft so expect leftovers.
> To fetch and request the descriptors from a device without SVQ need to be
> implemented on top. Some other notable pending items are:
> * Do not send the descriptors actually recoverable from the guest memory.
> * Properly version the migrate data.
> * Properly abstract the descriptors access from virtio-net to SVQ.
> * Do not use VirtQueueElementOld but migrate directly VirtQueueElement.
> * Replace lots of assertions with runtime conditionals.
> * Other TODOs in the patch message or code changes.
>
> Thanks.
>
> Eugenio Pérez (13):
>   vhost: add available descriptor list in SVQ
>   vhost: iterate only available descriptors at SVQ stop
>   vhost: merge avail list and next avail descriptors detach
>   vhost: add vhost_svq_save_inflight
>   virtio: Specify uint32_t as VirtQueueElementOld members type
>   virtio: refactor qemu_get_virtqueue_element
>   virtio: refactor qemu_put_virtqueue_element
>   virtio: expose VirtQueueElementOld
>   virtio: add vmstate_virtqueue_element_old
>   virtio-net: Migrate vhost inflight descriptors
>   virtio-net: save inflight descriptors at vhost shutdown
>   vhost: expose vhost_svq_add_element
>   vdpa: Recover inflight descriptors
>
>  hw/virtio/vhost-shadow-virtqueue.h |   9 ++
>  include/hw/virtio/virtio-net.h     |   2 +
>  include/hw/virtio/virtio.h         |  32 ++++++
>  include/migration/vmstate.h        |  22 ++++
>  hw/net/vhost_net.c                 |  56 ++++++++++
>  hw/net/virtio-net.c                | 129 +++++++++++++++++++++++
>  hw/virtio/vhost-shadow-virtqueue.c |  52 +++++++--
>  hw/virtio/vhost-vdpa.c             |  11 --
>  hw/virtio/virtio.c                 | 162 ++++++++++++++++++-----------
>  9 files changed, 392 insertions(+), 83 deletions(-)
>
> --
> 2.31.1
>
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors
  2022-12-05 20:52   ` Parav Pandit
@ 2022-12-07  8:40     ` Eugenio Perez Martin
  0 siblings, 0 replies; 30+ messages in thread
From: Eugenio Perez Martin @ 2022-12-07  8:40 UTC (permalink / raw)
  To: Parav Pandit
  Cc: qemu-devel, Liuxiangdong, Stefan Hajnoczi, Jason Wang,
	Harpreet Singh Anand, Gautam Dawar, Zhu Lingshan, Cindy Lu,
	Si-Wei Liu, Michael S. Tsirkin, Dr. David Alan Gilbert,
	Laurent Vivier, Eli Cohen, Stefano Garzarella, Juan Quintela

On Mon, Dec 5, 2022 at 9:52 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Eugenio Pérez <eperezma@redhat.com>
> > Sent: Monday, December 5, 2022 12:05 PM
> >
> > There is currently no data to be migrated, since nothing populates or read
> > the fields on virtio-net.
> >
> > The migration of in-flight descriptors is modelled after the migration of
> > requests in virtio-blk. With some differences:
> > * virtio-blk migrates queue number on each request. Here we only add a
> >   vq if it has descriptors to migrate, and then we make all descriptors
> >   in an array.
> > * Use of QTAILQ since it works similar to signal the end of the inflight
> >   descriptors: 1 for more data, 0 if end. But do it for each vq instead
> >   of for each descriptor.
> > * Usage of VMState macros.
> >
> > The fields of descriptors would be way more complicated if we use the
> > VirtQueueElements directly, since there would be a few levels of
> > indirections. Using VirtQueueElementOld for the moment, and migrate to
> > VirtQueueElement for the final patch.
> >
> > TODO: Proper migration versioning
> > TODO: Do not embed vhost-vdpa structs
> > TODO: Migrate the VirtQueueElement, not VirtQueueElementOld.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  include/hw/virtio/virtio-net.h |   2 +
> >  include/migration/vmstate.h    |  11 +++
> >  hw/net/virtio-net.c            | 129 +++++++++++++++++++++++++++++++++
> >  3 files changed, 142 insertions(+)
> >
> > diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> > index ef234ffe7e..ae7c017ef0 100644
> > --- a/include/hw/virtio/virtio-net.h
> > +++ b/include/hw/virtio/virtio-net.h
> > @@ -151,9 +151,11 @@ typedef struct VirtIONetQueue {
> >      QEMUTimer *tx_timer;
> >      QEMUBH *tx_bh;
> >      uint32_t tx_waiting;
> > +    uint32_t tx_inflight_num, rx_inflight_num;
> >      struct {
> >          VirtQueueElement *elem;
> >      } async_tx;
> > +    VirtQueueElement **tx_inflight, **rx_inflight;
> >      struct VirtIONet *n;
> >  } VirtIONetQueue;
> >
> > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> > index 9726d2d09e..9e0dfef9ee 100644
> > --- a/include/migration/vmstate.h
> > +++ b/include/migration/vmstate.h
> > @@ -626,6 +626,17 @@ extern const VMStateInfo vmstate_info_qlist;
> >      .offset     = vmstate_offset_varray(_state, _field, _type),      \
> >  }
> >
> > +#define VMSTATE_STRUCT_VARRAY_ALLOC_UINT16(_field, _state,
> > _field_num,        \
> > +                                           _version, _vmsd, _type) {          \
> > +    .name       = (stringify(_field)),                                        \
> > +    .version_id = (_version),                                                 \
> > +    .vmsd       = &(_vmsd),                                                   \
> > +    .num_offset = vmstate_offset_value(_state, _field_num, uint16_t),         \
> > +    .size       = sizeof(_type),                                              \
> > +    .flags      = VMS_STRUCT | VMS_VARRAY_UINT16 | VMS_ALLOC |
> > VMS_POINTER,   \
> > +    .offset     = vmstate_offset_pointer(_state, _field, _type),              \
> > +}
> > +
> >  #define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num,
> > _version, _vmsd, _type) {\
> >      .name       = (stringify(_field)),                               \
> >      .version_id = (_version),                                        \
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index
> > aba12759d5..ffd7bf1fc7 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -3077,6 +3077,13 @@ static bool mac_table_doesnt_fit(void *opaque,
> > int version_id)
> >      return !mac_table_fits(opaque, version_id);  }
> >
> > +typedef struct VirtIONetInflightQueue {
> > +    uint16_t idx;
> > +    uint16_t num;
> > +    QTAILQ_ENTRY(VirtIONetInflightQueue) entry;
> > +    VirtQueueElementOld *elems;
> > +} VirtIONetInflightQueue;
> > +
> >  /* This temporary type is shared by all the WITH_TMP methods
> >   * although only some fields are used by each.
> >   */
> > @@ -3086,6 +3093,7 @@ struct VirtIONetMigTmp {
> >      uint16_t        curr_queue_pairs_1;
> >      uint8_t         has_ufo;
> >      uint32_t        has_vnet_hdr;
> > +    QTAILQ_HEAD(, VirtIONetInflightQueue) queues_inflight;
> >  };
> >
> >  /* The 2nd and subsequent tx_waiting flags are loaded later than @@ -
> > 3231,6 +3239,124 @@ static const VMStateDescription
> > vmstate_virtio_net_rss = {
> >      },
> >  };
> >
> > +static const VMStateDescription vmstate_virtio_net_inflight_queue = {
> > +    .name      = "virtio-net-device/inflight/queue",
> > +    .fields = (VMStateField[]) {
> > +        VMSTATE_UINT16(idx, VirtIONetInflightQueue),
> > +        VMSTATE_UINT16(num, VirtIONetInflightQueue),
> > +
> > +        VMSTATE_STRUCT_VARRAY_ALLOC_UINT16(elems,
> > VirtIONetInflightQueue, num,
> > +                                           0, vmstate_virtqueue_element_old,
> > +                                           VirtQueueElementOld),
> > +        VMSTATE_END_OF_LIST()
> > +    },
> > +};
> > +
> > +static int virtio_net_inflight_init(void *opaque) {
> > +    struct VirtIONetMigTmp *tmp = opaque;
> > +
> > +    QTAILQ_INIT(&tmp->queues_inflight);
> > +    return 0;
> > +}
> > +
> > +static int virtio_net_inflight_pre_save(void *opaque) {
> > +    struct VirtIONetMigTmp *tmp = opaque;
> > +    VirtIONet *net = tmp->parent;
> > +    uint16_t curr_queue_pairs = net->multiqueue ? net->curr_queue_pairs :
> > 1;
> > +    VirtIONetInflightQueue *qi = g_new0(VirtIONetInflightQueue,
> > +                                        curr_queue_pairs * 2);
> > +
> > +    virtio_net_inflight_init(opaque);
> > +    for (uint16_t i = 0; i < curr_queue_pairs * 2; ++i) {
> > +        VirtIONetQueue *q = &net->vqs[vq2q(i)];
> > +        size_t n = i % 2 ? q->tx_inflight_num : q->rx_inflight_num;
> > +        VirtQueueElement **inflight = i % 2 ? q->tx_inflight :
> > + q->rx_inflight;
> > +
> > +        if (n == 0) {
> > +            continue;
> > +        }
> > +
> > +        qi[i].idx = i;
> > +        qi[i].num = n;
> > +        qi[i].elems = g_new0(VirtQueueElementOld, n);
> > +        for (uint16_t j = 0; j < n; ++j) {
> > +            qemu_put_virtqueue_element_old(inflight[j], &qi[i].elems[j]);
> > +        }
> > +        QTAILQ_INSERT_TAIL(&tmp->queues_inflight, &qi[i], entry);
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int virtio_net_inflight_post_save(void *opaque) {
> > +    struct VirtIONetMigTmp *tmp = opaque;
> > +    VirtIONetInflightQueue *qi;
> > +
> > +    while ((qi = QTAILQ_FIRST(&tmp->queues_inflight))) {
> > +        QTAILQ_REMOVE(&tmp->queues_inflight, qi, entry);
> > +        g_free(qi->elems);
> > +        g_free(qi);
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int virtio_net_inflight_post_load(void *opaque, int version_id)
> > +{
> > +    struct VirtIONetMigTmp *tmp = opaque;
> > +    VirtIONet *net = tmp->parent;
> > +    uint16_t curr_queue_pairs = net->multiqueue ? net->curr_queue_pairs :
> > 1;
> > +    VirtIONetInflightQueue *qi;
> > +
> > +    while ((qi = QTAILQ_FIRST(&tmp->queues_inflight))) {
> > +        VirtIONetQueue *q = &net->vqs[vq2q(qi->idx)];
> > +        uint32_t *n = qi->idx % 2 ? &q->tx_inflight_num : &q->rx_inflight_num;
> > +        VirtQueueElement ***inflight = qi->idx % 2 ?
> > +                                       &q->tx_inflight : &q->rx_inflight;
> > +        if (unlikely(qi->num == 0)) {
> > +            /* TODO: error message */
> > +            return -1;
> > +        }
> > +
> > +        if (unlikely(qi->idx > curr_queue_pairs * 2)) {
> > +            /* TODO: error message */
> > +            return -1;
> > +        }
> > +
> > +        *n = qi->num;
> > +        *inflight = g_new(VirtQueueElement *, *n);
> > +        for (uint16_t j = 0; j < *n; ++j) {
> > +            (*inflight)[j] = qemu_get_virtqueue_element_from_old(
> > +                &net->parent_obj, &qi->elems[j],
> > +                sizeof(VirtQueueElement));
> > +        }
> > +
> > +        QTAILQ_REMOVE(&tmp->queues_inflight, qi, entry);
> > +        g_free(qi->elems);
> > +        g_free(qi);
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +/* TODO: Allocate a temporal per queue / queue element, not all of
> > +them! */ static const VMStateDescription vmstate_virtio_net_inflight = {
> > +    .name      = "virtio-net-device/inflight",
> > +    .pre_save = virtio_net_inflight_pre_save,
> > +    .post_save = virtio_net_inflight_post_save,
> > +    .pre_load = virtio_net_inflight_init,
> > +    .post_load = virtio_net_inflight_post_load,
> > +    .fields = (VMStateField[]) {
> > +        VMSTATE_QTAILQ_V(queues_inflight, struct VirtIONetMigTmp, 0,
> > +                         vmstate_virtio_net_inflight_queue,
> > +                         VirtIONetInflightQueue, entry),
> > +        VMSTATE_END_OF_LIST()
> > +    },
> > +};
> > +
> How is the CVQ-related mac, vlan and rss replay different from these inflight descriptors, such that the inflight descriptors need these callbacks while CVQ is handled differently?

CVQ is processed by qemu directly, so it is guaranteed its descriptors
will not be used out of order. Guest memory is enough to recover the
state at the destination.
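
To illustrate why the ordering is guaranteed, below is a minimal sketch of
a shadowed CVQ handler. This is not the actual net/vhost-vdpa.c code: the
handler name and the forwarding steps are placeholders, only the
virtqueue_pop()/virtqueue_push()/virtio_notify() helpers are qemu's
generic virtqueue API.

#include "qemu/osdep.h"
#include "hw/virtio/virtio.h"
#include "standard-headers/linux/virtio_net.h"

/*
 * Sketch: qemu pops one control command at a time, forwards it to the
 * device and waits for its completion before popping the next one, so
 * CVQ buffers are always completed in order and at most one of them is
 * in flight when the device is stopped.
 */
static void shadow_cvq_handler_sketch(VirtIODevice *vdev, VirtQueue *cvq)
{
    VirtQueueElement *elem;

    while ((elem = virtqueue_pop(cvq, sizeof(VirtQueueElement)))) {
        /* 1) copy the command out of the guest buffers */
        /* 2) forward it to the vdpa device and poll until it is used */
        /* 3) only then complete it towards the guest */
        virtqueue_push(cvq, elem, sizeof(virtio_net_ctrl_ack));
        g_free(elem);
    }
    virtio_notify(vdev, cvq);
}

With that strict pop/forward/push loop there is never an out-of-order used
CVQ entry, so the mac/vlan/rss state can be rebuilt at the destination
without migrating CVQ inflight descriptors.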



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors
  2022-12-06  3:24   ` Jason Wang
@ 2022-12-07  8:56     ` Eugenio Perez Martin
  2023-01-16 21:01       ` Michael S. Tsirkin
  2023-01-10  3:02     ` Parav Pandit
  1 sibling, 1 reply; 30+ messages in thread
From: Eugenio Perez Martin @ 2022-12-07  8:56 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Liuxiangdong, Stefan Hajnoczi, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

On Tue, Dec 6, 2022 at 4:24 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Dec 6, 2022 at 1:05 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > There is currently no data to be migrated, since nothing populates or
> > read the fields on virtio-net.
> >
> > The migration of in-flight descriptors is modelled after the migration
> > of requests in virtio-blk. With some differences:
> > * virtio-blk migrates queue number on each request. Here we only add a
> >   vq if it has descriptors to migrate, and then we make all descriptors
> >   in an array.
> > * Use of QTAILQ since it works similar to signal the end of the inflight
> >   descriptors: 1 for more data, 0 if end. But do it for each vq instead
> >   of for each descriptor.
> > * Usage of VMState macros.
> >
> > The fields of descriptors would be way more complicated if we use the
> > VirtQueueElements directly, since there would be a few levels of
> > indirections. Using VirtQueueElementOld for the moment, and migrate to
> > VirtQueueElement for the final patch.
> >
> > TODO: Proper migration versioning
> > TODO: Do not embed vhost-vdpa structs
> > TODO: Migrate the VirtQueueElement, not VirtQueueElementOld.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  include/hw/virtio/virtio-net.h |   2 +
> >  include/migration/vmstate.h    |  11 +++
> >  hw/net/virtio-net.c            | 129 +++++++++++++++++++++++++++++++++
> >  3 files changed, 142 insertions(+)
> >
> > diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> > index ef234ffe7e..ae7c017ef0 100644
> > --- a/include/hw/virtio/virtio-net.h
> > +++ b/include/hw/virtio/virtio-net.h
> > @@ -151,9 +151,11 @@ typedef struct VirtIONetQueue {
> >      QEMUTimer *tx_timer;
> >      QEMUBH *tx_bh;
> >      uint32_t tx_waiting;
> > +    uint32_t tx_inflight_num, rx_inflight_num;
> >      struct {
> >          VirtQueueElement *elem;
> >      } async_tx;
> > +    VirtQueueElement **tx_inflight, **rx_inflight;
> >      struct VirtIONet *n;
> >  } VirtIONetQueue;
> >
> > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> > index 9726d2d09e..9e0dfef9ee 100644
> > --- a/include/migration/vmstate.h
> > +++ b/include/migration/vmstate.h
> > @@ -626,6 +626,17 @@ extern const VMStateInfo vmstate_info_qlist;
> >      .offset     = vmstate_offset_varray(_state, _field, _type),      \
> >  }
> >
> > +#define VMSTATE_STRUCT_VARRAY_ALLOC_UINT16(_field, _state, _field_num,        \
> > +                                           _version, _vmsd, _type) {          \
> > +    .name       = (stringify(_field)),                                        \
> > +    .version_id = (_version),                                                 \
> > +    .vmsd       = &(_vmsd),                                                   \
> > +    .num_offset = vmstate_offset_value(_state, _field_num, uint16_t),         \
> > +    .size       = sizeof(_type),                                              \
> > +    .flags      = VMS_STRUCT | VMS_VARRAY_UINT16 | VMS_ALLOC | VMS_POINTER,   \
> > +    .offset     = vmstate_offset_pointer(_state, _field, _type),              \
> > +}
> > +
> >  #define VMSTATE_STRUCT_VARRAY_ALLOC(_field, _state, _field_num, _version, _vmsd, _type) {\
> >      .name       = (stringify(_field)),                               \
> >      .version_id = (_version),                                        \
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index aba12759d5..ffd7bf1fc7 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -3077,6 +3077,13 @@ static bool mac_table_doesnt_fit(void *opaque, int version_id)
> >      return !mac_table_fits(opaque, version_id);
> >  }
> >
> > +typedef struct VirtIONetInflightQueue {
> > +    uint16_t idx;
> > +    uint16_t num;
> > +    QTAILQ_ENTRY(VirtIONetInflightQueue) entry;
> > +    VirtQueueElementOld *elems;
> > +} VirtIONetInflightQueue;
> > +
> >  /* This temporary type is shared by all the WITH_TMP methods
> >   * although only some fields are used by each.
> >   */
> > @@ -3086,6 +3093,7 @@ struct VirtIONetMigTmp {
> >      uint16_t        curr_queue_pairs_1;
> >      uint8_t         has_ufo;
> >      uint32_t        has_vnet_hdr;
> > +    QTAILQ_HEAD(, VirtIONetInflightQueue) queues_inflight;
> >  };
> >
> >  /* The 2nd and subsequent tx_waiting flags are loaded later than
> > @@ -3231,6 +3239,124 @@ static const VMStateDescription vmstate_virtio_net_rss = {
> >      },
> >  };
> >
> > +static const VMStateDescription vmstate_virtio_net_inflight_queue = {
> > +    .name      = "virtio-net-device/inflight/queue",
> > +    .fields = (VMStateField[]) {
> > +        VMSTATE_UINT16(idx, VirtIONetInflightQueue),
> > +        VMSTATE_UINT16(num, VirtIONetInflightQueue),
> > +
> > +        VMSTATE_STRUCT_VARRAY_ALLOC_UINT16(elems, VirtIONetInflightQueue, num,
> > +                                           0, vmstate_virtqueue_element_old,
> > +                                           VirtQueueElementOld),
> > +        VMSTATE_END_OF_LIST()
> > +    },
> > +};
>
> A dumb question, any reason we need bother with virtio-net? It looks
> to me it's not a must and would complicate migration compatibility.
>
> I guess virtio-blk is the better place.
>

I'm fine with starting with -blk, but if -net devices are processing
buffers out of order we have a chance of losing descriptors too.

We can wait for more feedback to prioritize this correctly, though.

Thanks!

> Thanks
>
> > +
> > +static int virtio_net_inflight_init(void *opaque)
> > +{
> > +    struct VirtIONetMigTmp *tmp = opaque;
> > +
> > +    QTAILQ_INIT(&tmp->queues_inflight);
> > +    return 0;
> > +}
> > +
> > +static int virtio_net_inflight_pre_save(void *opaque)
> > +{
> > +    struct VirtIONetMigTmp *tmp = opaque;
> > +    VirtIONet *net = tmp->parent;
> > +    uint16_t curr_queue_pairs = net->multiqueue ? net->curr_queue_pairs : 1;
> > +    VirtIONetInflightQueue *qi = g_new0(VirtIONetInflightQueue,
> > +                                        curr_queue_pairs * 2);
> > +
> > +    virtio_net_inflight_init(opaque);
> > +    for (uint16_t i = 0; i < curr_queue_pairs * 2; ++i) {
> > +        VirtIONetQueue *q = &net->vqs[vq2q(i)];
> > +        size_t n = i % 2 ? q->tx_inflight_num : q->rx_inflight_num;
> > +        VirtQueueElement **inflight = i % 2 ? q->tx_inflight : q->rx_inflight;
> > +
> > +        if (n == 0) {
> > +            continue;
> > +        }
> > +
> > +        qi[i].idx = i;
> > +        qi[i].num = n;
> > +        qi[i].elems = g_new0(VirtQueueElementOld, n);
> > +        for (uint16_t j = 0; j < n; ++j) {
> > +            qemu_put_virtqueue_element_old(inflight[j], &qi[i].elems[j]);
> > +        }
> > +        QTAILQ_INSERT_TAIL(&tmp->queues_inflight, &qi[i], entry);
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int virtio_net_inflight_post_save(void *opaque)
> > +{
> > +    struct VirtIONetMigTmp *tmp = opaque;
> > +    VirtIONetInflightQueue *qi;
> > +
> > +    while ((qi = QTAILQ_FIRST(&tmp->queues_inflight))) {
> > +        QTAILQ_REMOVE(&tmp->queues_inflight, qi, entry);
> > +        g_free(qi->elems);
> > +        g_free(qi);
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int virtio_net_inflight_post_load(void *opaque, int version_id)
> > +{
> > +    struct VirtIONetMigTmp *tmp = opaque;
> > +    VirtIONet *net = tmp->parent;
> > +    uint16_t curr_queue_pairs = net->multiqueue ? net->curr_queue_pairs : 1;
> > +    VirtIONetInflightQueue *qi;
> > +
> > +    while ((qi = QTAILQ_FIRST(&tmp->queues_inflight))) {
> > +        VirtIONetQueue *q = &net->vqs[vq2q(qi->idx)];
> > +        uint32_t *n = qi->idx % 2 ? &q->tx_inflight_num : &q->rx_inflight_num;
> > +        VirtQueueElement ***inflight = qi->idx % 2 ?
> > +                                       &q->tx_inflight : &q->rx_inflight;
> > +        if (unlikely(qi->num == 0)) {
> > +            /* TODO: error message */
> > +            return -1;
> > +        }
> > +
> > +        if (unlikely(qi->idx > curr_queue_pairs * 2)) {
> > +            /* TODO: error message */
> > +            return -1;
> > +        }
> > +
> > +        *n = qi->num;
> > +        *inflight = g_new(VirtQueueElement *, *n);
> > +        for (uint16_t j = 0; j < *n; ++j) {
> > +            (*inflight)[j] = qemu_get_virtqueue_element_from_old(
> > +                &net->parent_obj, &qi->elems[j],
> > +                sizeof(VirtQueueElement));
> > +        }
> > +
> > +        QTAILQ_REMOVE(&tmp->queues_inflight, qi, entry);
> > +        g_free(qi->elems);
> > +        g_free(qi);
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +/* TODO: Allocate a temporal per queue / queue element, not all of them! */
> > +static const VMStateDescription vmstate_virtio_net_inflight = {
> > +    .name      = "virtio-net-device/inflight",
> > +    .pre_save = virtio_net_inflight_pre_save,
> > +    .post_save = virtio_net_inflight_post_save,
> > +    .pre_load = virtio_net_inflight_init,
> > +    .post_load = virtio_net_inflight_post_load,
> > +    .fields = (VMStateField[]) {
> > +        VMSTATE_QTAILQ_V(queues_inflight, struct VirtIONetMigTmp, 0,
> > +                         vmstate_virtio_net_inflight_queue,
> > +                         VirtIONetInflightQueue, entry),
> > +        VMSTATE_END_OF_LIST()
> > +    },
> > +};
> > +
> >  static const VMStateDescription vmstate_virtio_net_device = {
> >      .name = "virtio-net-device",
> >      .version_id = VIRTIO_NET_VM_VERSION,
> > @@ -3279,6 +3405,9 @@ static const VMStateDescription vmstate_virtio_net_device = {
> >                           vmstate_virtio_net_tx_waiting),
> >          VMSTATE_UINT64_TEST(curr_guest_offloads, VirtIONet,
> >                              has_ctrl_guest_offloads),
> > +        /* TODO: Move to subsection */
> > +        VMSTATE_WITH_TMP(VirtIONet, struct VirtIONetMigTmp,
> > +                         vmstate_virtio_net_inflight),
> >          VMSTATE_END_OF_LIST()
> >     },
> >      .subsections = (const VMStateDescription * []) {
> > --
> > 2.31.1
> >
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ
  2022-12-06  7:07 ` [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Jason Wang
@ 2022-12-07  8:59   ` Eugenio Perez Martin
  2022-12-08  7:31     ` Jason Wang
  0 siblings, 1 reply; 30+ messages in thread
From: Eugenio Perez Martin @ 2022-12-07  8:59 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Liuxiangdong, Stefan Hajnoczi, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

On Tue, Dec 6, 2022 at 8:08 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Dec 6, 2022 at 1:04 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > The state of the descriptors (avail or used) may not be recoverable just
> > looking at the guest memory.  Out of order used descriptor may override
> > previous avail ones in the descriptor table or avail vring.
> >
> > Currently we're not migrating this status in net devices because virtio-net,
> > vhost-kernel etc use the descriptors in order,
>
> Note that this might not be the truth (when zerocopy is enabled).
>

Good point. So will virtio-net wait for those to complete then? How
does qemu handle it if there are still inflight descriptors?

> > so the information always
> > recoverable from guest's memory.  However, vDPA devices may use them out of
> > order, and other kind of devices like block need this support.
> >
> > Shadow virtqueue is able to track these and resend them at the destination.
>
> As discussed, there's a bootstrap issue here:
>
> When SVQ needs to be enabled on demand, do we still need another way
> to get inflight ones without the help of SVQ?
>

Sending and retrieving the descriptors without SVQ needs to be
developed on top of this. I should have made that clearer in the cover
letter.

Thanks!

> Thanks
>
> > Add them to the virtio-net migration description so they are not lose in the
> > process.
> >
> > This is a very early RFC just to validate the first draft so expect leftovers.
> > To fetch and request the descriptors from a device without SVQ need to be
> > implemented on top. Some other notable pending items are:
> > * Do not send the descriptors actually recoverable from the guest memory.
> > * Properly version the migrate data.
> > * Properly abstract the descriptors access from virtio-net to SVQ.
> > * Do not use VirtQueueElementOld but migrate directly VirtQueueElement.
> > * Replace lots of assertions with runtime conditionals.
> > * Other TODOs in the patch message or code changes.
> >
> > Thanks.
> >
> > Eugenio Pérez (13):
> >   vhost: add available descriptor list in SVQ
> >   vhost: iterate only available descriptors at SVQ stop
> >   vhost: merge avail list and next avail descriptors detach
> >   vhost: add vhost_svq_save_inflight
> >   virtio: Specify uint32_t as VirtQueueElementOld members type
> >   virtio: refactor qemu_get_virtqueue_element
> >   virtio: refactor qemu_put_virtqueue_element
> >   virtio: expose VirtQueueElementOld
> >   virtio: add vmstate_virtqueue_element_old
> >   virtio-net: Migrate vhost inflight descriptors
> >   virtio-net: save inflight descriptors at vhost shutdown
> >   vhost: expose vhost_svq_add_element
> >   vdpa: Recover inflight descriptors
> >
> >  hw/virtio/vhost-shadow-virtqueue.h |   9 ++
> >  include/hw/virtio/virtio-net.h     |   2 +
> >  include/hw/virtio/virtio.h         |  32 ++++++
> >  include/migration/vmstate.h        |  22 ++++
> >  hw/net/vhost_net.c                 |  56 ++++++++++
> >  hw/net/virtio-net.c                | 129 +++++++++++++++++++++++
> >  hw/virtio/vhost-shadow-virtqueue.c |  52 +++++++--
> >  hw/virtio/vhost-vdpa.c             |  11 --
> >  hw/virtio/virtio.c                 | 162 ++++++++++++++++++-----------
> >  9 files changed, 392 insertions(+), 83 deletions(-)
> >
> > --
> > 2.31.1
> >
> >
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ
  2022-12-07  8:59   ` Eugenio Perez Martin
@ 2022-12-08  7:31     ` Jason Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Wang @ 2022-12-08  7:31 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Liuxiangdong, Stefan Hajnoczi, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela, Parav Pandit

On Wed, Dec 7, 2022 at 5:00 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Tue, Dec 6, 2022 at 8:08 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Dec 6, 2022 at 1:04 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> > >
> > > The state of the descriptors (avail or used) may not be recoverable just
> > > looking at the guest memory.  Out of order used descriptor may override
> > > previous avail ones in the descriptor table or avail vring.
> > >
> > > Currently we're not migrating this status in net devices because virtio-net,
> > > vhost-kernel etc use the descriptors in order,
> >
> > Note that this might not be the truth (when zerocopy is enabled).
> >
>
> Good point. So will virtio-net wait for those to complete then?

Vhost-net will wait for all the inflight descriptors to be completed.

> How
> does qemu handle if there are still inflight descriptors?

It doesn't care right now. For networking, devices can choose to flush
the inflight descriptors before suspending. But this won't work for
other types of devices like block.

Thanks

>
> > > so the information always
> > > recoverable from guest's memory.  However, vDPA devices may use them out of
> > > order, and other kind of devices like block need this support.
> > >
> > > Shadow virtqueue is able to track these and resend them at the destination.
> >
> > As discussed, there's a bootstrap issue here:
> >
> > When SVQ needs to be enabled on demand, do we still need another way
> > to get inflight ones without the help of SVQ?
> >
>
> To send and retrieve the descriptor without SVQ needs to be developed
> on top of this. I should have made that more clear here in the cover
> letter.
>
> Thanks!
>
> > Thanks
> >
> > > Add them to the virtio-net migration description so they are not lose in the
> > > process.
> > >
> > > This is a very early RFC just to validate the first draft so expect leftovers.
> > > To fetch and request the descriptors from a device without SVQ need to be
> > > implemented on top. Some other notable pending items are:
> > > * Do not send the descriptors actually recoverable from the guest memory.
> > > * Properly version the migrate data.
> > > * Properly abstract the descriptors access from virtio-net to SVQ.
> > > * Do not use VirtQueueElementOld but migrate directly VirtQueueElement.
> > > * Replace lots of assertions with runtime conditionals.
> > > * Other TODOs in the patch message or code changes.
> > >
> > > Thanks.
> > >
> > > Eugenio Pérez (13):
> > >   vhost: add available descriptor list in SVQ
> > >   vhost: iterate only available descriptors at SVQ stop
> > >   vhost: merge avail list and next avail descriptors detach
> > >   vhost: add vhost_svq_save_inflight
> > >   virtio: Specify uint32_t as VirtQueueElementOld members type
> > >   virtio: refactor qemu_get_virtqueue_element
> > >   virtio: refactor qemu_put_virtqueue_element
> > >   virtio: expose VirtQueueElementOld
> > >   virtio: add vmstate_virtqueue_element_old
> > >   virtio-net: Migrate vhost inflight descriptors
> > >   virtio-net: save inflight descriptors at vhost shutdown
> > >   vhost: expose vhost_svq_add_element
> > >   vdpa: Recover inflight descriptors
> > >
> > >  hw/virtio/vhost-shadow-virtqueue.h |   9 ++
> > >  include/hw/virtio/virtio-net.h     |   2 +
> > >  include/hw/virtio/virtio.h         |  32 ++++++
> > >  include/migration/vmstate.h        |  22 ++++
> > >  hw/net/vhost_net.c                 |  56 ++++++++++
> > >  hw/net/virtio-net.c                | 129 +++++++++++++++++++++++
> > >  hw/virtio/vhost-shadow-virtqueue.c |  52 +++++++--
> > >  hw/virtio/vhost-vdpa.c             |  11 --
> > >  hw/virtio/virtio.c                 | 162 ++++++++++++++++++-----------
> > >  9 files changed, 392 insertions(+), 83 deletions(-)
> > >
> > > --
> > > 2.31.1
> > >
> > >
> >
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors
  2022-12-06  3:24   ` Jason Wang
  2022-12-07  8:56     ` Eugenio Perez Martin
@ 2023-01-10  3:02     ` Parav Pandit
  2023-01-11  4:34       ` Jason Wang
  1 sibling, 1 reply; 30+ messages in thread
From: Parav Pandit @ 2023-01-10  3:02 UTC (permalink / raw)
  To: Jason Wang, Eugenio Pérez
  Cc: qemu-devel, Liuxiangdong, Stefan Hajnoczi, Harpreet Singh Anand,
	Gautam Dawar, Zhu Lingshan, Cindy Lu, Si-Wei Liu,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Laurent Vivier,
	Eli Cohen, Stefano Garzarella, Juan Quintela

Hi Jason,

> From: Jason Wang <jasowang@redhat.com>
> Sent: Monday, December 5, 2022 10:25 PM

> 
> A dumb question, any reason we need bother with virtio-net? It looks to me it's
> not a must and would complicate migration compatibility.

The virtio net vdpa device is processing the descriptors out of order.
This vdpa device doesn't offer the IN_ORDER flag.

And when a VQ is suspended it cannot complete these descriptors as dummy
zero-length completions: the guest VM gets flooded with [1].

So it is needed for the devices that don't offer the IN_ORDER feature.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/virtio_net.c?h=v6.2-rc3#n1252
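
For context, the check being hit in [1] is a length sanity test on every
used RX buffer. A self-contained approximation follows; it is not the
kernel code itself, only ETH_HLEN and the "short packet" debug message
mirror drivers/net/virtio_net.c, and the function name is made up.

#include <stdbool.h>
#include <stdio.h>

#define ETH_HLEN 14 /* Ethernet header length, as in <linux/if_ether.h> */

/*
 * Approximation of the receive-path check referenced in [1]: any used RX
 * buffer shorter than the virtio-net header plus an Ethernet header is
 * logged and dropped (the real driver uses pr_debug() and bumps
 * rx_length_errors). A dummy zero-length completion always fails this
 * test, so completing suspended RX descriptors that way floods the log.
 */
static bool rx_len_is_sane(unsigned int len, unsigned int hdr_len)
{
    if (len < hdr_len + ETH_HLEN) {
        fprintf(stderr, "short packet %u\n", len);
        return false;
    }
    return true;
}

Every dummy completion looks like rx_len_is_sane(0, hdr_len) to the
driver, hence the flood.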


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors
  2023-01-10  3:02     ` Parav Pandit
@ 2023-01-11  4:34       ` Jason Wang
  2023-01-11  4:40         ` Parav Pandit
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Wang @ 2023-01-11  4:34 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Eugenio Pérez, qemu-devel, Liuxiangdong, Stefan Hajnoczi,
	Harpreet Singh Anand, Gautam Dawar, Zhu Lingshan, Cindy Lu,
	Si-Wei Liu, Michael S. Tsirkin, Dr. David Alan Gilbert,
	Laurent Vivier, Eli Cohen, Stefano Garzarella, Juan Quintela

On Tue, Jan 10, 2023 at 11:02 AM Parav Pandit <parav@nvidia.com> wrote:
>
> Hi Jason,
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Monday, December 5, 2022 10:25 PM
>
> >
> > A dumb question, any reason we need bother with virtio-net? It looks to me it's
> > not a must and would complicate migration compatibility.
>
> Virtio net vdpa device is processing the descriptors out of order.
> This vdpa device doesn’t offer IN_ORDER flag.
>
> And when a VQ is suspended it cannot complete these descriptors as some dummy zero length completions.
> The guest VM is flooded with [1].

Yes, but any reason for the device to do out-of-order for RX?

>
> So it is needed for the devices that doesn’t offer IN_ORDER feature.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/virtio_net.c?h=v6.2-rc3#n1252

It is only enabled in a debug kernel which should be harmless?

Thanks

>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors
  2023-01-11  4:34       ` Jason Wang
@ 2023-01-11  4:40         ` Parav Pandit
  2023-01-11  5:51           ` Jason Wang
  0 siblings, 1 reply; 30+ messages in thread
From: Parav Pandit @ 2023-01-11  4:40 UTC (permalink / raw)
  To: Jason Wang
  Cc: Eugenio Pérez, qemu-devel, Liuxiangdong, Stefan Hajnoczi,
	Harpreet Singh Anand, Gautam Dawar, Zhu Lingshan, Cindy Lu,
	Si-Wei Liu, Michael S. Tsirkin, Dr. David Alan Gilbert,
	Laurent Vivier, Eli Cohen, Stefano Garzarella, Juan Quintela


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, January 10, 2023 11:35 PM
> 
> On Tue, Jan 10, 2023 at 11:02 AM Parav Pandit <parav@nvidia.com> wrote:
> >
> > Hi Jason,
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Monday, December 5, 2022 10:25 PM
> >
> > >
> > > A dumb question, any reason we need bother with virtio-net? It looks
> > > to me it's not a must and would complicate migration compatibility.
> >
> > Virtio net vdpa device is processing the descriptors out of order.
> > This vdpa device doesn’t offer IN_ORDER flag.
> >
> > And when a VQ is suspended it cannot complete these descriptors as some
> dummy zero length completions.
> > The guest VM is flooded with [1].
> 
> Yes, but any reason for the device to do out-of-order for RX?
>
For some devices it is more optimal to process them out of order.
And it's not limited to RX.
 
> >
> > So it is needed for the devices that doesn’t offer IN_ORDER feature.
> >
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre
> > e/drivers/net/virtio_net.c?h=v6.2-rc3#n1252
> 
> It is only enabled in a debug kernel which should be harmless?
It is the KERN_DEBUG log level. It is not a debug kernel, just the debug log level.
And regardless, generating zero-length packets for a debug kernel is even more confusing.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors
  2023-01-11  4:40         ` Parav Pandit
@ 2023-01-11  5:51           ` Jason Wang
  2023-01-16 19:53             ` Parav Pandit
  2023-01-16 20:58             ` Michael S. Tsirkin
  0 siblings, 2 replies; 30+ messages in thread
From: Jason Wang @ 2023-01-11  5:51 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Eugenio Pérez, qemu-devel, Liuxiangdong, Stefan Hajnoczi,
	Harpreet Singh Anand, Gautam Dawar, Zhu Lingshan, Cindy Lu,
	Si-Wei Liu, Michael S. Tsirkin, Dr. David Alan Gilbert,
	Laurent Vivier, Eli Cohen, Stefano Garzarella, Juan Quintela

On Wed, Jan 11, 2023 at 12:40 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Tuesday, January 10, 2023 11:35 PM
> >
> > On Tue, Jan 10, 2023 at 11:02 AM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > > Hi Jason,
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Monday, December 5, 2022 10:25 PM
> > >
> > > >
> > > > A dumb question, any reason we need bother with virtio-net? It looks
> > > > to me it's not a must and would complicate migration compatibility.
> > >
> > > Virtio net vdpa device is processing the descriptors out of order.
> > > This vdpa device doesn’t offer IN_ORDER flag.
> > >
> > > And when a VQ is suspended it cannot complete these descriptors as some
> > dummy zero length completions.
> > > The guest VM is flooded with [1].
> >
> > Yes, but any reason for the device to do out-of-order for RX?
> >
> For some devices it is more optimal to process them out of order.
> And its not limited to RX.

TX should be fine, since the device can anyhow pretend to send all
packets, so we won't have any in-flight descriptors.

>
> > >
> > > So it is needed for the devices that doesn’t offer IN_ORDER feature.
> > >
> > > [1]
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre
> > > e/drivers/net/virtio_net.c?h=v6.2-rc3#n1252
> >
> > It is only enabled in a debug kernel which should be harmless?
> it is KERN_DEBUG log level. Its is not debug kernel, just the debug log level.

Ok, but the production environment should not use that level anyhow.

> And regardless, generating zero length packets for debug kernel is even more confusing.

Note that it is allowed in the virtio-spec[1] (we probably can fix
that in the driver) and we have pr_debug() all over this driver and
other places. It doesn't cause any side effects except for debugging
purposes.

So I think having inflight tracking is useful, but I'm not sure it's
worth bothering with virtio-net (or worth bothering with now):

- zero length is allowed
- it only helps for debugging
- may cause issues for migration compatibility
- requires new infrastructure to be invented

Thanks

[1] spec said

"
Note: len is particularly useful for drivers using untrusted buffers:
if a driver does not know exactly how much has been written by the
device, the driver would have to zero the buffer in advance to ensure
no data leakage occurs.
"



^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors
  2023-01-11  5:51           ` Jason Wang
@ 2023-01-16 19:53             ` Parav Pandit
  2023-01-16 20:58             ` Michael S. Tsirkin
  1 sibling, 0 replies; 30+ messages in thread
From: Parav Pandit @ 2023-01-16 19:53 UTC (permalink / raw)
  To: Jason Wang
  Cc: Eugenio Pérez, qemu-devel, Liuxiangdong, Stefan Hajnoczi,
	Harpreet Singh Anand, Gautam Dawar, Zhu Lingshan, Cindy Lu,
	Si-Wei Liu, Michael S. Tsirkin, Dr. David Alan Gilbert,
	Laurent Vivier, Eli Cohen, Stefano Garzarella, Juan Quintela


> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, January 11, 2023 12:51 AM
> 
> On Wed, Jan 11, 2023 at 12:40 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Tuesday, January 10, 2023 11:35 PM
> > >
> > > On Tue, Jan 10, 2023 at 11:02 AM Parav Pandit <parav@nvidia.com> wrote:
> > > >
> > > > Hi Jason,
> > > >
> > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > Sent: Monday, December 5, 2022 10:25 PM
> > > >
> > > > >
> > > > > A dumb question, any reason we need bother with virtio-net? It
> > > > > looks to me it's not a must and would complicate migration
> compatibility.
> > > >
> > > > Virtio net vdpa device is processing the descriptors out of order.
> > > > This vdpa device doesn’t offer IN_ORDER flag.
> > > >
> > > > And when a VQ is suspended it cannot complete these descriptors as
> > > > some
> > > dummy zero length completions.
> > > > The guest VM is flooded with [1].
> > >
> > > Yes, but any reason for the device to do out-of-order for RX?
> > >
> > For some devices it is more optimal to process them out of order.
> > And its not limited to RX.
> 
> TX should be fine, since the device can anyhow pretend to send all packets, so
> we won't have any in-flight descriptors.
> 
> >
> > > >
> > > > So it is needed for the devices that doesn’t offer IN_ORDER feature.
> > > >
> > > > [1]
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > > > /tre
> > > > e/drivers/net/virtio_net.c?h=v6.2-rc3#n1252
> > >
> > > It is only enabled in a debug kernel which should be harmless?
> > it is KERN_DEBUG log level. Its is not debug kernel, just the debug log level.
> 
> Ok, but the production environment should not use that level anyhow.
> 
> > And regardless, generating zero length packets for debug kernel is even more
> confusing.
> 
> Note that it is allowed in the virtio-spec[1] (we probably can fix that in the
> driver) and we have pr_debug() all over this drivers and other places. It doesn't
> cause any side effects except for the debugging purpose.
> 
> So I think having inflight tracking is useful, but I'm not sure it's worth bothering
> with virtio-net (or worth to bothering now):
> 
> - zero length is allowed
This isn't explicitly called out. It may be worth adding to the spec.

> - it only helps for debugging
> - may cause issues for migration compatibility
We have had this missing for a long time regardless of this feature. So let's not mix it in here.

The mlx5 vdpa device can do eventual in-order completion before/at the time of suspend, so I think we can wait for now until more advanced hw arrives.

> - requires new infrastructure to be invented
> 
> Thanks
> 
> [1] spec said
> 
This doesn't say that it's a zero-length completion. Len is a mandatory field that tells how many bytes the device wrote.
> "
> Note: len is particularly useful for drivers using untrusted buffers:
> if a driver does not know exactly how much has been written by the device, the
> driver would have to zero the buffer in advance to ensure no data leakage
> occurs.
> "


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors
  2023-01-11  5:51           ` Jason Wang
  2023-01-16 19:53             ` Parav Pandit
@ 2023-01-16 20:58             ` Michael S. Tsirkin
  2023-01-17  6:54               ` Jason Wang
  1 sibling, 1 reply; 30+ messages in thread
From: Michael S. Tsirkin @ 2023-01-16 20:58 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Eugenio Pérez, qemu-devel, Liuxiangdong,
	Stefan Hajnoczi, Harpreet Singh Anand, Gautam Dawar,
	Zhu Lingshan, Cindy Lu, Si-Wei Liu, Dr. David Alan Gilbert,
	Laurent Vivier, Eli Cohen, Stefano Garzarella, Juan Quintela

On Wed, Jan 11, 2023 at 01:51:06PM +0800, Jason Wang wrote:
> On Wed, Jan 11, 2023 at 12:40 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Tuesday, January 10, 2023 11:35 PM
> > >
> > > On Tue, Jan 10, 2023 at 11:02 AM Parav Pandit <parav@nvidia.com> wrote:
> > > >
> > > > Hi Jason,
> > > >
> > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > Sent: Monday, December 5, 2022 10:25 PM
> > > >
> > > > >
> > > > > A dumb question, any reason we need bother with virtio-net? It looks
> > > > > to me it's not a must and would complicate migration compatibility.
> > > >
> > > > Virtio net vdpa device is processing the descriptors out of order.
> > > > This vdpa device doesn’t offer IN_ORDER flag.
> > > >
> > > > And when a VQ is suspended it cannot complete these descriptors as some
> > > dummy zero length completions.
> > > > The guest VM is flooded with [1].
> > >
> > > Yes, but any reason for the device to do out-of-order for RX?
> > >
> > For some devices it is more optimal to process them out of order.
> > And its not limited to RX.
> 
> TX should be fine, since the device can anyhow pretend to send all
> packets, so we won't have any in-flight descriptors.

And drop them all? You end up with multi-second delays for things like
DHCP. Yes, theoretically packets can be dropped at any time, but
practically people expect this to happen on busy systems, not randomly
out of the blue.

> >
> > > >
> > > > So it is needed for the devices that doesn’t offer IN_ORDER feature.
> > > >
> > > > [1]
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre
> > > > e/drivers/net/virtio_net.c?h=v6.2-rc3#n1252
> > >
> > > It is only enabled in a debug kernel which should be harmless?
> > it is KERN_DEBUG log level. Its is not debug kernel, just the debug log level.
> 
> Ok, but the production environment should not use that level anyhow.

It's just one example. And it's enough in my eyes to prove we really
can't start sending zero-length RX buffers to drivers and expect all to
be well. If we want to, we need to negotiate a new feature bit.


> > And regardless, generating zero length packets for debug kernel is even more confusing.
> 
> Note that it is allowed in the virtio-spec[1] (we probably can fix
> that in the driver) and we have pr_debug() all over this drivers and
> other places. It doesn't cause any side effects except for the
> debugging purpose.
> 
> So I think having inflight tracking is useful, but I'm not sure it's
> worth bothering with virtio-net (or worth to bothering now):
> 
> - zero length is allowed
> - it only helps for debugging
> - may cause issues for migration compatibility
> - requires new infrastructure to be invented
> 
> Thanks
> 
> [1] spec said
> 
> "
> Note: len is particularly useful for drivers using untrusted buffers:
> if a driver does not know exactly how much has been written by the
> device, the driver would have to zero the buffer in advance to ensure
> no data leakage occurs.
> "

I don't think this talks about zero length at all.
Let me try to explain what this text talks about, in my opinion.


There are cases where the device does not know exactly how much
data it wrote into the buffer. Should it over-estimate,
such that the driver can be sure that the buffer after the reported
length is unchanged? Or should it instead under-estimate,
such that the driver can be sure that the reported length has
been initialized by the device?

What this text in the spec says is that it must always
under-estimate and never over-estimate. And it attempts to
explain why this is useful: imagine a driver that trusts the
device and wants to make sure the buffer is initialized.
With the definition in the spec, it only needs to initialize
data after the reported length. Initialize how? It's up to the
driver, but for example it can zero this buffer.

In short, all the text says is "do not over-report the length,
only set it to the part of the buffer you wrote".
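
As a rough, self-contained sketch of how a driver could use that
guarantee (the helper name and the clamp are made up, purely
illustrative): since the device never over-reports, only the tail
after the reported length needs to be cleared, not the whole buffer
up front:

    #include <stdint.h>
    #include <string.h>

    /* The device promises it wrote at least the first `len` bytes, so
     * clearing [len, buf_size) is enough to hand on a fully
     * initialized buffer. */
    static void sanitize_used_buffer(uint8_t *buf, size_t buf_size,
                                     uint32_t len)
    {
        if (len > buf_size)        /* defensive clamp, untrusted device */
            len = buf_size;
        memset(buf + len, 0, buf_size - len);
    }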

Besides that, the text itself is from the original spec and it did not
age well:

1) no one actually relies on this

2) rather than untrusted "buffers", what we commonly have is untrusted
   devices, so the length can't be trusted either

3) writes on PCI are posted, and if your security model
   depends on the buffer being initialized and you want to
   recover from errors, you really can't expect the device to
   give you this info. Luckily no one cares; see 1) above.


-- 
MST



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors
  2022-12-07  8:56     ` Eugenio Perez Martin
@ 2023-01-16 21:01       ` Michael S. Tsirkin
  2023-01-17  3:38         ` Jason Wang
  0 siblings, 1 reply; 30+ messages in thread
From: Michael S. Tsirkin @ 2023-01-16 21:01 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Jason Wang, qemu-devel, Liuxiangdong, Stefan Hajnoczi,
	Harpreet Singh Anand, Gautam Dawar, Zhu Lingshan, Cindy Lu,
	Si-Wei Liu, Dr. David Alan Gilbert, Laurent Vivier, Eli Cohen,
	Stefano Garzarella, Juan Quintela, Parav Pandit

On Wed, Dec 07, 2022 at 09:56:20AM +0100, Eugenio Perez Martin wrote:
> > A dumb question, any reason we need bother with virtio-net? It looks
> > to me it's not a must and would complicate migration compatibility.
> >
> > I guess virtio-blk is the better place.
> >
> 
> I'm fine to start with -blk, but if -net devices are processing
> buffers out of order we have chances of losing descriptors too.
> 
> We can wait for more feedback to prioritize correctly this though.
> 
> Thanks!

Traditionally vhost serialized everything when dropping the VQ.
It would be interesting to hear from hardware vendors on whether
it's hard or easy to do in hardware.
But I suspect all devices will want the capability eventually,
because why not: if we have the code, let's just do it everywhere.

-- 
MST



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors
  2023-01-16 21:01       ` Michael S. Tsirkin
@ 2023-01-17  3:38         ` Jason Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Wang @ 2023-01-17  3:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eugenio Perez Martin, qemu-devel, Liuxiangdong, Stefan Hajnoczi,
	Harpreet Singh Anand, Gautam Dawar, Zhu Lingshan, Cindy Lu,
	Si-Wei Liu, Dr. David Alan Gilbert, Laurent Vivier, Eli Cohen,
	Stefano Garzarella, Juan Quintela, Parav Pandit

On Tue, Jan 17, 2023 at 5:02 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Dec 07, 2022 at 09:56:20AM +0100, Eugenio Perez Martin wrote:
> > > A dumb question, any reason we need bother with virtio-net? It looks
> > > to me it's not a must and would complicate migration compatibility.
> > >
> > > I guess virtio-blk is the better place.
> > >
> >
> > I'm fine to start with -blk, but if -net devices are processing
> > buffers out of order we have chances of losing descriptors too.
> >
> > We can wait for more feedback to prioritize correctly this though.
> >
> > Thanks!
>
> Traditionally vhost serialized everything when dropping the VQ.
> Would be interesting to hear from hardware vendors on whether
> it's hard or easy to do in hardware.

The feedback is that some vendors will do the serialization while others will not.

> But I suspect all devices will want the capability eventually
> because why not, if we have the code let's just do it everywhere.

Yes, but it's more about priority/compatibility than whether it is
needed. We want to get migration working as early as possible for
vendors to test and try.

Thanks

>
> --
> MST
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors
  2023-01-16 20:58             ` Michael S. Tsirkin
@ 2023-01-17  6:54               ` Jason Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Wang @ 2023-01-17  6:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Eugenio Pérez, qemu-devel, Liuxiangdong,
	Stefan Hajnoczi, Harpreet Singh Anand, Gautam Dawar,
	Zhu Lingshan, Cindy Lu, Si-Wei Liu, Dr. David Alan Gilbert,
	Laurent Vivier, Eli Cohen, Stefano Garzarella, Juan Quintela

On Tue, Jan 17, 2023 at 4:58 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jan 11, 2023 at 01:51:06PM +0800, Jason Wang wrote:
> > On Wed, Jan 11, 2023 at 12:40 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Tuesday, January 10, 2023 11:35 PM
> > > >
> > > > On Tue, Jan 10, 2023 at 11:02 AM Parav Pandit <parav@nvidia.com> wrote:
> > > > >
> > > > > Hi Jason,
> > > > >
> > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > Sent: Monday, December 5, 2022 10:25 PM
> > > > >
> > > > > >
> > > > > > A dumb question, any reason we need bother with virtio-net? It looks
> > > > > > to me it's not a must and would complicate migration compatibility.
> > > > >
> > > > > Virtio net vdpa device is processing the descriptors out of order.
> > > > > This vdpa device doesn’t offer IN_ORDER flag.
> > > > >
> > > > > And when a VQ is suspended it cannot complete these descriptors as some
> > > > dummy zero length completions.
> > > > > The guest VM is flooded with [1].
> > > >
> > > > Yes, but any reason for the device to do out-of-order for RX?
> > > >
> > > For some devices it is more optimal to process them out of order.
> > > And its not limited to RX.
> >
> > TX should be fine, since the device can anyhow pretend to send all
> > packets, so we won't have any in-flight descriptors.
>
> And drop them all?

It depends on how many inflight descriptors there are. This is the
worst case, and actually this is what software vhost backends have
done, since they can't validate whether or not the packet was sent to
the wire. And it's not worse than RX, where a lot of packets will be
dropped for sure during live migration.

> You end up with multisecond delays for things like
> DHCP.

Well, if DHCP is done during live migration this is somewhat
unavoidable; a lot of things need to be recovered, not only from the
migration downtime. E.g. it may suffer from the delay of the gARP
packet and other things.

> Yes theoretically packets can be dropped at any time, but
> practically people expect this to happen on busy systems, not randomly
> out of the blue.

The problem is that for a software device we never validate whether or
not the packet was actually sent. Various layers could be placed under
vhost, so there's no guarantee that the packet won't be lost.

>
> > >
> > > > >
> > > > > So it is needed for the devices that doesn’t offer IN_ORDER feature.
> > > > >
> > > > > [1]
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre
> > > > > e/drivers/net/virtio_net.c?h=v6.2-rc3#n1252
> > > >
> > > > It is only enabled in a debug kernel which should be harmless?
> > > it is KERN_DEBUG log level. Its is not debug kernel, just the debug log level.
> >
> > Ok, but the production environment should not use that level anyhow.
>
> It's just one example.  And it's enough in my eyes to prove we really
> can't start sending zero length RX buffers to drivers and expect all to be
> well. If we want to we need to negotiate a new feature bit.

Ok.

>
>
> > > And regardless, generating zero length packets for debug kernel is even more confusing.
> >
> > Note that it is allowed in the virtio-spec[1] (we probably can fix
> > that in the driver) and we have pr_debug() all over this drivers and
> > other places. It doesn't cause any side effects except for the
> > debugging purpose.
> >
> > So I think having inflight tracking is useful, but I'm not sure it's
> > worth bothering with virtio-net (or worth to bothering now):
> >
> > - zero length is allowed
> > - it only helps for debugging
> > - may cause issues for migration compatibility
> > - requires new infrastructure to be invented
> >
> > Thanks
> >
> > [1] spec said
> >
> > "
> > Note: len is particularly useful for drivers using untrusted buffers:
> > if a driver does not know exactly how much has been written by the
> > device, the driver would have to zero the buffer in advance to ensure
> > no data leakage occurs.
> > "
>
> I don't think this talks about zero length at all.
> Let me try to explain what this talk about in my opinion.
>
>
> There are cases where device does not know exactly how much
> data it wrote into buffer.

Actually, I think the inflight could be one case. Or do you have any
other case when the device doesn't know how much data it wrote?

> Should it over-estimate
> such that driver can be sure that buffer after the reported
> length is unchanged?

I can't think of a case where such over-estimation can help any
logic. (Filling a magic value into the buffer and deducing the actual
length written by the device?)
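
Purely as an illustration of what that "magic value" probing would
mean (invented names, nothing like this exists anywhere, and it is
fragile by construction since payload bytes equal to the sentinel are
indistinguishable from untouched ones):

    #include <stdint.h>
    #include <string.h>

    #define PROBE_SENTINEL 0xA5u   /* hypothetical marker byte */

    /* Pre-fill the buffer before posting it to the device. */
    static void prefill_for_probe(uint8_t *buf, size_t size)
    {
        memset(buf, PROBE_SENTINEL, size);
    }

    /* Scan back from the (possibly over-estimated) reported length to
     * guess how much the device actually wrote. */
    static size_t guess_written_len(const uint8_t *buf, size_t reported_len)
    {
        while (reported_len > 0 && buf[reported_len - 1] == PROBE_SENTINEL)
            reported_len--;
        return reported_len;
    }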

> Or should it instead under-estimate
> such that driver can be sure that the reported length has
> been initialized by device?
>
> What this text in the spec says is that it must always
> under-estimate and not over-estimate. And it attempts to
> explain why this is useful: imagine driver that trusts the
> device and wants to make sure buffer is initialized.
> With the definition in the spec, it only needs to initialize
> data after the reported length.

Just to make sure I understand: such initialization can happen only
after the buffer is completed by the device. But after that the buffer
doesn't belong to the device anymore, so drivers are free to do any
initialization they want. Or is there anything that makes this special?

Thanks

> Initialize how? It's up to the
> driver but for example it can zero this buffer.
>
>
> In short, all the text says is "do not over-report length,
> only set it to part of buffer you wrote".
>
> Besides that, the text itself is from the original spec and it did not
> age well:
>
> 1)- no one actually relies on this
>
> 2)- rather than untrusted "buffers" what we commonly have is untrusted
>   devices so length can't be trusted either
>
> 3)- writes on PCI are posted and if your security model
>   depends on buffer being initialized and you want to
>   recover from errors you really can't expect device to
>   give you this info. Luckily no one cares see 1) above.
>
>
> --
> MST
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-01-17  6:55 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-05 17:04 [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Eugenio Pérez
2022-12-05 17:04 ` [RFC PATCH for 8.0 01/13] vhost: add available descriptor list in SVQ Eugenio Pérez
2022-12-05 17:04 ` [RFC PATCH for 8.0 02/13] vhost: iterate only available descriptors at SVQ stop Eugenio Pérez
2022-12-05 17:04 ` [RFC PATCH for 8.0 03/13] vhost: merge avail list and next avail descriptors detach Eugenio Pérez
2022-12-05 17:04 ` [RFC PATCH for 8.0 04/13] vhost: add vhost_svq_save_inflight Eugenio Pérez
2022-12-05 17:04 ` [RFC PATCH for 8.0 05/13] virtio: Specify uint32_t as VirtQueueElementOld members type Eugenio Pérez
2022-12-05 17:04 ` [RFC PATCH for 8.0 06/13] virtio: refactor qemu_get_virtqueue_element Eugenio Pérez
2022-12-05 17:04 ` [RFC PATCH for 8.0 07/13] virtio: refactor qemu_put_virtqueue_element Eugenio Pérez
2022-12-05 17:04 ` [RFC PATCH for 8.0 08/13] virtio: expose VirtQueueElementOld Eugenio Pérez
2022-12-05 17:04 ` [RFC PATCH for 8.0 09/13] virtio: add vmstate_virtqueue_element_old Eugenio Pérez
2022-12-05 17:04 ` [RFC PATCH for 8.0 10/13] virtio-net: Migrate vhost inflight descriptors Eugenio Pérez
2022-12-05 20:52   ` Parav Pandit
2022-12-07  8:40     ` Eugenio Perez Martin
2022-12-06  3:24   ` Jason Wang
2022-12-07  8:56     ` Eugenio Perez Martin
2023-01-16 21:01       ` Michael S. Tsirkin
2023-01-17  3:38         ` Jason Wang
2023-01-10  3:02     ` Parav Pandit
2023-01-11  4:34       ` Jason Wang
2023-01-11  4:40         ` Parav Pandit
2023-01-11  5:51           ` Jason Wang
2023-01-16 19:53             ` Parav Pandit
2023-01-16 20:58             ` Michael S. Tsirkin
2023-01-17  6:54               ` Jason Wang
2022-12-05 17:04 ` [RFC PATCH for 8.0 11/13] virtio-net: save inflight descriptors at vhost shutdown Eugenio Pérez
2022-12-05 17:04 ` [RFC PATCH for 8.0 12/13] vhost: expose vhost_svq_add_element Eugenio Pérez
2022-12-05 17:04 ` [RFC PATCH for 8.0 13/13] vdpa: Recover inflight descriptors Eugenio Pérez
2022-12-06  7:07 ` [RFC PATCH for 8.0 00/13] vDPA-net inflight descriptors migration with SVQ Jason Wang
2022-12-07  8:59   ` Eugenio Perez Martin
2022-12-08  7:31     ` Jason Wang
