All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration
@ 2023-02-08  9:42 Eugenio Pérez
  2023-02-08  9:42 ` [PATCH v2 01/13] vdpa net: move iova tree creation from init to start Eugenio Pérez
                   ` (14 more replies)
  0 siblings, 15 replies; 68+ messages in thread
From: Eugenio Pérez @ 2023-02-08  9:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella,
	si-wei.liu

It's possible to migrate vdpa net devices if they are shadowed from the
start.  But to always shadow the dataplane is to effectively break its host
passthrough, so its not convenient in vDPA scenarios.

This series enables dynamically switching to shadow mode only at
migration time.  This allows full data virtqueues passthrough all the
time qemu is not migrating.

In this series only net devices with no CVQ are migratable.  CVQ adds
additional state that would make the series bigger and still had some
controversy on previous RFC, so let's split it.

The first patch delays the creation of the iova tree until it is really needed,
and makes it easier to dynamically move from and to SVQ mode.

Next patches from 02 to 05 handle the suspending and getting of vq state (base)
of the device at the switch to SVQ mode.  The new _F_SUSPEND feature is
negotiated and stop device flow is changed so the state can be fetched trusting
the device will not modify it.

Since vhost backend must offer VHOST_F_LOG_ALL to be migratable, last patches
but the last one add the needed migration blockers so vhost-vdpa can offer it
safely.  They also add the handling of this feature.

Finally, the last patch makes virtio vhost-vdpa backend to offer
VHOST_F_LOG_ALL so qemu migrate the device as long as no other blocker has been
added.

Successfully tested with vdpa_sim_net with patch [1] applied and with the qemu
emulated device with vp_vdpa with some restrictions:
* No CVQ. No feature that didn't work with SVQ previously (packed, ...)
* VIRTIO_RING_F_STATE patches implementing [2].
* Expose _F_SUSPEND, but ignore it and suspend on ring state fetch like
  DPDK.

Comments are welcome.

v2:
- Check for SUSPEND in vhost_dev.backend_cap, as .backend_features is empty at
  the check moment.

v1:
- Omit all code working with CVQ and block migration if the device supports
  CVQ.
- Remove spurious kick.
- Move all possible checks for migration to vhost-vdpa instead of the net
  backend. Move them to init code from start code.
- Suspend on vhost_vdpa_dev_start(false) instead of in vhost-vdpa net backend.
- Properly split suspend after geting base and adding of status_reset patches.
- Add possible TODOs to points where this series can improve in the future.
- Check the state of migration using migration_in_setup and
  migration_has_failed instead of checking all the possible migration status in
  a switch.
- Add TODO with possible low hand fruit using RESUME ops.
- Always offer _F_LOG from virtio/vhost-vdpa and let migration blockers do
  their thing instead of adding a variable.
- RFC v2 at https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg02574.html

RFC v2:
- Use a migration listener instead of a memory listener to know when
  the migration starts.
- Add stuff not picked with ASID patches, like enable rings after
  driver_ok
- Add rewinding on the migration src, not in dst
- RFC v1 at https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg01664.html

[1] https://lore.kernel.org/lkml/20230203142501.300125-1-eperezma@redhat.com/T/
[2] https://lists.oasis-open.org/archives/virtio-comment/202103/msg00036.html

Eugenio Pérez (13):
  vdpa net: move iova tree creation from init to start
  vdpa: Negotiate _F_SUSPEND feature
  vdpa: add vhost_vdpa_suspend
  vdpa: move vhost reset after get vring base
  vdpa: rewind at get_base, not set_base
  vdpa net: allow VHOST_F_LOG_ALL
  vdpa: add vdpa net migration state notifier
  vdpa: disable RAM block discard only for the first device
  vdpa net: block migration if the device has CVQ
  vdpa: block migration if device has unsupported features
  vdpa: block migration if dev does not have _F_SUSPEND
  vdpa: block migration if SVQ does not admit a feature
  vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices

 include/hw/virtio/vhost-backend.h |   4 +
 hw/virtio/vhost-vdpa.c            | 126 +++++++++++++++-----
 hw/virtio/vhost.c                 |   3 +
 net/vhost-vdpa.c                  | 192 +++++++++++++++++++++++++-----
 hw/virtio/trace-events            |   1 +
 5 files changed, 267 insertions(+), 59 deletions(-)

-- 
2.31.1




^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 01/13] vdpa net: move iova tree creation from init to start
  2023-02-08  9:42 [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
@ 2023-02-08  9:42 ` Eugenio Pérez
  2023-02-13  6:50     ` Si-Wei Liu
  2023-02-08  9:42 ` [PATCH v2 02/13] vdpa: Negotiate _F_SUSPEND feature Eugenio Pérez
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 68+ messages in thread
From: Eugenio Pérez @ 2023-02-08  9:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella,
	si-wei.liu

Only create iova_tree if and when it is needed.

The cleanup keeps being responsible of last VQ but this change allows it
to merge both cleanup functions.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 net/vhost-vdpa.c | 99 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 71 insertions(+), 28 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index de5ed8ff22..a9e6c8f28e 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -178,13 +178,9 @@ err_init:
 static void vhost_vdpa_cleanup(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
-    struct vhost_dev *dev = &s->vhost_net->dev;
 
     qemu_vfree(s->cvq_cmd_out_buffer);
     qemu_vfree(s->status);
-    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
-        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
-    }
     if (s->vhost_net) {
         vhost_net_cleanup(s->vhost_net);
         g_free(s->vhost_net);
@@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
     return size;
 }
 
+/** From any vdpa net client, get the netclient of first queue pair */
+static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
+{
+    NICState *nic = qemu_get_nic(s->nc.peer);
+    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
+
+    return DO_UPCAST(VhostVDPAState, nc, nc0);
+}
+
+static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
+{
+    struct vhost_vdpa *v = &s->vhost_vdpa;
+
+    if (v->shadow_vqs_enabled) {
+        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
+                                           v->iova_range.last);
+    }
+}
+
+static int vhost_vdpa_net_data_start(NetClientState *nc)
+{
+    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    struct vhost_vdpa *v = &s->vhost_vdpa;
+
+    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+    if (v->index == 0) {
+        vhost_vdpa_net_data_start_first(s);
+        return 0;
+    }
+
+    if (v->shadow_vqs_enabled) {
+        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
+        v->iova_tree = s0->vhost_vdpa.iova_tree;
+    }
+
+    return 0;
+}
+
+static void vhost_vdpa_net_client_stop(NetClientState *nc)
+{
+    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    struct vhost_dev *dev;
+
+    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+    dev = s->vhost_vdpa.dev;
+    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
+        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
+    }
+}
+
 static NetClientInfo net_vhost_vdpa_info = {
         .type = NET_CLIENT_DRIVER_VHOST_VDPA,
         .size = sizeof(VhostVDPAState),
         .receive = vhost_vdpa_receive,
+        .start = vhost_vdpa_net_data_start,
+        .stop = vhost_vdpa_net_client_stop,
         .cleanup = vhost_vdpa_cleanup,
         .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
         .has_ufo = vhost_vdpa_has_ufo,
@@ -351,7 +401,7 @@ dma_map_err:
 
 static int vhost_vdpa_net_cvq_start(NetClientState *nc)
 {
-    VhostVDPAState *s;
+    VhostVDPAState *s, *s0;
     struct vhost_vdpa *v;
     uint64_t backend_features;
     int64_t cvq_group;
@@ -425,6 +475,15 @@ out:
         return 0;
     }
 
+    s0 = vhost_vdpa_net_first_nc_vdpa(s);
+    if (s0->vhost_vdpa.iova_tree) {
+        /* SVQ is already configured for all virtqueues */
+        v->iova_tree = s0->vhost_vdpa.iova_tree;
+    } else {
+        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
+                                           v->iova_range.last);
+    }
+
     r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
                                vhost_vdpa_net_cvq_cmd_page_len(), false);
     if (unlikely(r < 0)) {
@@ -449,15 +508,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
     if (s->vhost_vdpa.shadow_vqs_enabled) {
         vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
         vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
-        if (!s->always_svq) {
-            /*
-             * If only the CVQ is shadowed we can delete this safely.
-             * If all the VQs are shadows this will be needed by the time the
-             * device is started again to register SVQ vrings and similar.
-             */
-            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
-        }
     }
+
+    vhost_vdpa_net_client_stop(nc);
 }
 
 static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
@@ -667,8 +720,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
                                        int nvqs,
                                        bool is_datapath,
                                        bool svq,
-                                       struct vhost_vdpa_iova_range iova_range,
-                                       VhostIOVATree *iova_tree)
+                                       struct vhost_vdpa_iova_range iova_range)
 {
     NetClientState *nc = NULL;
     VhostVDPAState *s;
@@ -690,7 +742,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     s->vhost_vdpa.shadow_vqs_enabled = svq;
     s->vhost_vdpa.iova_range = iova_range;
     s->vhost_vdpa.shadow_data = svq;
-    s->vhost_vdpa.iova_tree = iova_tree;
     if (!is_datapath) {
         s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
                                             vhost_vdpa_net_cvq_cmd_page_len());
@@ -760,7 +811,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     uint64_t features;
     int vdpa_device_fd;
     g_autofree NetClientState **ncs = NULL;
-    g_autoptr(VhostIOVATree) iova_tree = NULL;
     struct vhost_vdpa_iova_range iova_range;
     NetClientState *nc;
     int queue_pairs, r, i = 0, has_cvq = 0;
@@ -812,12 +862,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
         goto err;
     }
 
-    if (opts->x_svq) {
-        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
-            goto err_svq;
-        }
-
-        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
+    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
+        goto err;
     }
 
     ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
@@ -825,7 +871,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     for (i = 0; i < queue_pairs; i++) {
         ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
                                      vdpa_device_fd, i, 2, true, opts->x_svq,
-                                     iova_range, iova_tree);
+                                     iova_range);
         if (!ncs[i])
             goto err;
     }
@@ -833,13 +879,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     if (has_cvq) {
         nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
                                  vdpa_device_fd, i, 1, false,
-                                 opts->x_svq, iova_range, iova_tree);
+                                 opts->x_svq, iova_range);
         if (!nc)
             goto err;
     }
 
-    /* iova_tree ownership belongs to last NetClientState */
-    g_steal_pointer(&iova_tree);
     return 0;
 
 err:
@@ -849,7 +893,6 @@ err:
         }
     }
 
-err_svq:
     qemu_close(vdpa_device_fd);
 
     return -1;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 02/13] vdpa: Negotiate _F_SUSPEND feature
  2023-02-08  9:42 [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
  2023-02-08  9:42 ` [PATCH v2 01/13] vdpa net: move iova tree creation from init to start Eugenio Pérez
@ 2023-02-08  9:42 ` Eugenio Pérez
  2023-02-08  9:42 ` [PATCH v2 03/13] vdpa: add vhost_vdpa_suspend Eugenio Pérez
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 68+ messages in thread
From: Eugenio Pérez @ 2023-02-08  9:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella,
	si-wei.liu

This is needed for qemu to know it can suspend the device to retrieve
its status and enable SVQ with it, so all the process is transparent to
the guest.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 542e003101..2e79fbe4b2 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -659,7 +659,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
     uint64_t features;
     uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
         0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
-        0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
+        0x1ULL << VHOST_BACKEND_F_IOTLB_ASID |
+        0x1ULL << VHOST_BACKEND_F_SUSPEND;
     int r;
 
     if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 03/13] vdpa: add vhost_vdpa_suspend
  2023-02-08  9:42 [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
  2023-02-08  9:42 ` [PATCH v2 01/13] vdpa net: move iova tree creation from init to start Eugenio Pérez
  2023-02-08  9:42 ` [PATCH v2 02/13] vdpa: Negotiate _F_SUSPEND feature Eugenio Pérez
@ 2023-02-08  9:42 ` Eugenio Pérez
  2023-02-21  5:27     ` Jason Wang
  2023-02-08  9:42 ` [PATCH v2 04/13] vdpa: move vhost reset after get vring base Eugenio Pérez
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 68+ messages in thread
From: Eugenio Pérez @ 2023-02-08  9:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella,
	si-wei.liu

The function vhost.c:vhost_dev_stop fetches the vring base so the vq
state can be migrated to other devices.  However, this is unreliable in
vdpa, since we didn't signal the device to suspend the queues, making
the value fetched useless.

Suspend the device if possible before fetching first and subsequent
vring bases.

Moreover, vdpa totally reset and wipes the device at the last device
before fetch its vrings base, making that operation useless in the last
device. This will be fixed in later patches of this series.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 19 +++++++++++++++++++
 hw/virtio/trace-events |  1 +
 2 files changed, 20 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 2e79fbe4b2..cbbe92ffe8 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1108,6 +1108,24 @@ static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
     }
 }
 
+static void vhost_vdpa_suspend(struct vhost_dev *dev)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    int r;
+
+    if (!vhost_vdpa_first_dev(dev) ||
+        !(dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
+        return;
+    }
+
+    trace_vhost_vdpa_suspend(dev);
+    r = ioctl(v->device_fd, VHOST_VDPA_SUSPEND);
+    if (unlikely(r)) {
+        error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno);
+        /* Not aborting since we're called from stop context */
+    }
+}
+
 static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 {
     struct vhost_vdpa *v = dev->opaque;
@@ -1122,6 +1140,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         }
         vhost_vdpa_set_vring_ready(dev);
     } else {
+        vhost_vdpa_suspend(dev);
         vhost_vdpa_svqs_stop(dev);
         vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
     }
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index a87c5f39a2..8f8d05cf9b 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -50,6 +50,7 @@ vhost_vdpa_set_vring_ready(void *dev) "dev: %p"
 vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
 vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
 vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
+vhost_vdpa_suspend(void *dev) "dev: %p"
 vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
 vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
 vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 04/13] vdpa: move vhost reset after get vring base
  2023-02-08  9:42 [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (2 preceding siblings ...)
  2023-02-08  9:42 ` [PATCH v2 03/13] vdpa: add vhost_vdpa_suspend Eugenio Pérez
@ 2023-02-08  9:42 ` Eugenio Pérez
  2023-02-21  5:36     ` Jason Wang
  2023-02-08  9:42 ` [PATCH v2 05/13] vdpa: rewind at get_base, not set_base Eugenio Pérez
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 68+ messages in thread
From: Eugenio Pérez @ 2023-02-08  9:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella,
	si-wei.liu

The function vhost.c:vhost_dev_stop calls vhost operation
vhost_dev_start(false). In the case of vdpa it totally reset and wipes
the device, making the fetching of the vring base (virtqueue state) totally
useless.

The kernel backend does not use vhost_dev_start vhost op callback, but
vhost-user do. A patch to make vhost_user_dev_start more similar to vdpa
is desirable, but it can be added on top.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/vhost-backend.h |  4 ++++
 hw/virtio/vhost-vdpa.c            | 22 ++++++++++++++++------
 hw/virtio/vhost.c                 |  3 +++
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index c5ab49051e..ec3fbae58d 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -130,6 +130,9 @@ typedef bool (*vhost_force_iommu_op)(struct vhost_dev *dev);
 
 typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
                                        int fd);
+
+typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
+
 typedef struct VhostOps {
     VhostBackendType backend_type;
     vhost_backend_init vhost_backend_init;
@@ -177,6 +180,7 @@ typedef struct VhostOps {
     vhost_get_device_id_op vhost_get_device_id;
     vhost_force_iommu_op vhost_force_iommu;
     vhost_set_config_call_op vhost_set_config_call;
+    vhost_reset_status_op vhost_reset_status;
 } VhostOps;
 
 int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index cbbe92ffe8..26e38a6aab 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1152,14 +1152,23 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
     if (started) {
         memory_listener_register(&v->listener, &address_space_memory);
         return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
-    } else {
-        vhost_vdpa_reset_device(dev);
-        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
-                                   VIRTIO_CONFIG_S_DRIVER);
-        memory_listener_unregister(&v->listener);
+    }
 
-        return 0;
+    return 0;
+}
+
+static void vhost_vdpa_reset_status(struct vhost_dev *dev)
+{
+    struct vhost_vdpa *v = dev->opaque;
+
+    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
+        return;
     }
+
+    vhost_vdpa_reset_device(dev);
+    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+                                VIRTIO_CONFIG_S_DRIVER);
+    memory_listener_unregister(&v->listener);
 }
 
 static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
@@ -1346,4 +1355,5 @@ const VhostOps vdpa_ops = {
         .vhost_vq_get_addr = vhost_vdpa_vq_get_addr,
         .vhost_force_iommu = vhost_vdpa_force_iommu,
         .vhost_set_config_call = vhost_vdpa_set_config_call,
+        .vhost_reset_status = vhost_vdpa_reset_status,
 };
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index eb8c4c378c..a266396576 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -2049,6 +2049,9 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
                              hdev->vqs + i,
                              hdev->vq_index + i);
     }
+    if (hdev->vhost_ops->vhost_reset_status) {
+        hdev->vhost_ops->vhost_reset_status(hdev);
+    }
 
     if (vhost_dev_has_iommu(hdev)) {
         if (hdev->vhost_ops->vhost_set_iotlb_callback) {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 05/13] vdpa: rewind at get_base, not set_base
  2023-02-08  9:42 [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (3 preceding siblings ...)
  2023-02-08  9:42 ` [PATCH v2 04/13] vdpa: move vhost reset after get vring base Eugenio Pérez
@ 2023-02-08  9:42 ` Eugenio Pérez
  2023-02-21  5:40     ` Jason Wang
  2023-02-08  9:42 ` [PATCH v2 06/13] vdpa net: allow VHOST_F_LOG_ALL Eugenio Pérez
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 68+ messages in thread
From: Eugenio Pérez @ 2023-02-08  9:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella,
	si-wei.liu

At this moment it is only possible to migrate to a vdpa device running
with x-svq=on. As a protective measure, the rewind of the inflight
descriptors was done at the destination. That way if the source sent a
virtqueue with inuse descriptors they are always discarded.

Since this series allows to migrate also to passthrough devices with no
SVQ, the right thing to do is to rewind at the source so the base of
vrings are correct.

Support for inflight descriptors may be added in the future.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 26e38a6aab..d99db0bd03 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1211,18 +1211,7 @@ static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
                                        struct vhost_vring_state *ring)
 {
     struct vhost_vdpa *v = dev->opaque;
-    VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
 
-    /*
-     * vhost-vdpa devices does not support in-flight requests. Set all of them
-     * as available.
-     *
-     * TODO: This is ok for networking, but other kinds of devices might
-     * have problems with these retransmissions.
-     */
-    while (virtqueue_rewind(vq, 1)) {
-        continue;
-    }
     if (v->shadow_vqs_enabled) {
         /*
          * Device vring base was set at device start. SVQ base is handled by
@@ -1241,6 +1230,19 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
     int ret;
 
     if (v->shadow_vqs_enabled) {
+        VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
+
+        /*
+         * vhost-vdpa devices does not support in-flight requests. Set all of
+         * them as available.
+         *
+         * TODO: This is ok for networking, but other kinds of devices might
+         * have problems with these retransmissions.
+         */
+        while (virtqueue_rewind(vq, 1)) {
+            continue;
+        }
+
         ring->num = virtio_queue_get_last_avail_idx(dev->vdev, ring->index);
         return 0;
     }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 06/13] vdpa net: allow VHOST_F_LOG_ALL
  2023-02-08  9:42 [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (4 preceding siblings ...)
  2023-02-08  9:42 ` [PATCH v2 05/13] vdpa: rewind at get_base, not set_base Eugenio Pérez
@ 2023-02-08  9:42 ` Eugenio Pérez
  2023-02-08  9:42 ` [PATCH v2 07/13] vdpa: add vdpa net migration state notifier Eugenio Pérez
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 68+ messages in thread
From: Eugenio Pérez @ 2023-02-08  9:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella,
	si-wei.liu

Since some actions move to the start function instead of init, the
device features may not be the parent vdpa device's, but the one
returned by vhost backend.  If transition to SVQ is supported, the vhost
backend will return _F_LOG_ALL to signal the device is migratable.

Add VHOST_F_LOG_ALL.  HW dirty page tracking can be added on top of this
change if the device supports it in the future.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 net/vhost-vdpa.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index a9e6c8f28e..dd686b4514 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -98,6 +98,8 @@ static const uint64_t vdpa_svq_device_features =
     BIT_ULL(VIRTIO_NET_F_MQ) |
     BIT_ULL(VIRTIO_F_ANY_LAYOUT) |
     BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) |
+    /* VHOST_F_LOG_ALL is exposed by SVQ */
+    BIT_ULL(VHOST_F_LOG_ALL) |
     BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
     BIT_ULL(VIRTIO_NET_F_STANDBY);
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 07/13] vdpa: add vdpa net migration state notifier
  2023-02-08  9:42 [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (5 preceding siblings ...)
  2023-02-08  9:42 ` [PATCH v2 06/13] vdpa net: allow VHOST_F_LOG_ALL Eugenio Pérez
@ 2023-02-08  9:42 ` Eugenio Pérez
  2023-02-13  6:50     ` Si-Wei Liu
  2023-02-22  3:55     ` Jason Wang
  2023-02-08  9:42 ` [PATCH v2 08/13] vdpa: disable RAM block discard only for the first device Eugenio Pérez
                   ` (7 subsequent siblings)
  14 siblings, 2 replies; 68+ messages in thread
From: Eugenio Pérez @ 2023-02-08  9:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella,
	si-wei.liu

This allows net to restart the device backend to configure SVQ on it.

Ideally, these changes should not be net specific. However, the vdpa net
backend is the one with enough knowledge to configure everything because
of some reasons:
* Queues might need to be shadowed or not depending on its kind (control
  vs data).
* Queues need to share the same map translations (iova tree).

Because of that it is cleaner to restart the whole net backend and
configure again as expected, similar to how vhost-kernel moves between
userspace and passthrough.

If more kinds of devices need dynamic switching to SVQ we can create a
callback struct like VhostOps and move most of the code there.
VhostOps cannot be reused since all vdpa backend share them, and to
personalize just for networking would be too heavy.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
v3:
* Add TODO to use the resume operation in the future.
* Use migration_in_setup and migration_has_failed instead of a
  complicated switch case.
---
 net/vhost-vdpa.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index dd686b4514..bca13f97fd 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -26,12 +26,14 @@
 #include <err.h>
 #include "standard-headers/linux/virtio_net.h"
 #include "monitor/monitor.h"
+#include "migration/misc.h"
 #include "hw/virtio/vhost.h"
 
 /* Todo:need to add the multiqueue support here */
 typedef struct VhostVDPAState {
     NetClientState nc;
     struct vhost_vdpa vhost_vdpa;
+    Notifier migration_state;
     VHostNetState *vhost_net;
 
     /* Control commands shadow buffers */
@@ -241,10 +243,79 @@ static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
     return DO_UPCAST(VhostVDPAState, nc, nc0);
 }
 
+static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
+{
+    struct vhost_vdpa *v = &s->vhost_vdpa;
+    VirtIONet *n;
+    VirtIODevice *vdev;
+    int data_queue_pairs, cvq, r;
+    NetClientState *peer;
+
+    /* We are only called on the first data vqs and only if x-svq is not set */
+    if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
+        return;
+    }
+
+    vdev = v->dev->vdev;
+    n = VIRTIO_NET(vdev);
+    if (!n->vhost_started) {
+        return;
+    }
+
+    data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
+    cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
+                                  n->max_ncs - n->max_queue_pairs : 0;
+    /*
+     * TODO: vhost_net_stop does suspend, get_base and reset. We can be smarter
+     * in the future and resume the device if read-only operations between
+     * suspend and reset goes wrong.
+     */
+    vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
+
+    peer = s->nc.peer;
+    for (int i = 0; i < data_queue_pairs + cvq; i++) {
+        VhostVDPAState *vdpa_state;
+        NetClientState *nc;
+
+        if (i < data_queue_pairs) {
+            nc = qemu_get_peer(peer, i);
+        } else {
+            nc = qemu_get_peer(peer, n->max_queue_pairs);
+        }
+
+        vdpa_state = DO_UPCAST(VhostVDPAState, nc, nc);
+        vdpa_state->vhost_vdpa.shadow_data = enable;
+
+        if (i < data_queue_pairs) {
+            /* Do not override CVQ shadow_vqs_enabled */
+            vdpa_state->vhost_vdpa.shadow_vqs_enabled = enable;
+        }
+    }
+
+    r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);
+    if (unlikely(r < 0)) {
+        error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
+    }
+}
+
+static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
+{
+    MigrationState *migration = data;
+    VhostVDPAState *s = container_of(notifier, VhostVDPAState,
+                                     migration_state);
+
+    if (migration_in_setup(migration)) {
+        vhost_vdpa_net_log_global_enable(s, true);
+    } else if (migration_has_failed(migration)) {
+        vhost_vdpa_net_log_global_enable(s, false);
+    }
+}
+
 static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
 {
     struct vhost_vdpa *v = &s->vhost_vdpa;
 
+    add_migration_state_change_notifier(&s->migration_state);
     if (v->shadow_vqs_enabled) {
         v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
                                            v->iova_range.last);
@@ -278,6 +349,10 @@ static void vhost_vdpa_net_client_stop(NetClientState *nc)
 
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 
+    if (s->vhost_vdpa.index == 0) {
+        remove_migration_state_change_notifier(&s->migration_state);
+    }
+
     dev = s->vhost_vdpa.dev;
     if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
         g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
@@ -741,6 +816,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     s->vhost_vdpa.device_fd = vdpa_device_fd;
     s->vhost_vdpa.index = queue_pair_index;
     s->always_svq = svq;
+    s->migration_state.notify = vdpa_net_migration_state_notifier;
     s->vhost_vdpa.shadow_vqs_enabled = svq;
     s->vhost_vdpa.iova_range = iova_range;
     s->vhost_vdpa.shadow_data = svq;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 08/13] vdpa: disable RAM block discard only for the first device
  2023-02-08  9:42 [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (6 preceding siblings ...)
  2023-02-08  9:42 ` [PATCH v2 07/13] vdpa: add vdpa net migration state notifier Eugenio Pérez
@ 2023-02-08  9:42 ` Eugenio Pérez
  2023-02-08  9:42 ` [PATCH v2 09/13] vdpa net: block migration if the device has CVQ Eugenio Pérez
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 68+ messages in thread
From: Eugenio Pérez @ 2023-02-08  9:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella,
	si-wei.liu

Although it does not make a big difference, its more correct and
simplifies the cleanup path in subsequent patches.

Move ram_block_discard_disable(false) call to the top of
vhost_vdpa_cleanup because:
* We cannot use vhost_vdpa_first_dev after dev->opaque = NULL
  assignment.
* Improve the stack order in cleanup: since it is the last action taken
  in init, it should be the first at cleanup.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index d99db0bd03..84a6b9690b 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -431,16 +431,6 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
     trace_vhost_vdpa_init(dev, opaque);
     int ret;
 
-    /*
-     * Similar to VFIO, we end up pinning all guest memory and have to
-     * disable discarding of RAM.
-     */
-    ret = ram_block_discard_disable(true);
-    if (ret) {
-        error_report("Cannot set discarding of RAM broken");
-        return ret;
-    }
-
     v = opaque;
     v->dev = dev;
     dev->opaque =  opaque ;
@@ -452,6 +442,16 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
         return 0;
     }
 
+    /*
+     * Similar to VFIO, we end up pinning all guest memory and have to
+     * disable discarding of RAM.
+     */
+    ret = ram_block_discard_disable(true);
+    if (ret) {
+        error_report("Cannot set discarding of RAM broken");
+        return ret;
+    }
+
     vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
                                VIRTIO_CONFIG_S_DRIVER);
 
@@ -577,12 +577,15 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
     v = dev->opaque;
     trace_vhost_vdpa_cleanup(dev, v);
+    if (vhost_vdpa_first_dev(dev)) {
+        ram_block_discard_disable(false);
+    }
+
     vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
     memory_listener_unregister(&v->listener);
     vhost_vdpa_svq_cleanup(dev);
 
     dev->opaque = NULL;
-    ram_block_discard_disable(false);
 
     return 0;
 }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 09/13] vdpa net: block migration if the device has CVQ
  2023-02-08  9:42 [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (7 preceding siblings ...)
  2023-02-08  9:42 ` [PATCH v2 08/13] vdpa: disable RAM block discard only for the first device Eugenio Pérez
@ 2023-02-08  9:42 ` Eugenio Pérez
  2023-02-13  6:50     ` Si-Wei Liu
  2023-02-22  4:00     ` Jason Wang
  2023-02-08  9:42 ` [PATCH v2 10/13] vdpa: block migration if device has unsupported features Eugenio Pérez
                   ` (5 subsequent siblings)
  14 siblings, 2 replies; 68+ messages in thread
From: Eugenio Pérez @ 2023-02-08  9:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella,
	si-wei.liu

Devices with CVQ needs to migrate state beyond vq state.  Leaving this
to future series.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 net/vhost-vdpa.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index bca13f97fd..309861e56c 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -955,11 +955,17 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     }
 
     if (has_cvq) {
+        VhostVDPAState *s;
+
         nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
                                  vdpa_device_fd, i, 1, false,
                                  opts->x_svq, iova_range);
         if (!nc)
             goto err;
+
+        s = DO_UPCAST(VhostVDPAState, nc, nc);
+        error_setg(&s->vhost_vdpa.dev->migration_blocker,
+                   "net vdpa cannot migrate with MQ feature");
     }
 
     return 0;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 10/13] vdpa: block migration if device has unsupported features
  2023-02-08  9:42 [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (8 preceding siblings ...)
  2023-02-08  9:42 ` [PATCH v2 09/13] vdpa net: block migration if the device has CVQ Eugenio Pérez
@ 2023-02-08  9:42 ` Eugenio Pérez
  2023-02-08  9:42 ` [PATCH v2 11/13] vdpa: block migration if dev does not have _F_SUSPEND Eugenio Pérez
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 68+ messages in thread
From: Eugenio Pérez @ 2023-02-08  9:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella,
	si-wei.liu

A vdpa net device must initialize with SVQ in order to be migratable at
this moment, and initialization code verifies some conditions.  If the
device is not initialized with the x-svq parameter, it will not expose
_F_LOG so the vhost subsystem will block VM migration from its
initialization.

Next patches change this, so we need to verify migration conditions
differently.

QEMU only supports a subset of net features in SVQ, and it cannot
migrate state that cannot track or restore in the destination.  Add a
migration blocker if the device offer an unsupported feature.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 net/vhost-vdpa.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 309861e56c..a0c4d5de2c 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -952,6 +952,15 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
                                      iova_range);
         if (!ncs[i])
             goto err;
+
+        if (i == 0) {
+            VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, ncs[0]);
+
+            if (!s->vhost_vdpa.dev->migration_blocker) {
+                vhost_vdpa_net_valid_svq_features(features,
+                                        &s->vhost_vdpa.dev->migration_blocker);
+            }
+        }
     }
 
     if (has_cvq) {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 11/13] vdpa: block migration if dev does not have _F_SUSPEND
  2023-02-08  9:42 [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (9 preceding siblings ...)
  2023-02-08  9:42 ` [PATCH v2 10/13] vdpa: block migration if device has unsupported features Eugenio Pérez
@ 2023-02-08  9:42 ` Eugenio Pérez
  2023-02-22  4:05     ` Jason Wang
  2023-02-08  9:42 ` [PATCH v2 12/13] vdpa: block migration if SVQ does not admit a feature Eugenio Pérez
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 68+ messages in thread
From: Eugenio Pérez @ 2023-02-08  9:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella,
	si-wei.liu

Next patches enable devices to be migrated even if vdpa netdev has not
been started with x-svq. However, not all devices are migratable, so we
need to block migration if we detect that.

Block vhost-vdpa device migration if it does not offer _F_SUSPEND and it
has not been started with x-svq.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 84a6b9690b..9d30cf9b3c 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -442,6 +442,27 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
         return 0;
     }
 
+    /*
+     * If dev->shadow_vqs_enabled at initialization that means the device has
+     * been started with x-svq=on, so don't block migration
+     */
+    if (dev->migration_blocker == NULL && !v->shadow_vqs_enabled) {
+        uint64_t backend_features;
+
+        /* We don't have dev->backend_features yet */
+        ret = vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES,
+                              &backend_features);
+        if (unlikely(ret)) {
+            error_setg_errno(errp, -ret, "Could not get backend features");
+            return ret;
+        }
+
+        if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
+            error_setg(&dev->migration_blocker,
+                "vhost-vdpa backend lacks VHOST_BACKEND_F_SUSPEND feature.");
+        }
+    }
+
     /*
      * Similar to VFIO, we end up pinning all guest memory and have to
      * disable discarding of RAM.
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 12/13] vdpa: block migration if SVQ does not admit a feature
  2023-02-08  9:42 [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (10 preceding siblings ...)
  2023-02-08  9:42 ` [PATCH v2 11/13] vdpa: block migration if dev does not have _F_SUSPEND Eugenio Pérez
@ 2023-02-08  9:42 ` Eugenio Pérez
  2023-02-08  9:42 ` [PATCH v2 13/13] vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices Eugenio Pérez
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 68+ messages in thread
From: Eugenio Pérez @ 2023-02-08  9:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella,
	si-wei.liu

Next patches enable devices to be migrated even if vdpa netdev has not
been started with x-svq. However, not all devices are migratable, so we
need to block migration if we detect that.

Block migration if we detect the device expose a feature SVQ does not
know how to work with.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 9d30cf9b3c..13a86a2bb1 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -460,6 +460,15 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
         if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
             error_setg(&dev->migration_blocker,
                 "vhost-vdpa backend lacks VHOST_BACKEND_F_SUSPEND feature.");
+        } else {
+            /* We don't have dev->features yet */
+            uint64_t features;
+            ret = vhost_vdpa_get_dev_features(dev, &features);
+            if (unlikely(ret)) {
+                error_setg_errno(errp, -ret, "Could not get device features");
+                return ret;
+            }
+            vhost_svq_valid_features(features, &dev->migration_blocker);
         }
     }
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 13/13] vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices
  2023-02-08  9:42 [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (11 preceding siblings ...)
  2023-02-08  9:42 ` [PATCH v2 12/13] vdpa: block migration if SVQ does not admit a feature Eugenio Pérez
@ 2023-02-08  9:42 ` Eugenio Pérez
  2023-02-22  4:07     ` Jason Wang
  2023-02-08 10:29   ` Alvaro Karsz
  2023-02-10 12:57 ` Gautam Dawar
  14 siblings, 1 reply; 68+ messages in thread
From: Eugenio Pérez @ 2023-02-08  9:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella,
	si-wei.liu

vhost-vdpa devices can return this features now that blockers have been
set in case some features are not met.

Expose VHOST_F_LOG_ALL only in that case.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 13a86a2bb1..5fddc77c5c 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1319,10 +1319,9 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
 static int vhost_vdpa_get_features(struct vhost_dev *dev,
                                      uint64_t *features)
 {
-    struct vhost_vdpa *v = dev->opaque;
     int ret = vhost_vdpa_get_dev_features(dev, features);
 
-    if (ret == 0 && v->shadow_vqs_enabled) {
+    if (ret == 0) {
         /* Add SVQ logging capabilities */
         *features |= BIT_ULL(VHOST_F_LOG_ALL);
     }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration
  2023-02-08  9:42 [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
@ 2023-02-08 10:29   ` Alvaro Karsz
  2023-02-08  9:42 ` [PATCH v2 02/13] vdpa: Negotiate _F_SUSPEND feature Eugenio Pérez
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 68+ messages in thread
From: Alvaro Karsz @ 2023-02-08 10:29 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, virtualization, Harpreet Singh Anand,
	Lei Yang, Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong

HI Eugenio, thanks for the series!

I tested the series with our DPU, SolidNET.

The test went as follow:

- Create 2 virtio net vdpa devices, every device in a separated VF.
- Start 2 VMs with the vdpa device as a single network device, without
shadow vq.
   The source VM with "-netdev
vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=hostnet0"
   The destination VM with "-netdev
vhost-vdpa,vhostdev=/dev/vhost-vdpa-1,id=hostnet0"
- Boot the source VM, test the network by pinging.
- Migrate
- Test the destination VM network.

Everything worked fine.

Tested-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration
@ 2023-02-08 10:29   ` Alvaro Karsz
  0 siblings, 0 replies; 68+ messages in thread
From: Alvaro Karsz @ 2023-02-08 10:29 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, Zhu Lingshan, Lei Yang,
	Liuxiangdong, Shannon Nelson, Parav Pandit, Gautam Dawar,
	Eli Cohen, Stefan Hajnoczi, Laurent Vivier, longpeng2,
	virtualization, Stefano Garzarella, si-wei.liu

HI Eugenio, thanks for the series!

I tested the series with our DPU, SolidNET.

The test went as follow:

- Create 2 virtio net vdpa devices, every device in a separated VF.
- Start 2 VMs with the vdpa device as a single network device, without
shadow vq.
   The source VM with "-netdev
vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=hostnet0"
   The destination VM with "-netdev
vhost-vdpa,vhostdev=/dev/vhost-vdpa-1,id=hostnet0"
- Boot the source VM, test the network by pinging.
- Migrate
- Test the destination VM network.

Everything worked fine.

Tested-by: Alvaro Karsz <alvaro.karsz@solid-run.com>


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration
  2023-02-08 10:29   ` Alvaro Karsz
  (?)
@ 2023-02-09 14:38   ` Lei Yang
  -1 siblings, 0 replies; 68+ messages in thread
From: Lei Yang @ 2023-02-09 14:38 UTC (permalink / raw)
  To: Alvaro Karsz
  Cc: Eugenio Pérez, qemu-devel, Harpreet Singh Anand,
	Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, Zhu Lingshan,
	Liuxiangdong, Shannon Nelson, Parav Pandit, Gautam Dawar,
	Eli Cohen, Stefan Hajnoczi, Laurent Vivier, longpeng2,
	virtualization, Stefano Garzarella, si-wei.liu

QE tested this series on the rhel. Creating two vdpa_sim devices, and
boot two VMs without shadow vq. The migration was successful and
everything worked fine

Tested-by: Lei Yang <leiyang@redhat.com>

Alvaro Karsz <alvaro.karsz@solid-run.com> 于2023年2月8日周三 18:29写道:
>
> HI Eugenio, thanks for the series!
>
> I tested the series with our DPU, SolidNET.
>
> The test went as follow:
>
> - Create 2 virtio net vdpa devices, every device in a separated VF.
> - Start 2 VMs with the vdpa device as a single network device, without
> shadow vq.
>    The source VM with "-netdev
> vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=hostnet0"
>    The destination VM with "-netdev
> vhost-vdpa,vhostdev=/dev/vhost-vdpa-1,id=hostnet0"
> - Boot the source VM, test the network by pinging.
> - Migrate
> - Test the destination VM network.
>
> Everything worked fine.
>
> Tested-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration
  2023-02-08  9:42 [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
                   ` (13 preceding siblings ...)
  2023-02-08 10:29   ` Alvaro Karsz
@ 2023-02-10 12:57 ` Gautam Dawar
  2023-02-15 18:40   ` Eugenio Perez Martin
  14 siblings, 1 reply; 68+ messages in thread
From: Gautam Dawar @ 2023-02-10 12:57 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella,
	si-wei.liu

Hi Eugenio,

I've tested this patch series on Xilinx/AMD SN1022 device without 
control vq and VM Live Migration between two hosts worked fine.

Tested-by: Gautam Dawar <gautam.dawar@amd.com>


Here is some minor feedback:

Pls fix the typo (Dynamycally -> Dynamically) in the Subject.

On 2/8/23 15:12, Eugenio Pérez wrote:
> CAUTION: This message has originated from an External Source. Please use proper judgment and caution when opening attachments, clicking links, or responding to this email.
>
>
> It's possible to migrate vdpa net devices if they are shadowed from the
>
> start.  But to always shadow the dataplane is to effectively break its host
>
> passthrough, so its not convenient in vDPA scenarios.
I believe you meant efficient instead of convenient.
>
>
>
> This series enables dynamically switching to shadow mode only at
>
> migration time.  This allows full data virtqueues passthrough all the
>
> time qemu is not migrating.
>
>
>
> In this series only net devices with no CVQ are migratable.  CVQ adds
>
> additional state that would make the series bigger and still had some
>
> controversy on previous RFC, so let's split it.
>
>
>
> The first patch delays the creation of the iova tree until it is really needed,
>
> and makes it easier to dynamically move from and to SVQ mode.
It would help adding some detail on the iova tree being referred to here.
>
>
>
> Next patches from 02 to 05 handle the suspending and getting of vq state (base)
>
> of the device at the switch to SVQ mode.  The new _F_SUSPEND feature is
>
> negotiated and stop device flow is changed so the state can be fetched trusting
>
> the device will not modify it.
>
>
>
> Since vhost backend must offer VHOST_F_LOG_ALL to be migratable, last patches
>
> but the last one add the needed migration blockers so vhost-vdpa can offer it

"last patches but the last one"?

Thanks.

>
> safely.  They also add the handling of this feature.
>
>
>
> Finally, the last patch makes virtio vhost-vdpa backend to offer
>
> VHOST_F_LOG_ALL so qemu migrate the device as long as no other blocker has been
>
> added.
>
>
>
> Successfully tested with vdpa_sim_net with patch [1] applied and with the qemu
>
> emulated device with vp_vdpa with some restrictions:
>
> * No CVQ. No feature that didn't work with SVQ previously (packed, ...)
>
> * VIRTIO_RING_F_STATE patches implementing [2].
>
> * Expose _F_SUSPEND, but ignore it and suspend on ring state fetch like
>
>    DPDK.
>
>
>
> Comments are welcome.
>
>
>
> v2:
>
> - Check for SUSPEND in vhost_dev.backend_cap, as .backend_features is empty at
>
>    the check moment.
>
>
>
> v1:
>
> - Omit all code working with CVQ and block migration if the device supports
>
>    CVQ.
>
> - Remove spurious kick.
Even with the spurious kick, datapath didn't resume at destination VM 
after LM as kick happened before DRIVER_OK. So IMO, it will be required 
that the vdpa parent driver simulates a kick after creating/starting HW 
rings.
>
> - Move all possible checks for migration to vhost-vdpa instead of the net
>
>    backend. Move them to init code from start code.
>
> - Suspend on vhost_vdpa_dev_start(false) instead of in vhost-vdpa net backend.
>
> - Properly split suspend after geting base and adding of status_reset patches.
>
> - Add possible TODOs to points where this series can improve in the future.
>
> - Check the state of migration using migration_in_setup and
>
>    migration_has_failed instead of checking all the possible migration status in
>
>    a switch.
>
> - Add TODO with possible low hand fruit using RESUME ops.
>
> - Always offer _F_LOG from virtio/vhost-vdpa and let migration blockers do
>
>    their thing instead of adding a variable.
>
> - RFC v2 at https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg02574.html
>
>
>
> RFC v2:
>
> - Use a migration listener instead of a memory listener to know when
>
>    the migration starts.
>
> - Add stuff not picked with ASID patches, like enable rings after
>
>    driver_ok
>
> - Add rewinding on the migration src, not in dst
>
> - RFC v1 at https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg01664.html
>
>
>
> [1] https://lore.kernel.org/lkml/20230203142501.300125-1-eperezma@redhat.com/T/
>
> [2] https://lists.oasis-open.org/archives/virtio-comment/202103/msg00036.html
>
>
>
> Eugenio Pérez (13):
>
>    vdpa net: move iova tree creation from init to start
>
>    vdpa: Negotiate _F_SUSPEND feature
>
>    vdpa: add vhost_vdpa_suspend
>
>    vdpa: move vhost reset after get vring base
>
>    vdpa: rewind at get_base, not set_base
>
>    vdpa net: allow VHOST_F_LOG_ALL
>
>    vdpa: add vdpa net migration state notifier
>
>    vdpa: disable RAM block discard only for the first device
>
>    vdpa net: block migration if the device has CVQ
>
>    vdpa: block migration if device has unsupported features
>
>    vdpa: block migration if dev does not have _F_SUSPEND
>
>    vdpa: block migration if SVQ does not admit a feature
>
>    vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices
>
>
>
>   include/hw/virtio/vhost-backend.h |   4 +
>
>   hw/virtio/vhost-vdpa.c            | 126 +++++++++++++++-----
>
>   hw/virtio/vhost.c                 |   3 +
>
>   net/vhost-vdpa.c                  | 192 +++++++++++++++++++++++++-----
>
>   hw/virtio/trace-events            |   1 +
>
>   5 files changed, 267 insertions(+), 59 deletions(-)
>
>
>
> --
>
> 2.31.1
>
>
>
>


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 01/13] vdpa net: move iova tree creation from init to start
  2023-02-08  9:42 ` [PATCH v2 01/13] vdpa net: move iova tree creation from init to start Eugenio Pérez
@ 2023-02-13  6:50     ` Si-Wei Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Si-Wei Liu @ 2023-02-13  6:50 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, virtualization, Harpreet Singh Anand, Lei Yang,
	Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong



On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
> Only create iova_tree if and when it is needed.
>
> The cleanup keeps being responsible of last VQ but this change allows it
> to merge both cleanup functions.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> Acked-by: Jason Wang <jasowang@redhat.com>
> ---
>   net/vhost-vdpa.c | 99 ++++++++++++++++++++++++++++++++++--------------
>   1 file changed, 71 insertions(+), 28 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index de5ed8ff22..a9e6c8f28e 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -178,13 +178,9 @@ err_init:
>   static void vhost_vdpa_cleanup(NetClientState *nc)
>   {
>       VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> -    struct vhost_dev *dev = &s->vhost_net->dev;
>   
>       qemu_vfree(s->cvq_cmd_out_buffer);
>       qemu_vfree(s->status);
> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> -    }
>       if (s->vhost_net) {
>           vhost_net_cleanup(s->vhost_net);
>           g_free(s->vhost_net);
> @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
>       return size;
>   }
>   
> +/** From any vdpa net client, get the netclient of first queue pair */
> +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> +{
> +    NICState *nic = qemu_get_nic(s->nc.peer);
> +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
> +
> +    return DO_UPCAST(VhostVDPAState, nc, nc0);
> +}
> +
> +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> +{
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +
> +    if (v->shadow_vqs_enabled) {
> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> +                                           v->iova_range.last);
> +    }
> +}
> +
> +static int vhost_vdpa_net_data_start(NetClientState *nc)
> +{
> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +
> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> +
> +    if (v->index == 0) {
> +        vhost_vdpa_net_data_start_first(s);
> +        return 0;
> +    }
> +
> +    if (v->shadow_vqs_enabled) {
> +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> +    }
> +
> +    return 0;
> +}
> +
> +static void vhost_vdpa_net_client_stop(NetClientState *nc)
> +{
> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    struct vhost_dev *dev;
> +
> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> +
> +    dev = s->vhost_vdpa.dev;
> +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> +    }
> +}
> +
>   static NetClientInfo net_vhost_vdpa_info = {
>           .type = NET_CLIENT_DRIVER_VHOST_VDPA,
>           .size = sizeof(VhostVDPAState),
>           .receive = vhost_vdpa_receive,
> +        .start = vhost_vdpa_net_data_start,
> +        .stop = vhost_vdpa_net_client_stop,
>           .cleanup = vhost_vdpa_cleanup,
>           .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
>           .has_ufo = vhost_vdpa_has_ufo,
> @@ -351,7 +401,7 @@ dma_map_err:
>   
>   static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>   {
> -    VhostVDPAState *s;
> +    VhostVDPAState *s, *s0;
>       struct vhost_vdpa *v;
>       uint64_t backend_features;
>       int64_t cvq_group;
> @@ -425,6 +475,15 @@ out:
>           return 0;
>       }
>   
> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
> +    if (s0->vhost_vdpa.iova_tree) {
> +        /* SVQ is already configured for all virtqueues */
> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> +    } else {
> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> +                                           v->iova_range.last);
I wonder how this case could happen, vhost_vdpa_net_data_start_first() 
should've allocated an iova tree on the first data vq. Is zero data vq 
ever possible on net vhost-vdpa?

Thanks,
-Siwei
> +    }
> +
>       r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
>                                  vhost_vdpa_net_cvq_cmd_page_len(), false);
>       if (unlikely(r < 0)) {
> @@ -449,15 +508,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
>       if (s->vhost_vdpa.shadow_vqs_enabled) {
>           vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
>           vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
> -        if (!s->always_svq) {
> -            /*
> -             * If only the CVQ is shadowed we can delete this safely.
> -             * If all the VQs are shadows this will be needed by the time the
> -             * device is started again to register SVQ vrings and similar.
> -             */
> -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> -        }
>       }
> +
> +    vhost_vdpa_net_client_stop(nc);
>   }
>   
>   static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
> @@ -667,8 +720,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>                                          int nvqs,
>                                          bool is_datapath,
>                                          bool svq,
> -                                       struct vhost_vdpa_iova_range iova_range,
> -                                       VhostIOVATree *iova_tree)
> +                                       struct vhost_vdpa_iova_range iova_range)
>   {
>       NetClientState *nc = NULL;
>       VhostVDPAState *s;
> @@ -690,7 +742,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>       s->vhost_vdpa.shadow_vqs_enabled = svq;
>       s->vhost_vdpa.iova_range = iova_range;
>       s->vhost_vdpa.shadow_data = svq;
> -    s->vhost_vdpa.iova_tree = iova_tree;
>       if (!is_datapath) {
>           s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
>                                               vhost_vdpa_net_cvq_cmd_page_len());
> @@ -760,7 +811,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       uint64_t features;
>       int vdpa_device_fd;
>       g_autofree NetClientState **ncs = NULL;
> -    g_autoptr(VhostIOVATree) iova_tree = NULL;
>       struct vhost_vdpa_iova_range iova_range;
>       NetClientState *nc;
>       int queue_pairs, r, i = 0, has_cvq = 0;
> @@ -812,12 +862,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>           goto err;
>       }
>   
> -    if (opts->x_svq) {
> -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
> -            goto err_svq;
> -        }
> -
> -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
> +        goto err;
>       }
>   
>       ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> @@ -825,7 +871,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       for (i = 0; i < queue_pairs; i++) {
>           ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>                                        vdpa_device_fd, i, 2, true, opts->x_svq,
> -                                     iova_range, iova_tree);
> +                                     iova_range);
>           if (!ncs[i])
>               goto err;
>       }
> @@ -833,13 +879,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       if (has_cvq) {
>           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>                                    vdpa_device_fd, i, 1, false,
> -                                 opts->x_svq, iova_range, iova_tree);
> +                                 opts->x_svq, iova_range);
>           if (!nc)
>               goto err;
>       }
>   
> -    /* iova_tree ownership belongs to last NetClientState */
> -    g_steal_pointer(&iova_tree);
>       return 0;
>   
>   err:
> @@ -849,7 +893,6 @@ err:
>           }
>       }
>   
> -err_svq:
>       qemu_close(vdpa_device_fd);
>   
>       return -1;

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 01/13] vdpa net: move iova tree creation from init to start
@ 2023-02-13  6:50     ` Si-Wei Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Si-Wei Liu @ 2023-02-13  6:50 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella



On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
> Only create iova_tree if and when it is needed.
>
> The cleanup keeps being responsible of last VQ but this change allows it
> to merge both cleanup functions.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> Acked-by: Jason Wang <jasowang@redhat.com>
> ---
>   net/vhost-vdpa.c | 99 ++++++++++++++++++++++++++++++++++--------------
>   1 file changed, 71 insertions(+), 28 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index de5ed8ff22..a9e6c8f28e 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -178,13 +178,9 @@ err_init:
>   static void vhost_vdpa_cleanup(NetClientState *nc)
>   {
>       VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> -    struct vhost_dev *dev = &s->vhost_net->dev;
>   
>       qemu_vfree(s->cvq_cmd_out_buffer);
>       qemu_vfree(s->status);
> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> -    }
>       if (s->vhost_net) {
>           vhost_net_cleanup(s->vhost_net);
>           g_free(s->vhost_net);
> @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
>       return size;
>   }
>   
> +/** From any vdpa net client, get the netclient of first queue pair */
> +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> +{
> +    NICState *nic = qemu_get_nic(s->nc.peer);
> +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
> +
> +    return DO_UPCAST(VhostVDPAState, nc, nc0);
> +}
> +
> +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> +{
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +
> +    if (v->shadow_vqs_enabled) {
> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> +                                           v->iova_range.last);
> +    }
> +}
> +
> +static int vhost_vdpa_net_data_start(NetClientState *nc)
> +{
> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +
> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> +
> +    if (v->index == 0) {
> +        vhost_vdpa_net_data_start_first(s);
> +        return 0;
> +    }
> +
> +    if (v->shadow_vqs_enabled) {
> +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> +    }
> +
> +    return 0;
> +}
> +
> +static void vhost_vdpa_net_client_stop(NetClientState *nc)
> +{
> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    struct vhost_dev *dev;
> +
> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> +
> +    dev = s->vhost_vdpa.dev;
> +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> +    }
> +}
> +
>   static NetClientInfo net_vhost_vdpa_info = {
>           .type = NET_CLIENT_DRIVER_VHOST_VDPA,
>           .size = sizeof(VhostVDPAState),
>           .receive = vhost_vdpa_receive,
> +        .start = vhost_vdpa_net_data_start,
> +        .stop = vhost_vdpa_net_client_stop,
>           .cleanup = vhost_vdpa_cleanup,
>           .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
>           .has_ufo = vhost_vdpa_has_ufo,
> @@ -351,7 +401,7 @@ dma_map_err:
>   
>   static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>   {
> -    VhostVDPAState *s;
> +    VhostVDPAState *s, *s0;
>       struct vhost_vdpa *v;
>       uint64_t backend_features;
>       int64_t cvq_group;
> @@ -425,6 +475,15 @@ out:
>           return 0;
>       }
>   
> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
> +    if (s0->vhost_vdpa.iova_tree) {
> +        /* SVQ is already configured for all virtqueues */
> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> +    } else {
> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> +                                           v->iova_range.last);
I wonder how this case could happen, vhost_vdpa_net_data_start_first() 
should've allocated an iova tree on the first data vq. Is zero data vq 
ever possible on net vhost-vdpa?

Thanks,
-Siwei
> +    }
> +
>       r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
>                                  vhost_vdpa_net_cvq_cmd_page_len(), false);
>       if (unlikely(r < 0)) {
> @@ -449,15 +508,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
>       if (s->vhost_vdpa.shadow_vqs_enabled) {
>           vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
>           vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
> -        if (!s->always_svq) {
> -            /*
> -             * If only the CVQ is shadowed we can delete this safely.
> -             * If all the VQs are shadows this will be needed by the time the
> -             * device is started again to register SVQ vrings and similar.
> -             */
> -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> -        }
>       }
> +
> +    vhost_vdpa_net_client_stop(nc);
>   }
>   
>   static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
> @@ -667,8 +720,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>                                          int nvqs,
>                                          bool is_datapath,
>                                          bool svq,
> -                                       struct vhost_vdpa_iova_range iova_range,
> -                                       VhostIOVATree *iova_tree)
> +                                       struct vhost_vdpa_iova_range iova_range)
>   {
>       NetClientState *nc = NULL;
>       VhostVDPAState *s;
> @@ -690,7 +742,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>       s->vhost_vdpa.shadow_vqs_enabled = svq;
>       s->vhost_vdpa.iova_range = iova_range;
>       s->vhost_vdpa.shadow_data = svq;
> -    s->vhost_vdpa.iova_tree = iova_tree;
>       if (!is_datapath) {
>           s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
>                                               vhost_vdpa_net_cvq_cmd_page_len());
> @@ -760,7 +811,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       uint64_t features;
>       int vdpa_device_fd;
>       g_autofree NetClientState **ncs = NULL;
> -    g_autoptr(VhostIOVATree) iova_tree = NULL;
>       struct vhost_vdpa_iova_range iova_range;
>       NetClientState *nc;
>       int queue_pairs, r, i = 0, has_cvq = 0;
> @@ -812,12 +862,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>           goto err;
>       }
>   
> -    if (opts->x_svq) {
> -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
> -            goto err_svq;
> -        }
> -
> -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
> +        goto err;
>       }
>   
>       ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> @@ -825,7 +871,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       for (i = 0; i < queue_pairs; i++) {
>           ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>                                        vdpa_device_fd, i, 2, true, opts->x_svq,
> -                                     iova_range, iova_tree);
> +                                     iova_range);
>           if (!ncs[i])
>               goto err;
>       }
> @@ -833,13 +879,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       if (has_cvq) {
>           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>                                    vdpa_device_fd, i, 1, false,
> -                                 opts->x_svq, iova_range, iova_tree);
> +                                 opts->x_svq, iova_range);
>           if (!nc)
>               goto err;
>       }
>   
> -    /* iova_tree ownership belongs to last NetClientState */
> -    g_steal_pointer(&iova_tree);
>       return 0;
>   
>   err:
> @@ -849,7 +893,6 @@ err:
>           }
>       }
>   
> -err_svq:
>       qemu_close(vdpa_device_fd);
>   
>       return -1;



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 07/13] vdpa: add vdpa net migration state notifier
  2023-02-08  9:42 ` [PATCH v2 07/13] vdpa: add vdpa net migration state notifier Eugenio Pérez
@ 2023-02-13  6:50     ` Si-Wei Liu
  2023-02-22  3:55     ` Jason Wang
  1 sibling, 0 replies; 68+ messages in thread
From: Si-Wei Liu @ 2023-02-13  6:50 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, virtualization, Harpreet Singh Anand, Lei Yang,
	Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong


On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
> This allows net to restart the device backend to configure SVQ on it.
>
> Ideally, these changes should not be net specific. However, the vdpa net
> backend is the one with enough knowledge to configure everything because
> of some reasons:
> * Queues might need to be shadowed or not depending on its kind (control
>    vs data).
> * Queues need to share the same map translations (iova tree).
>
> Because of that it is cleaner to restart the whole net backend and
> configure again as expected, similar to how vhost-kernel moves between
> userspace and passthrough.
>
> If more kinds of devices need dynamic switching to SVQ we can create a
> callback struct like VhostOps and move most of the code there.
> VhostOps cannot be reused since all vdpa backend share them, and to
> personalize just for networking would be too heavy.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v3:
> * Add TODO to use the resume operation in the future.
> * Use migration_in_setup and migration_has_failed instead of a
>    complicated switch case.
> ---
>   net/vhost-vdpa.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 76 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index dd686b4514..bca13f97fd 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -26,12 +26,14 @@
>   #include <err.h>
>   #include "standard-headers/linux/virtio_net.h"
>   #include "monitor/monitor.h"
> +#include "migration/misc.h"
>   #include "hw/virtio/vhost.h"
>   
>   /* Todo:need to add the multiqueue support here */
>   typedef struct VhostVDPAState {
>       NetClientState nc;
>       struct vhost_vdpa vhost_vdpa;
> +    Notifier migration_state;
>       VHostNetState *vhost_net;
>   
>       /* Control commands shadow buffers */
> @@ -241,10 +243,79 @@ static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
>       return DO_UPCAST(VhostVDPAState, nc, nc0);
>   }
>   
> +static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
> +{
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +    VirtIONet *n;
> +    VirtIODevice *vdev;
> +    int data_queue_pairs, cvq, r;
> +    NetClientState *peer;
> +
> +    /* We are only called on the first data vqs and only if x-svq is not set */
> +    if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
> +        return;
> +    }
> +
> +    vdev = v->dev->vdev;
> +    n = VIRTIO_NET(vdev);
> +    if (!n->vhost_started) {
> +        return;
What if vhost gets started after migration is started, will svq still be 
(dynamically) enabled during vhost_dev_start()? I don't see relevant 
code to deal with it?

> +    }
> +
> +    data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> +    cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
> +                                  n->max_ncs - n->max_queue_pairs : 0;
> +    /*
> +     * TODO: vhost_net_stop does suspend, get_base and reset. We can be smarter
> +     * in the future and resume the device if read-only operations between
> +     * suspend and reset goes wrong.
> +     */
> +    vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
> +
> +    peer = s->nc.peer;
> +    for (int i = 0; i < data_queue_pairs + cvq; i++) {
> +        VhostVDPAState *vdpa_state;
> +        NetClientState *nc;
> +
> +        if (i < data_queue_pairs) {
> +            nc = qemu_get_peer(peer, i);
> +        } else {
> +            nc = qemu_get_peer(peer, n->max_queue_pairs);
> +        }
> +
> +        vdpa_state = DO_UPCAST(VhostVDPAState, nc, nc);
> +        vdpa_state->vhost_vdpa.shadow_data = enable;
Don't get why shadow_data is set on cvq's vhost_vdpa? This may result in 
address space collision: data vq's iova getting improperly allocated on 
cvq's address space in vhost_vdpa_listener_region_{add,del}(). Noted 
currently there's an issue where guest VM's memory listener registration 
is always hooked to the last vq, which could be on the cvq in a 
different iova address space VHOST_VDPA_NET_CVQ_ASID.

Thanks,
-Siwei

> +
> +        if (i < data_queue_pairs) {
> +            /* Do not override CVQ shadow_vqs_enabled */
> +            vdpa_state->vhost_vdpa.shadow_vqs_enabled = enable;
> +        }
> +    }
> +
> +    r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);
> +    if (unlikely(r < 0)) {
> +        error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
> +    }
> +}
> +
> +static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
> +{
> +    MigrationState *migration = data;
> +    VhostVDPAState *s = container_of(notifier, VhostVDPAState,
> +                                     migration_state);
> +
> +    if (migration_in_setup(migration)) {
> +        vhost_vdpa_net_log_global_enable(s, true);
> +    } else if (migration_has_failed(migration)) {
> +        vhost_vdpa_net_log_global_enable(s, false);
> +    }
> +}
> +
>   static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>   {
>       struct vhost_vdpa *v = &s->vhost_vdpa;
>   
> +    add_migration_state_change_notifier(&s->migration_state);
>       if (v->shadow_vqs_enabled) {
>           v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>                                              v->iova_range.last);
> @@ -278,6 +349,10 @@ static void vhost_vdpa_net_client_stop(NetClientState *nc)
>   
>       assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>   
> +    if (s->vhost_vdpa.index == 0) {
> +        remove_migration_state_change_notifier(&s->migration_state);
> +    }
> +
>       dev = s->vhost_vdpa.dev;
>       if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>           g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> @@ -741,6 +816,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>       s->vhost_vdpa.device_fd = vdpa_device_fd;
>       s->vhost_vdpa.index = queue_pair_index;
>       s->always_svq = svq;
> +    s->migration_state.notify = vdpa_net_migration_state_notifier;
>       s->vhost_vdpa.shadow_vqs_enabled = svq;
>       s->vhost_vdpa.iova_range = iova_range;
>       s->vhost_vdpa.shadow_data = svq;

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 07/13] vdpa: add vdpa net migration state notifier
@ 2023-02-13  6:50     ` Si-Wei Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Si-Wei Liu @ 2023-02-13  6:50 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella


On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
> This allows net to restart the device backend to configure SVQ on it.
>
> Ideally, these changes should not be net specific. However, the vdpa net
> backend is the one with enough knowledge to configure everything because
> of some reasons:
> * Queues might need to be shadowed or not depending on its kind (control
>    vs data).
> * Queues need to share the same map translations (iova tree).
>
> Because of that it is cleaner to restart the whole net backend and
> configure again as expected, similar to how vhost-kernel moves between
> userspace and passthrough.
>
> If more kinds of devices need dynamic switching to SVQ we can create a
> callback struct like VhostOps and move most of the code there.
> VhostOps cannot be reused since all vdpa backend share them, and to
> personalize just for networking would be too heavy.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v3:
> * Add TODO to use the resume operation in the future.
> * Use migration_in_setup and migration_has_failed instead of a
>    complicated switch case.
> ---
>   net/vhost-vdpa.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 76 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index dd686b4514..bca13f97fd 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -26,12 +26,14 @@
>   #include <err.h>
>   #include "standard-headers/linux/virtio_net.h"
>   #include "monitor/monitor.h"
> +#include "migration/misc.h"
>   #include "hw/virtio/vhost.h"
>   
>   /* Todo:need to add the multiqueue support here */
>   typedef struct VhostVDPAState {
>       NetClientState nc;
>       struct vhost_vdpa vhost_vdpa;
> +    Notifier migration_state;
>       VHostNetState *vhost_net;
>   
>       /* Control commands shadow buffers */
> @@ -241,10 +243,79 @@ static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
>       return DO_UPCAST(VhostVDPAState, nc, nc0);
>   }
>   
> +static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
> +{
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +    VirtIONet *n;
> +    VirtIODevice *vdev;
> +    int data_queue_pairs, cvq, r;
> +    NetClientState *peer;
> +
> +    /* We are only called on the first data vqs and only if x-svq is not set */
> +    if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
> +        return;
> +    }
> +
> +    vdev = v->dev->vdev;
> +    n = VIRTIO_NET(vdev);
> +    if (!n->vhost_started) {
> +        return;
What if vhost gets started after migration is started, will svq still be 
(dynamically) enabled during vhost_dev_start()? I don't see relevant 
code to deal with it?

> +    }
> +
> +    data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> +    cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
> +                                  n->max_ncs - n->max_queue_pairs : 0;
> +    /*
> +     * TODO: vhost_net_stop does suspend, get_base and reset. We can be smarter
> +     * in the future and resume the device if read-only operations between
> +     * suspend and reset goes wrong.
> +     */
> +    vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
> +
> +    peer = s->nc.peer;
> +    for (int i = 0; i < data_queue_pairs + cvq; i++) {
> +        VhostVDPAState *vdpa_state;
> +        NetClientState *nc;
> +
> +        if (i < data_queue_pairs) {
> +            nc = qemu_get_peer(peer, i);
> +        } else {
> +            nc = qemu_get_peer(peer, n->max_queue_pairs);
> +        }
> +
> +        vdpa_state = DO_UPCAST(VhostVDPAState, nc, nc);
> +        vdpa_state->vhost_vdpa.shadow_data = enable;
Don't get why shadow_data is set on cvq's vhost_vdpa? This may result in 
address space collision: data vq's iova getting improperly allocated on 
cvq's address space in vhost_vdpa_listener_region_{add,del}(). Noted 
currently there's an issue where guest VM's memory listener registration 
is always hooked to the last vq, which could be on the cvq in a 
different iova address space VHOST_VDPA_NET_CVQ_ASID.

Thanks,
-Siwei

> +
> +        if (i < data_queue_pairs) {
> +            /* Do not override CVQ shadow_vqs_enabled */
> +            vdpa_state->vhost_vdpa.shadow_vqs_enabled = enable;
> +        }
> +    }
> +
> +    r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);
> +    if (unlikely(r < 0)) {
> +        error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
> +    }
> +}
> +
> +static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
> +{
> +    MigrationState *migration = data;
> +    VhostVDPAState *s = container_of(notifier, VhostVDPAState,
> +                                     migration_state);
> +
> +    if (migration_in_setup(migration)) {
> +        vhost_vdpa_net_log_global_enable(s, true);
> +    } else if (migration_has_failed(migration)) {
> +        vhost_vdpa_net_log_global_enable(s, false);
> +    }
> +}
> +
>   static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>   {
>       struct vhost_vdpa *v = &s->vhost_vdpa;
>   
> +    add_migration_state_change_notifier(&s->migration_state);
>       if (v->shadow_vqs_enabled) {
>           v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>                                              v->iova_range.last);
> @@ -278,6 +349,10 @@ static void vhost_vdpa_net_client_stop(NetClientState *nc)
>   
>       assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>   
> +    if (s->vhost_vdpa.index == 0) {
> +        remove_migration_state_change_notifier(&s->migration_state);
> +    }
> +
>       dev = s->vhost_vdpa.dev;
>       if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>           g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> @@ -741,6 +816,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>       s->vhost_vdpa.device_fd = vdpa_device_fd;
>       s->vhost_vdpa.index = queue_pair_index;
>       s->always_svq = svq;
> +    s->migration_state.notify = vdpa_net_migration_state_notifier;
>       s->vhost_vdpa.shadow_vqs_enabled = svq;
>       s->vhost_vdpa.iova_range = iova_range;
>       s->vhost_vdpa.shadow_data = svq;



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 09/13] vdpa net: block migration if the device has CVQ
  2023-02-08  9:42 ` [PATCH v2 09/13] vdpa net: block migration if the device has CVQ Eugenio Pérez
@ 2023-02-13  6:50     ` Si-Wei Liu
  2023-02-22  4:00     ` Jason Wang
  1 sibling, 0 replies; 68+ messages in thread
From: Si-Wei Liu @ 2023-02-13  6:50 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, virtualization, Harpreet Singh Anand, Lei Yang,
	Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong



On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
> Devices with CVQ needs to migrate state beyond vq state.  Leaving this
> to future series.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   net/vhost-vdpa.c | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index bca13f97fd..309861e56c 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -955,11 +955,17 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       }
>   
>       if (has_cvq) {
> +        VhostVDPAState *s;
> +
>           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>                                    vdpa_device_fd, i, 1, false,
>                                    opts->x_svq, iova_range);
>           if (!nc)
>               goto err;
> +
> +        s = DO_UPCAST(VhostVDPAState, nc, nc);
> +        error_setg(&s->vhost_vdpa.dev->migration_blocker,
> +                   "net vdpa cannot migrate with MQ feature");
Not sure how this can work: migration_blocker is only checked and gets 
added from vhost_dev_init(), which is already done through 
net_vhost_vdpa_init() above. Same question applies to the next patch of 
this series.

Thanks,
-Siwei

>       }
>   
>       return 0;

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 09/13] vdpa net: block migration if the device has CVQ
@ 2023-02-13  6:50     ` Si-Wei Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Si-Wei Liu @ 2023-02-13  6:50 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella



On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
> Devices with CVQ needs to migrate state beyond vq state.  Leaving this
> to future series.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   net/vhost-vdpa.c | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index bca13f97fd..309861e56c 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -955,11 +955,17 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       }
>   
>       if (has_cvq) {
> +        VhostVDPAState *s;
> +
>           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>                                    vdpa_device_fd, i, 1, false,
>                                    opts->x_svq, iova_range);
>           if (!nc)
>               goto err;
> +
> +        s = DO_UPCAST(VhostVDPAState, nc, nc);
> +        error_setg(&s->vhost_vdpa.dev->migration_blocker,
> +                   "net vdpa cannot migrate with MQ feature");
Not sure how this can work: migration_blocker is only checked and gets 
added from vhost_dev_init(), which is already done through 
net_vhost_vdpa_init() above. Same question applies to the next patch of 
this series.

Thanks,
-Siwei

>       }
>   
>       return 0;



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 01/13] vdpa net: move iova tree creation from init to start
  2023-02-13  6:50     ` Si-Wei Liu
  (?)
@ 2023-02-13 11:14     ` Eugenio Perez Martin
  2023-02-14  1:45         ` Si-Wei Liu
  -1 siblings, 1 reply; 68+ messages in thread
From: Eugenio Perez Martin @ 2023-02-13 11:14 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella

On Mon, Feb 13, 2023 at 7:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
> > Only create iova_tree if and when it is needed.
> >
> > The cleanup keeps being responsible of last VQ but this change allows it
> > to merge both cleanup functions.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > Acked-by: Jason Wang <jasowang@redhat.com>
> > ---
> >   net/vhost-vdpa.c | 99 ++++++++++++++++++++++++++++++++++--------------
> >   1 file changed, 71 insertions(+), 28 deletions(-)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index de5ed8ff22..a9e6c8f28e 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -178,13 +178,9 @@ err_init:
> >   static void vhost_vdpa_cleanup(NetClientState *nc)
> >   {
> >       VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > -    struct vhost_dev *dev = &s->vhost_net->dev;
> >
> >       qemu_vfree(s->cvq_cmd_out_buffer);
> >       qemu_vfree(s->status);
> > -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > -    }
> >       if (s->vhost_net) {
> >           vhost_net_cleanup(s->vhost_net);
> >           g_free(s->vhost_net);
> > @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
> >       return size;
> >   }
> >
> > +/** From any vdpa net client, get the netclient of first queue pair */
> > +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> > +{
> > +    NICState *nic = qemu_get_nic(s->nc.peer);
> > +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
> > +
> > +    return DO_UPCAST(VhostVDPAState, nc, nc0);
> > +}
> > +
> > +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> > +{
> > +    struct vhost_vdpa *v = &s->vhost_vdpa;
> > +
> > +    if (v->shadow_vqs_enabled) {
> > +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> > +                                           v->iova_range.last);
> > +    }
> > +}
> > +
> > +static int vhost_vdpa_net_data_start(NetClientState *nc)
> > +{
> > +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +    struct vhost_vdpa *v = &s->vhost_vdpa;
> > +
> > +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > +
> > +    if (v->index == 0) {
> > +        vhost_vdpa_net_data_start_first(s);
> > +        return 0;
> > +    }
> > +
> > +    if (v->shadow_vqs_enabled) {
> > +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
> > +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static void vhost_vdpa_net_client_stop(NetClientState *nc)
> > +{
> > +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +    struct vhost_dev *dev;
> > +
> > +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > +
> > +    dev = s->vhost_vdpa.dev;
> > +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > +    }
> > +}
> > +
> >   static NetClientInfo net_vhost_vdpa_info = {
> >           .type = NET_CLIENT_DRIVER_VHOST_VDPA,
> >           .size = sizeof(VhostVDPAState),
> >           .receive = vhost_vdpa_receive,
> > +        .start = vhost_vdpa_net_data_start,
> > +        .stop = vhost_vdpa_net_client_stop,
> >           .cleanup = vhost_vdpa_cleanup,
> >           .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
> >           .has_ufo = vhost_vdpa_has_ufo,
> > @@ -351,7 +401,7 @@ dma_map_err:
> >
> >   static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> >   {
> > -    VhostVDPAState *s;
> > +    VhostVDPAState *s, *s0;
> >       struct vhost_vdpa *v;
> >       uint64_t backend_features;
> >       int64_t cvq_group;
> > @@ -425,6 +475,15 @@ out:
> >           return 0;
> >       }
> >
> > +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
> > +    if (s0->vhost_vdpa.iova_tree) {
> > +        /* SVQ is already configured for all virtqueues */
> > +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> > +    } else {
> > +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> > +                                           v->iova_range.last);
> I wonder how this case could happen, vhost_vdpa_net_data_start_first()
> should've allocated an iova tree on the first data vq. Is zero data vq
> ever possible on net vhost-vdpa?
>

It's the case of the current qemu master when only CVQ is being
shadowed. It's not that "there are no data vq": If that case were
possible, CVQ vhost-vdpa state would be s0.

The case is that since only CVQ vhost-vdpa is the one being migrated,
only CVQ has an iova tree.

With this series applied and with no migration running, the case is
the same as before: only SVQ gets shadowed. When migration starts, all
vqs are migrated, and share iova tree.

Thanks!

> Thanks,
> -Siwei
> > +    }
> > +
> >       r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
> >                                  vhost_vdpa_net_cvq_cmd_page_len(), false);
> >       if (unlikely(r < 0)) {
> > @@ -449,15 +508,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
> >       if (s->vhost_vdpa.shadow_vqs_enabled) {
> >           vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
> >           vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
> > -        if (!s->always_svq) {
> > -            /*
> > -             * If only the CVQ is shadowed we can delete this safely.
> > -             * If all the VQs are shadows this will be needed by the time the
> > -             * device is started again to register SVQ vrings and similar.
> > -             */
> > -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > -        }
> >       }
> > +
> > +    vhost_vdpa_net_client_stop(nc);
> >   }
> >
> >   static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
> > @@ -667,8 +720,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >                                          int nvqs,
> >                                          bool is_datapath,
> >                                          bool svq,
> > -                                       struct vhost_vdpa_iova_range iova_range,
> > -                                       VhostIOVATree *iova_tree)
> > +                                       struct vhost_vdpa_iova_range iova_range)
> >   {
> >       NetClientState *nc = NULL;
> >       VhostVDPAState *s;
> > @@ -690,7 +742,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >       s->vhost_vdpa.shadow_vqs_enabled = svq;
> >       s->vhost_vdpa.iova_range = iova_range;
> >       s->vhost_vdpa.shadow_data = svq;
> > -    s->vhost_vdpa.iova_tree = iova_tree;
> >       if (!is_datapath) {
> >           s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
> >                                               vhost_vdpa_net_cvq_cmd_page_len());
> > @@ -760,7 +811,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >       uint64_t features;
> >       int vdpa_device_fd;
> >       g_autofree NetClientState **ncs = NULL;
> > -    g_autoptr(VhostIOVATree) iova_tree = NULL;
> >       struct vhost_vdpa_iova_range iova_range;
> >       NetClientState *nc;
> >       int queue_pairs, r, i = 0, has_cvq = 0;
> > @@ -812,12 +862,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >           goto err;
> >       }
> >
> > -    if (opts->x_svq) {
> > -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
> > -            goto err_svq;
> > -        }
> > -
> > -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> > +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
> > +        goto err;
> >       }
> >
> >       ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> > @@ -825,7 +871,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >       for (i = 0; i < queue_pairs; i++) {
> >           ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >                                        vdpa_device_fd, i, 2, true, opts->x_svq,
> > -                                     iova_range, iova_tree);
> > +                                     iova_range);
> >           if (!ncs[i])
> >               goto err;
> >       }
> > @@ -833,13 +879,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >       if (has_cvq) {
> >           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >                                    vdpa_device_fd, i, 1, false,
> > -                                 opts->x_svq, iova_range, iova_tree);
> > +                                 opts->x_svq, iova_range);
> >           if (!nc)
> >               goto err;
> >       }
> >
> > -    /* iova_tree ownership belongs to last NetClientState */
> > -    g_steal_pointer(&iova_tree);
> >       return 0;
> >
> >   err:
> > @@ -849,7 +893,6 @@ err:
> >           }
> >       }
> >
> > -err_svq:
> >       qemu_close(vdpa_device_fd);
> >
> >       return -1;
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 07/13] vdpa: add vdpa net migration state notifier
  2023-02-13  6:50     ` Si-Wei Liu
  (?)
@ 2023-02-13 15:51     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 68+ messages in thread
From: Eugenio Perez Martin @ 2023-02-13 15:51 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella

On Mon, Feb 13, 2023 at 7:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
> On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
> > This allows net to restart the device backend to configure SVQ on it.
> >
> > Ideally, these changes should not be net specific. However, the vdpa net
> > backend is the one with enough knowledge to configure everything because
> > of some reasons:
> > * Queues might need to be shadowed or not depending on its kind (control
> >    vs data).
> > * Queues need to share the same map translations (iova tree).
> >
> > Because of that it is cleaner to restart the whole net backend and
> > configure again as expected, similar to how vhost-kernel moves between
> > userspace and passthrough.
> >
> > If more kinds of devices need dynamic switching to SVQ we can create a
> > callback struct like VhostOps and move most of the code there.
> > VhostOps cannot be reused since all vdpa backend share them, and to
> > personalize just for networking would be too heavy.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v3:
> > * Add TODO to use the resume operation in the future.
> > * Use migration_in_setup and migration_has_failed instead of a
> >    complicated switch case.
> > ---
> >   net/vhost-vdpa.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 76 insertions(+)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index dd686b4514..bca13f97fd 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -26,12 +26,14 @@
> >   #include <err.h>
> >   #include "standard-headers/linux/virtio_net.h"
> >   #include "monitor/monitor.h"
> > +#include "migration/misc.h"
> >   #include "hw/virtio/vhost.h"
> >
> >   /* Todo:need to add the multiqueue support here */
> >   typedef struct VhostVDPAState {
> >       NetClientState nc;
> >       struct vhost_vdpa vhost_vdpa;
> > +    Notifier migration_state;
> >       VHostNetState *vhost_net;
> >
> >       /* Control commands shadow buffers */
> > @@ -241,10 +243,79 @@ static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> >       return DO_UPCAST(VhostVDPAState, nc, nc0);
> >   }
> >
> > +static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
> > +{
> > +    struct vhost_vdpa *v = &s->vhost_vdpa;
> > +    VirtIONet *n;
> > +    VirtIODevice *vdev;
> > +    int data_queue_pairs, cvq, r;
> > +    NetClientState *peer;
> > +
> > +    /* We are only called on the first data vqs and only if x-svq is not set */
> > +    if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
> > +        return;
> > +    }
> > +
> > +    vdev = v->dev->vdev;
> > +    n = VIRTIO_NET(vdev);
> > +    if (!n->vhost_started) {
> > +        return;
> What if vhost gets started after migration is started, will svq still be
> (dynamically) enabled during vhost_dev_start()? I don't see relevant
> code to deal with it?
>

Good catch. v->shadow_vqs_enabled must change even if
!n->vhost_started. That should be the only code needed.

Also, migration listener must be registered from qemu startup, not on
device start.

> > +    }
> > +
> > +    data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> > +    cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
> > +                                  n->max_ncs - n->max_queue_pairs : 0;
> > +    /*
> > +     * TODO: vhost_net_stop does suspend, get_base and reset. We can be smarter
> > +     * in the future and resume the device if read-only operations between
> > +     * suspend and reset goes wrong.
> > +     */
> > +    vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
> > +
> > +    peer = s->nc.peer;
> > +    for (int i = 0; i < data_queue_pairs + cvq; i++) {
> > +        VhostVDPAState *vdpa_state;
> > +        NetClientState *nc;
> > +
> > +        if (i < data_queue_pairs) {
> > +            nc = qemu_get_peer(peer, i);
> > +        } else {
> > +            nc = qemu_get_peer(peer, n->max_queue_pairs);
> > +        }
> > +
> > +        vdpa_state = DO_UPCAST(VhostVDPAState, nc, nc);
> > +        vdpa_state->vhost_vdpa.shadow_data = enable;
> Don't get why shadow_data is set on cvq's vhost_vdpa? This may result in
> address space collision: data vq's iova getting improperly allocated on
> cvq's address space in vhost_vdpa_listener_region_{add,del}(). Noted
> currently there's an issue where guest VM's memory listener registration
> is always hooked to the last vq, which could be on the cvq in a
> different iova address space VHOST_VDPA_NET_CVQ_ASID.
>

Let me answer in reverse. guest VM's memory listener registration is
effectively always hooked to the last vq, that's why shadow_data is
needed.

In the past it was enough with v->shadow_vqs_enabled. However, since
the introduction of ASID support & CVQ tracking through it, The
listener (hooked at CVQ) needs to know if it should use iova tree or
not. That's why a separated variable shadow_data is needed.

That way, it may happen that cvq vhost_vdpa->shadow_vqs_enabled = true
but cvq vhost_vdpa->shadow_vqs_enabledshadow_data = false.

Is that clearer?

Thanks!

> Thanks,
> -Siwei
>
> > +
> > +        if (i < data_queue_pairs) {
> > +            /* Do not override CVQ shadow_vqs_enabled */
> > +            vdpa_state->vhost_vdpa.shadow_vqs_enabled = enable;
> > +        }
> > +    }
> > +
> > +    r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);
> > +    if (unlikely(r < 0)) {
> > +        error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
> > +    }
> > +}
> > +
> > +static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
> > +{
> > +    MigrationState *migration = data;
> > +    VhostVDPAState *s = container_of(notifier, VhostVDPAState,
> > +                                     migration_state);
> > +
> > +    if (migration_in_setup(migration)) {
> > +        vhost_vdpa_net_log_global_enable(s, true);
> > +    } else if (migration_has_failed(migration)) {
> > +        vhost_vdpa_net_log_global_enable(s, false);
> > +    }
> > +}
> > +
> >   static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> >   {
> >       struct vhost_vdpa *v = &s->vhost_vdpa;
> >
> > +    add_migration_state_change_notifier(&s->migration_state);
> >       if (v->shadow_vqs_enabled) {
> >           v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> >                                              v->iova_range.last);
> > @@ -278,6 +349,10 @@ static void vhost_vdpa_net_client_stop(NetClientState *nc)
> >
> >       assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >
> > +    if (s->vhost_vdpa.index == 0) {
> > +        remove_migration_state_change_notifier(&s->migration_state);
> > +    }
> > +
> >       dev = s->vhost_vdpa.dev;
> >       if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> >           g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > @@ -741,6 +816,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >       s->vhost_vdpa.device_fd = vdpa_device_fd;
> >       s->vhost_vdpa.index = queue_pair_index;
> >       s->always_svq = svq;
> > +    s->migration_state.notify = vdpa_net_migration_state_notifier;
> >       s->vhost_vdpa.shadow_vqs_enabled = svq;
> >       s->vhost_vdpa.iova_range = iova_range;
> >       s->vhost_vdpa.shadow_data = svq;
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 01/13] vdpa net: move iova tree creation from init to start
  2023-02-13 11:14     ` Eugenio Perez Martin
@ 2023-02-14  1:45         ` Si-Wei Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Si-Wei Liu @ 2023-02-14  1:45 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, virtualization, Harpreet Singh Anand,
	Lei Yang, Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong



On 2/13/2023 3:14 AM, Eugenio Perez Martin wrote:
> On Mon, Feb 13, 2023 at 7:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
>>> Only create iova_tree if and when it is needed.
>>>
>>> The cleanup keeps being responsible of last VQ but this change allows it
>>> to merge both cleanup functions.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>> ---
>>>    net/vhost-vdpa.c | 99 ++++++++++++++++++++++++++++++++++--------------
>>>    1 file changed, 71 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>>> index de5ed8ff22..a9e6c8f28e 100644
>>> --- a/net/vhost-vdpa.c
>>> +++ b/net/vhost-vdpa.c
>>> @@ -178,13 +178,9 @@ err_init:
>>>    static void vhost_vdpa_cleanup(NetClientState *nc)
>>>    {
>>>        VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>> -    struct vhost_dev *dev = &s->vhost_net->dev;
>>>
>>>        qemu_vfree(s->cvq_cmd_out_buffer);
>>>        qemu_vfree(s->status);
>>> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>>> -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>> -    }
>>>        if (s->vhost_net) {
>>>            vhost_net_cleanup(s->vhost_net);
>>>            g_free(s->vhost_net);
>>> @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
>>>        return size;
>>>    }
>>>
>>> +/** From any vdpa net client, get the netclient of first queue pair */
>>> +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
>>> +{
>>> +    NICState *nic = qemu_get_nic(s->nc.peer);
>>> +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
>>> +
>>> +    return DO_UPCAST(VhostVDPAState, nc, nc0);
>>> +}
>>> +
>>> +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>>> +{
>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
>>> +
>>> +    if (v->shadow_vqs_enabled) {
>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>>> +                                           v->iova_range.last);
>>> +    }
>>> +}
>>> +
>>> +static int vhost_vdpa_net_data_start(NetClientState *nc)
>>> +{
>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
>>> +
>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>>> +
>>> +    if (v->index == 0) {
>>> +        vhost_vdpa_net_data_start_first(s);
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (v->shadow_vqs_enabled) {
>>> +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static void vhost_vdpa_net_client_stop(NetClientState *nc)
>>> +{
>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>> +    struct vhost_dev *dev;
>>> +
>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>>> +
>>> +    dev = s->vhost_vdpa.dev;
>>> +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>>> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>> +    }
>>> +}
>>> +
>>>    static NetClientInfo net_vhost_vdpa_info = {
>>>            .type = NET_CLIENT_DRIVER_VHOST_VDPA,
>>>            .size = sizeof(VhostVDPAState),
>>>            .receive = vhost_vdpa_receive,
>>> +        .start = vhost_vdpa_net_data_start,
>>> +        .stop = vhost_vdpa_net_client_stop,
>>>            .cleanup = vhost_vdpa_cleanup,
>>>            .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
>>>            .has_ufo = vhost_vdpa_has_ufo,
>>> @@ -351,7 +401,7 @@ dma_map_err:
>>>
>>>    static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>>>    {
>>> -    VhostVDPAState *s;
>>> +    VhostVDPAState *s, *s0;
>>>        struct vhost_vdpa *v;
>>>        uint64_t backend_features;
>>>        int64_t cvq_group;
>>> @@ -425,6 +475,15 @@ out:
>>>            return 0;
>>>        }
>>>
>>> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
>>> +    if (s0->vhost_vdpa.iova_tree) {
>>> +        /* SVQ is already configured for all virtqueues */
>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
>>> +    } else {
>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>>> +                                           v->iova_range.last);
>> I wonder how this case could happen, vhost_vdpa_net_data_start_first()
>> should've allocated an iova tree on the first data vq. Is zero data vq
>> ever possible on net vhost-vdpa?
>>
> It's the case of the current qemu master when only CVQ is being
> shadowed. It's not that "there are no data vq": If that case were
> possible, CVQ vhost-vdpa state would be s0.
>
> The case is that since only CVQ vhost-vdpa is the one being migrated,
> only CVQ has an iova tree.
OK, so this corresponds to the case where live migration is not started 
and CVQ starts in its own address space of VHOST_VDPA_NET_CVQ_ASID. 
Thanks for explaining it!

>
> With this series applied and with no migration running, the case is
> the same as before: only SVQ gets shadowed. When migration starts, all
> vqs are migrated, and share iova tree.
I wonder what is the reason to share the iova tree when migration 
starts, I think CVQ may stay on its own VHOST_VDPA_NET_CVQ_ASID still?

Actually there's discrepancy in vhost_vdpa_net_log_global_enable(), I 
don't see explicit code to switch from VHOST_VDPA_NET_CVQ_ASID to 
VHOST_VDPA_GUEST_PA_ASID for the CVQ. This is the address space I 
collision I mentioned earlier:

9585@1676093788.259201:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16 
msg_type: 2 asid: 0 iova: 0x1000 size: 0x2000 uaddr: 0x55a5a7ff3000 
perm: 0x1 type: 2
9585@1676093788.279923:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16 
msg_type: 2 asid: 0 iova: 0x3000 size: 0x1000 uaddr: 0x55a5a7ff6000 
perm: 0x3 type: 2
9585@1676093788.290529:vhost_vdpa_set_vring_addr dev: 0x55a5a77cec20 
index: 0 flags: 0x0 desc_user_addr: 0x1000 used_user_addr: 0x3000 
avail_user_addr: 0x2000 log_guest_addr: 0x0
:
:
9585@1676093788.543567:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16 
msg_type: 2 asid: 0 iova: 0x16000 size: 0x2000 uaddr: 0x55a5a7959000 
perm: 0x1 type: 2
9585@1676093788.576923:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16 
msg_type: 2 asid: 0 iova: 0x18000 size: 0x1000 uaddr: 0x55a5a795c000 
perm: 0x3 type: 2
9585@1676093788.593881:vhost_vdpa_set_vring_addr dev: 0x55a5a7580930 
index: 7 flags: 0x0 desc_user_addr: 0x16000 used_user_addr: 0x18000 
avail_user_addr: 0x17000 log_guest_addr: 0x0
9585@1676093788.593904:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16 
msg_type: 2 asid: 1 iova: 0x19000 size: 0x1000 uaddr: 0x55a5a77f8000 
perm: 0x1 type: 2
9585@1676093788.606448:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16 
msg_type: 2 asid: 1 iova: 0x1a000 size: 0x1000 uaddr: 0x55a5a77fa000 
perm: 0x3 type: 2
9585@1676093788.616253:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16 
msg_type: 2 asid: 1 iova: 0x1b000 size: 0x1000 uaddr: 0x55a5a795f000 
perm: 0x1 type: 2
9585@1676093788.625956:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16 
msg_type: 2 asid: 1 iova: 0x1c000 size: 0x1000 uaddr: 0x55a5a7f4e000 
perm: 0x3 type: 2
9585@1676093788.635655:vhost_vdpa_set_vring_addr dev: 0x55a5a7580ec0 
index: 8 flags: 0x0 desc_user_addr: 0x1b000 used_user_addr: 0x1c000 
avail_user_addr: 0x1b400 log_guest_addr: 0x0
9585@1676093788.635667:vhost_vdpa_listener_region_add vdpa: 
0x7ff13026d190 iova 0x0 llend 0xa0000 vaddr: 0x7fef1fe00000 read-only: 0
9585@1676093788.635670:vhost_vdpa_listener_begin_batch 
vdpa:0x7ff13026d190 fd: 16 msg_type: 2 type: 5
9585@1676093788.635677:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16 
msg_type: 2 asid: 0 iova: 0x0 size: 0xa0000 uaddr: 0x7fef1fe00000 perm: 
0x3 type: 2
2023-02-11T05:36:28.635686Z qemu-system-x86_64: failed to write, fd=16, 
errno=14 (Bad address)
2023-02-11T05:36:28.635721Z qemu-system-x86_64: vhost vdpa map fail!
2023-02-11T05:36:28.635744Z qemu-system-x86_64: vhost-vdpa: DMA mapping 
failed, unable to continue


Regards,
-Siwei
>
> Thanks!
>
>> Thanks,
>> -Siwei
>>> +    }
>>> +
>>>        r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
>>>                                   vhost_vdpa_net_cvq_cmd_page_len(), false);
>>>        if (unlikely(r < 0)) {
>>> @@ -449,15 +508,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
>>>        if (s->vhost_vdpa.shadow_vqs_enabled) {
>>>            vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
>>>            vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
>>> -        if (!s->always_svq) {
>>> -            /*
>>> -             * If only the CVQ is shadowed we can delete this safely.
>>> -             * If all the VQs are shadows this will be needed by the time the
>>> -             * device is started again to register SVQ vrings and similar.
>>> -             */
>>> -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>> -        }
>>>        }
>>> +
>>> +    vhost_vdpa_net_client_stop(nc);
>>>    }
>>>
>>>    static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
>>> @@ -667,8 +720,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>                                           int nvqs,
>>>                                           bool is_datapath,
>>>                                           bool svq,
>>> -                                       struct vhost_vdpa_iova_range iova_range,
>>> -                                       VhostIOVATree *iova_tree)
>>> +                                       struct vhost_vdpa_iova_range iova_range)
>>>    {
>>>        NetClientState *nc = NULL;
>>>        VhostVDPAState *s;
>>> @@ -690,7 +742,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>        s->vhost_vdpa.shadow_vqs_enabled = svq;
>>>        s->vhost_vdpa.iova_range = iova_range;
>>>        s->vhost_vdpa.shadow_data = svq;
>>> -    s->vhost_vdpa.iova_tree = iova_tree;
>>>        if (!is_datapath) {
>>>            s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
>>>                                                vhost_vdpa_net_cvq_cmd_page_len());
>>> @@ -760,7 +811,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>        uint64_t features;
>>>        int vdpa_device_fd;
>>>        g_autofree NetClientState **ncs = NULL;
>>> -    g_autoptr(VhostIOVATree) iova_tree = NULL;
>>>        struct vhost_vdpa_iova_range iova_range;
>>>        NetClientState *nc;
>>>        int queue_pairs, r, i = 0, has_cvq = 0;
>>> @@ -812,12 +862,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>            goto err;
>>>        }
>>>
>>> -    if (opts->x_svq) {
>>> -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
>>> -            goto err_svq;
>>> -        }
>>> -
>>> -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
>>> +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
>>> +        goto err;
>>>        }
>>>
>>>        ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
>>> @@ -825,7 +871,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>        for (i = 0; i < queue_pairs; i++) {
>>>            ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>                                         vdpa_device_fd, i, 2, true, opts->x_svq,
>>> -                                     iova_range, iova_tree);
>>> +                                     iova_range);
>>>            if (!ncs[i])
>>>                goto err;
>>>        }
>>> @@ -833,13 +879,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>        if (has_cvq) {
>>>            nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>                                     vdpa_device_fd, i, 1, false,
>>> -                                 opts->x_svq, iova_range, iova_tree);
>>> +                                 opts->x_svq, iova_range);
>>>            if (!nc)
>>>                goto err;
>>>        }
>>>
>>> -    /* iova_tree ownership belongs to last NetClientState */
>>> -    g_steal_pointer(&iova_tree);
>>>        return 0;
>>>
>>>    err:
>>> @@ -849,7 +893,6 @@ err:
>>>            }
>>>        }
>>>
>>> -err_svq:
>>>        qemu_close(vdpa_device_fd);
>>>
>>>        return -1;

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 01/13] vdpa net: move iova tree creation from init to start
@ 2023-02-14  1:45         ` Si-Wei Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Si-Wei Liu @ 2023-02-14  1:45 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella



On 2/13/2023 3:14 AM, Eugenio Perez Martin wrote:
> On Mon, Feb 13, 2023 at 7:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
>>> Only create iova_tree if and when it is needed.
>>>
>>> The cleanup keeps being responsible of last VQ but this change allows it
>>> to merge both cleanup functions.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>> ---
>>>    net/vhost-vdpa.c | 99 ++++++++++++++++++++++++++++++++++--------------
>>>    1 file changed, 71 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>>> index de5ed8ff22..a9e6c8f28e 100644
>>> --- a/net/vhost-vdpa.c
>>> +++ b/net/vhost-vdpa.c
>>> @@ -178,13 +178,9 @@ err_init:
>>>    static void vhost_vdpa_cleanup(NetClientState *nc)
>>>    {
>>>        VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>> -    struct vhost_dev *dev = &s->vhost_net->dev;
>>>
>>>        qemu_vfree(s->cvq_cmd_out_buffer);
>>>        qemu_vfree(s->status);
>>> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>>> -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>> -    }
>>>        if (s->vhost_net) {
>>>            vhost_net_cleanup(s->vhost_net);
>>>            g_free(s->vhost_net);
>>> @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
>>>        return size;
>>>    }
>>>
>>> +/** From any vdpa net client, get the netclient of first queue pair */
>>> +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
>>> +{
>>> +    NICState *nic = qemu_get_nic(s->nc.peer);
>>> +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
>>> +
>>> +    return DO_UPCAST(VhostVDPAState, nc, nc0);
>>> +}
>>> +
>>> +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>>> +{
>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
>>> +
>>> +    if (v->shadow_vqs_enabled) {
>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>>> +                                           v->iova_range.last);
>>> +    }
>>> +}
>>> +
>>> +static int vhost_vdpa_net_data_start(NetClientState *nc)
>>> +{
>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
>>> +
>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>>> +
>>> +    if (v->index == 0) {
>>> +        vhost_vdpa_net_data_start_first(s);
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (v->shadow_vqs_enabled) {
>>> +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static void vhost_vdpa_net_client_stop(NetClientState *nc)
>>> +{
>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>> +    struct vhost_dev *dev;
>>> +
>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>>> +
>>> +    dev = s->vhost_vdpa.dev;
>>> +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>>> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>> +    }
>>> +}
>>> +
>>>    static NetClientInfo net_vhost_vdpa_info = {
>>>            .type = NET_CLIENT_DRIVER_VHOST_VDPA,
>>>            .size = sizeof(VhostVDPAState),
>>>            .receive = vhost_vdpa_receive,
>>> +        .start = vhost_vdpa_net_data_start,
>>> +        .stop = vhost_vdpa_net_client_stop,
>>>            .cleanup = vhost_vdpa_cleanup,
>>>            .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
>>>            .has_ufo = vhost_vdpa_has_ufo,
>>> @@ -351,7 +401,7 @@ dma_map_err:
>>>
>>>    static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>>>    {
>>> -    VhostVDPAState *s;
>>> +    VhostVDPAState *s, *s0;
>>>        struct vhost_vdpa *v;
>>>        uint64_t backend_features;
>>>        int64_t cvq_group;
>>> @@ -425,6 +475,15 @@ out:
>>>            return 0;
>>>        }
>>>
>>> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
>>> +    if (s0->vhost_vdpa.iova_tree) {
>>> +        /* SVQ is already configured for all virtqueues */
>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
>>> +    } else {
>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>>> +                                           v->iova_range.last);
>> I wonder how this case could happen, vhost_vdpa_net_data_start_first()
>> should've allocated an iova tree on the first data vq. Is zero data vq
>> ever possible on net vhost-vdpa?
>>
> It's the case of the current qemu master when only CVQ is being
> shadowed. It's not that "there are no data vq": If that case were
> possible, CVQ vhost-vdpa state would be s0.
>
> The case is that since only CVQ vhost-vdpa is the one being migrated,
> only CVQ has an iova tree.
OK, so this corresponds to the case where live migration is not started 
and CVQ starts in its own address space of VHOST_VDPA_NET_CVQ_ASID. 
Thanks for explaining it!

>
> With this series applied and with no migration running, the case is
> the same as before: only SVQ gets shadowed. When migration starts, all
> vqs are migrated, and share iova tree.
I wonder what is the reason to share the iova tree when migration 
starts, I think CVQ may stay on its own VHOST_VDPA_NET_CVQ_ASID still?

Actually there's discrepancy in vhost_vdpa_net_log_global_enable(), I 
don't see explicit code to switch from VHOST_VDPA_NET_CVQ_ASID to 
VHOST_VDPA_GUEST_PA_ASID for the CVQ. This is the address space I 
collision I mentioned earlier:

9585@1676093788.259201:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16 
msg_type: 2 asid: 0 iova: 0x1000 size: 0x2000 uaddr: 0x55a5a7ff3000 
perm: 0x1 type: 2
9585@1676093788.279923:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16 
msg_type: 2 asid: 0 iova: 0x3000 size: 0x1000 uaddr: 0x55a5a7ff6000 
perm: 0x3 type: 2
9585@1676093788.290529:vhost_vdpa_set_vring_addr dev: 0x55a5a77cec20 
index: 0 flags: 0x0 desc_user_addr: 0x1000 used_user_addr: 0x3000 
avail_user_addr: 0x2000 log_guest_addr: 0x0
:
:
9585@1676093788.543567:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16 
msg_type: 2 asid: 0 iova: 0x16000 size: 0x2000 uaddr: 0x55a5a7959000 
perm: 0x1 type: 2
9585@1676093788.576923:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16 
msg_type: 2 asid: 0 iova: 0x18000 size: 0x1000 uaddr: 0x55a5a795c000 
perm: 0x3 type: 2
9585@1676093788.593881:vhost_vdpa_set_vring_addr dev: 0x55a5a7580930 
index: 7 flags: 0x0 desc_user_addr: 0x16000 used_user_addr: 0x18000 
avail_user_addr: 0x17000 log_guest_addr: 0x0
9585@1676093788.593904:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16 
msg_type: 2 asid: 1 iova: 0x19000 size: 0x1000 uaddr: 0x55a5a77f8000 
perm: 0x1 type: 2
9585@1676093788.606448:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16 
msg_type: 2 asid: 1 iova: 0x1a000 size: 0x1000 uaddr: 0x55a5a77fa000 
perm: 0x3 type: 2
9585@1676093788.616253:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16 
msg_type: 2 asid: 1 iova: 0x1b000 size: 0x1000 uaddr: 0x55a5a795f000 
perm: 0x1 type: 2
9585@1676093788.625956:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16 
msg_type: 2 asid: 1 iova: 0x1c000 size: 0x1000 uaddr: 0x55a5a7f4e000 
perm: 0x3 type: 2
9585@1676093788.635655:vhost_vdpa_set_vring_addr dev: 0x55a5a7580ec0 
index: 8 flags: 0x0 desc_user_addr: 0x1b000 used_user_addr: 0x1c000 
avail_user_addr: 0x1b400 log_guest_addr: 0x0
9585@1676093788.635667:vhost_vdpa_listener_region_add vdpa: 
0x7ff13026d190 iova 0x0 llend 0xa0000 vaddr: 0x7fef1fe00000 read-only: 0
9585@1676093788.635670:vhost_vdpa_listener_begin_batch 
vdpa:0x7ff13026d190 fd: 16 msg_type: 2 type: 5
9585@1676093788.635677:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16 
msg_type: 2 asid: 0 iova: 0x0 size: 0xa0000 uaddr: 0x7fef1fe00000 perm: 
0x3 type: 2
2023-02-11T05:36:28.635686Z qemu-system-x86_64: failed to write, fd=16, 
errno=14 (Bad address)
2023-02-11T05:36:28.635721Z qemu-system-x86_64: vhost vdpa map fail!
2023-02-11T05:36:28.635744Z qemu-system-x86_64: vhost-vdpa: DMA mapping 
failed, unable to continue


Regards,
-Siwei
>
> Thanks!
>
>> Thanks,
>> -Siwei
>>> +    }
>>> +
>>>        r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
>>>                                   vhost_vdpa_net_cvq_cmd_page_len(), false);
>>>        if (unlikely(r < 0)) {
>>> @@ -449,15 +508,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
>>>        if (s->vhost_vdpa.shadow_vqs_enabled) {
>>>            vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
>>>            vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
>>> -        if (!s->always_svq) {
>>> -            /*
>>> -             * If only the CVQ is shadowed we can delete this safely.
>>> -             * If all the VQs are shadows this will be needed by the time the
>>> -             * device is started again to register SVQ vrings and similar.
>>> -             */
>>> -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>> -        }
>>>        }
>>> +
>>> +    vhost_vdpa_net_client_stop(nc);
>>>    }
>>>
>>>    static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
>>> @@ -667,8 +720,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>                                           int nvqs,
>>>                                           bool is_datapath,
>>>                                           bool svq,
>>> -                                       struct vhost_vdpa_iova_range iova_range,
>>> -                                       VhostIOVATree *iova_tree)
>>> +                                       struct vhost_vdpa_iova_range iova_range)
>>>    {
>>>        NetClientState *nc = NULL;
>>>        VhostVDPAState *s;
>>> @@ -690,7 +742,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>        s->vhost_vdpa.shadow_vqs_enabled = svq;
>>>        s->vhost_vdpa.iova_range = iova_range;
>>>        s->vhost_vdpa.shadow_data = svq;
>>> -    s->vhost_vdpa.iova_tree = iova_tree;
>>>        if (!is_datapath) {
>>>            s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
>>>                                                vhost_vdpa_net_cvq_cmd_page_len());
>>> @@ -760,7 +811,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>        uint64_t features;
>>>        int vdpa_device_fd;
>>>        g_autofree NetClientState **ncs = NULL;
>>> -    g_autoptr(VhostIOVATree) iova_tree = NULL;
>>>        struct vhost_vdpa_iova_range iova_range;
>>>        NetClientState *nc;
>>>        int queue_pairs, r, i = 0, has_cvq = 0;
>>> @@ -812,12 +862,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>            goto err;
>>>        }
>>>
>>> -    if (opts->x_svq) {
>>> -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
>>> -            goto err_svq;
>>> -        }
>>> -
>>> -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
>>> +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
>>> +        goto err;
>>>        }
>>>
>>>        ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
>>> @@ -825,7 +871,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>        for (i = 0; i < queue_pairs; i++) {
>>>            ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>                                         vdpa_device_fd, i, 2, true, opts->x_svq,
>>> -                                     iova_range, iova_tree);
>>> +                                     iova_range);
>>>            if (!ncs[i])
>>>                goto err;
>>>        }
>>> @@ -833,13 +879,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>        if (has_cvq) {
>>>            nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>                                     vdpa_device_fd, i, 1, false,
>>> -                                 opts->x_svq, iova_range, iova_tree);
>>> +                                 opts->x_svq, iova_range);
>>>            if (!nc)
>>>                goto err;
>>>        }
>>>
>>> -    /* iova_tree ownership belongs to last NetClientState */
>>> -    g_steal_pointer(&iova_tree);
>>>        return 0;
>>>
>>>    err:
>>> @@ -849,7 +893,6 @@ err:
>>>            }
>>>        }
>>>
>>> -err_svq:
>>>        qemu_close(vdpa_device_fd);
>>>
>>>        return -1;



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 09/13] vdpa net: block migration if the device has CVQ
  2023-02-13  6:50     ` Si-Wei Liu
  (?)
@ 2023-02-14 18:06     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 68+ messages in thread
From: Eugenio Perez Martin @ 2023-02-14 18:06 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella

On Mon, Feb 13, 2023 at 7:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
> > Devices with CVQ needs to migrate state beyond vq state.  Leaving this
> > to future series.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   net/vhost-vdpa.c | 6 ++++++
> >   1 file changed, 6 insertions(+)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index bca13f97fd..309861e56c 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -955,11 +955,17 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >       }
> >
> >       if (has_cvq) {
> > +        VhostVDPAState *s;
> > +
> >           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >                                    vdpa_device_fd, i, 1, false,
> >                                    opts->x_svq, iova_range);
> >           if (!nc)
> >               goto err;
> > +
> > +        s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +        error_setg(&s->vhost_vdpa.dev->migration_blocker,
> > +                   "net vdpa cannot migrate with MQ feature");
> Not sure how this can work: migration_blocker is only checked and gets
> added from vhost_dev_init(), which is already done through
> net_vhost_vdpa_init() above. Same question applies to the next patch of
> this series.
>

Good catch, fixing in v3.

Thanks!



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 01/13] vdpa net: move iova tree creation from init to start
  2023-02-14  1:45         ` Si-Wei Liu
  (?)
@ 2023-02-14 19:07         ` Eugenio Perez Martin
  2023-02-16  2:14             ` Si-Wei Liu
  -1 siblings, 1 reply; 68+ messages in thread
From: Eugenio Perez Martin @ 2023-02-14 19:07 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella

On Tue, Feb 14, 2023 at 2:45 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 2/13/2023 3:14 AM, Eugenio Perez Martin wrote:
> > On Mon, Feb 13, 2023 at 7:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>
> >>
> >> On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
> >>> Only create iova_tree if and when it is needed.
> >>>
> >>> The cleanup keeps being responsible of last VQ but this change allows it
> >>> to merge both cleanup functions.
> >>>
> >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>> Acked-by: Jason Wang <jasowang@redhat.com>
> >>> ---
> >>>    net/vhost-vdpa.c | 99 ++++++++++++++++++++++++++++++++++--------------
> >>>    1 file changed, 71 insertions(+), 28 deletions(-)
> >>>
> >>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> >>> index de5ed8ff22..a9e6c8f28e 100644
> >>> --- a/net/vhost-vdpa.c
> >>> +++ b/net/vhost-vdpa.c
> >>> @@ -178,13 +178,9 @@ err_init:
> >>>    static void vhost_vdpa_cleanup(NetClientState *nc)
> >>>    {
> >>>        VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> >>> -    struct vhost_dev *dev = &s->vhost_net->dev;
> >>>
> >>>        qemu_vfree(s->cvq_cmd_out_buffer);
> >>>        qemu_vfree(s->status);
> >>> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> >>> -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> >>> -    }
> >>>        if (s->vhost_net) {
> >>>            vhost_net_cleanup(s->vhost_net);
> >>>            g_free(s->vhost_net);
> >>> @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
> >>>        return size;
> >>>    }
> >>>
> >>> +/** From any vdpa net client, get the netclient of first queue pair */
> >>> +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> >>> +{
> >>> +    NICState *nic = qemu_get_nic(s->nc.peer);
> >>> +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
> >>> +
> >>> +    return DO_UPCAST(VhostVDPAState, nc, nc0);
> >>> +}
> >>> +
> >>> +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> >>> +{
> >>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> >>> +
> >>> +    if (v->shadow_vqs_enabled) {
> >>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> >>> +                                           v->iova_range.last);
> >>> +    }
> >>> +}
> >>> +
> >>> +static int vhost_vdpa_net_data_start(NetClientState *nc)
> >>> +{
> >>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> >>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> >>> +
> >>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >>> +
> >>> +    if (v->index == 0) {
> >>> +        vhost_vdpa_net_data_start_first(s);
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    if (v->shadow_vqs_enabled) {
> >>> +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
> >>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> >>> +    }
> >>> +
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +static void vhost_vdpa_net_client_stop(NetClientState *nc)
> >>> +{
> >>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> >>> +    struct vhost_dev *dev;
> >>> +
> >>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >>> +
> >>> +    dev = s->vhost_vdpa.dev;
> >>> +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> >>> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> >>> +    }
> >>> +}
> >>> +
> >>>    static NetClientInfo net_vhost_vdpa_info = {
> >>>            .type = NET_CLIENT_DRIVER_VHOST_VDPA,
> >>>            .size = sizeof(VhostVDPAState),
> >>>            .receive = vhost_vdpa_receive,
> >>> +        .start = vhost_vdpa_net_data_start,
> >>> +        .stop = vhost_vdpa_net_client_stop,
> >>>            .cleanup = vhost_vdpa_cleanup,
> >>>            .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
> >>>            .has_ufo = vhost_vdpa_has_ufo,
> >>> @@ -351,7 +401,7 @@ dma_map_err:
> >>>
> >>>    static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> >>>    {
> >>> -    VhostVDPAState *s;
> >>> +    VhostVDPAState *s, *s0;
> >>>        struct vhost_vdpa *v;
> >>>        uint64_t backend_features;
> >>>        int64_t cvq_group;
> >>> @@ -425,6 +475,15 @@ out:
> >>>            return 0;
> >>>        }
> >>>
> >>> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
> >>> +    if (s0->vhost_vdpa.iova_tree) {
> >>> +        /* SVQ is already configured for all virtqueues */
> >>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> >>> +    } else {
> >>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> >>> +                                           v->iova_range.last);
> >> I wonder how this case could happen, vhost_vdpa_net_data_start_first()
> >> should've allocated an iova tree on the first data vq. Is zero data vq
> >> ever possible on net vhost-vdpa?
> >>
> > It's the case of the current qemu master when only CVQ is being
> > shadowed. It's not that "there are no data vq": If that case were
> > possible, CVQ vhost-vdpa state would be s0.
> >
> > The case is that since only CVQ vhost-vdpa is the one being migrated,
> > only CVQ has an iova tree.
> OK, so this corresponds to the case where live migration is not started
> and CVQ starts in its own address space of VHOST_VDPA_NET_CVQ_ASID.
> Thanks for explaining it!
>
> >
> > With this series applied and with no migration running, the case is
> > the same as before: only SVQ gets shadowed. When migration starts, all
> > vqs are migrated, and share iova tree.
> I wonder what is the reason to share the iova tree when migration
> starts, I think CVQ may stay on its own VHOST_VDPA_NET_CVQ_ASID still?
>
> Actually there's discrepancy in vhost_vdpa_net_log_global_enable(), I
> don't see explicit code to switch from VHOST_VDPA_NET_CVQ_ASID to
> VHOST_VDPA_GUEST_PA_ASID for the CVQ. This is the address space I
> collision I mentioned earlier:
>

There is no such change. This code only migrates devices with no CVQ,
as they have their own difficulties.

In the previous RFC there was no such change either. Since it's hard
to modify passthrough devices IOVA tree, CVQ AS updates keep being
VHOST_VDPA_NET_CVQ_ASID.

They both share the same IOVA tree though, just for simplicity. If
address space exhaustion is a problem we can make them independent,
but this complicates the code a little bit.

> 9585@1676093788.259201:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16
> msg_type: 2 asid: 0 iova: 0x1000 size: 0x2000 uaddr: 0x55a5a7ff3000
> perm: 0x1 type: 2
> 9585@1676093788.279923:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16
> msg_type: 2 asid: 0 iova: 0x3000 size: 0x1000 uaddr: 0x55a5a7ff6000
> perm: 0x3 type: 2
> 9585@1676093788.290529:vhost_vdpa_set_vring_addr dev: 0x55a5a77cec20
> index: 0 flags: 0x0 desc_user_addr: 0x1000 used_user_addr: 0x3000
> avail_user_addr: 0x2000 log_guest_addr: 0x0
> :
> :
> 9585@1676093788.543567:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16
> msg_type: 2 asid: 0 iova: 0x16000 size: 0x2000 uaddr: 0x55a5a7959000
> perm: 0x1 type: 2
> 9585@1676093788.576923:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16
> msg_type: 2 asid: 0 iova: 0x18000 size: 0x1000 uaddr: 0x55a5a795c000
> perm: 0x3 type: 2
> 9585@1676093788.593881:vhost_vdpa_set_vring_addr dev: 0x55a5a7580930
> index: 7 flags: 0x0 desc_user_addr: 0x16000 used_user_addr: 0x18000
> avail_user_addr: 0x17000 log_guest_addr: 0x0
> 9585@1676093788.593904:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
> msg_type: 2 asid: 1 iova: 0x19000 size: 0x1000 uaddr: 0x55a5a77f8000
> perm: 0x1 type: 2
> 9585@1676093788.606448:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
> msg_type: 2 asid: 1 iova: 0x1a000 size: 0x1000 uaddr: 0x55a5a77fa000
> perm: 0x3 type: 2
> 9585@1676093788.616253:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
> msg_type: 2 asid: 1 iova: 0x1b000 size: 0x1000 uaddr: 0x55a5a795f000
> perm: 0x1 type: 2
> 9585@1676093788.625956:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
> msg_type: 2 asid: 1 iova: 0x1c000 size: 0x1000 uaddr: 0x55a5a7f4e000
> perm: 0x3 type: 2
> 9585@1676093788.635655:vhost_vdpa_set_vring_addr dev: 0x55a5a7580ec0
> index: 8 flags: 0x0 desc_user_addr: 0x1b000 used_user_addr: 0x1c000
> avail_user_addr: 0x1b400 log_guest_addr: 0x0
> 9585@1676093788.635667:vhost_vdpa_listener_region_add vdpa:
> 0x7ff13026d190 iova 0x0 llend 0xa0000 vaddr: 0x7fef1fe00000 read-only: 0
> 9585@1676093788.635670:vhost_vdpa_listener_begin_batch
> vdpa:0x7ff13026d190 fd: 16 msg_type: 2 type: 5
> 9585@1676093788.635677:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
> msg_type: 2 asid: 0 iova: 0x0 size: 0xa0000 uaddr: 0x7fef1fe00000 perm:
> 0x3 type: 2
> 2023-02-11T05:36:28.635686Z qemu-system-x86_64: failed to write, fd=16,
> errno=14 (Bad address)
> 2023-02-11T05:36:28.635721Z qemu-system-x86_64: vhost vdpa map fail!
> 2023-02-11T05:36:28.635744Z qemu-system-x86_64: vhost-vdpa: DMA mapping
> failed, unable to continue
>

I'm not sure how you get to this. Maybe you were able to start the
migration because the CVQ migration blocker was not effectively added?

Thanks!


>
> Regards,
> -Siwei
> >
> > Thanks!
> >
> >> Thanks,
> >> -Siwei
> >>> +    }
> >>> +
> >>>        r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
> >>>                                   vhost_vdpa_net_cvq_cmd_page_len(), false);
> >>>        if (unlikely(r < 0)) {
> >>> @@ -449,15 +508,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
> >>>        if (s->vhost_vdpa.shadow_vqs_enabled) {
> >>>            vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
> >>>            vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
> >>> -        if (!s->always_svq) {
> >>> -            /*
> >>> -             * If only the CVQ is shadowed we can delete this safely.
> >>> -             * If all the VQs are shadows this will be needed by the time the
> >>> -             * device is started again to register SVQ vrings and similar.
> >>> -             */
> >>> -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> >>> -        }
> >>>        }
> >>> +
> >>> +    vhost_vdpa_net_client_stop(nc);
> >>>    }
> >>>
> >>>    static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
> >>> @@ -667,8 +720,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >>>                                           int nvqs,
> >>>                                           bool is_datapath,
> >>>                                           bool svq,
> >>> -                                       struct vhost_vdpa_iova_range iova_range,
> >>> -                                       VhostIOVATree *iova_tree)
> >>> +                                       struct vhost_vdpa_iova_range iova_range)
> >>>    {
> >>>        NetClientState *nc = NULL;
> >>>        VhostVDPAState *s;
> >>> @@ -690,7 +742,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >>>        s->vhost_vdpa.shadow_vqs_enabled = svq;
> >>>        s->vhost_vdpa.iova_range = iova_range;
> >>>        s->vhost_vdpa.shadow_data = svq;
> >>> -    s->vhost_vdpa.iova_tree = iova_tree;
> >>>        if (!is_datapath) {
> >>>            s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
> >>>                                                vhost_vdpa_net_cvq_cmd_page_len());
> >>> @@ -760,7 +811,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>        uint64_t features;
> >>>        int vdpa_device_fd;
> >>>        g_autofree NetClientState **ncs = NULL;
> >>> -    g_autoptr(VhostIOVATree) iova_tree = NULL;
> >>>        struct vhost_vdpa_iova_range iova_range;
> >>>        NetClientState *nc;
> >>>        int queue_pairs, r, i = 0, has_cvq = 0;
> >>> @@ -812,12 +862,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>            goto err;
> >>>        }
> >>>
> >>> -    if (opts->x_svq) {
> >>> -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
> >>> -            goto err_svq;
> >>> -        }
> >>> -
> >>> -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> >>> +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
> >>> +        goto err;
> >>>        }
> >>>
> >>>        ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> >>> @@ -825,7 +871,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>        for (i = 0; i < queue_pairs; i++) {
> >>>            ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >>>                                         vdpa_device_fd, i, 2, true, opts->x_svq,
> >>> -                                     iova_range, iova_tree);
> >>> +                                     iova_range);
> >>>            if (!ncs[i])
> >>>                goto err;
> >>>        }
> >>> @@ -833,13 +879,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>        if (has_cvq) {
> >>>            nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >>>                                     vdpa_device_fd, i, 1, false,
> >>> -                                 opts->x_svq, iova_range, iova_tree);
> >>> +                                 opts->x_svq, iova_range);
> >>>            if (!nc)
> >>>                goto err;
> >>>        }
> >>>
> >>> -    /* iova_tree ownership belongs to last NetClientState */
> >>> -    g_steal_pointer(&iova_tree);
> >>>        return 0;
> >>>
> >>>    err:
> >>> @@ -849,7 +893,6 @@ err:
> >>>            }
> >>>        }
> >>>
> >>> -err_svq:
> >>>        qemu_close(vdpa_device_fd);
> >>>
> >>>        return -1;
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration
  2023-02-10 12:57 ` Gautam Dawar
@ 2023-02-15 18:40   ` Eugenio Perez Martin
  2023-02-16 13:50     ` Lei Yang
  0 siblings, 1 reply; 68+ messages in thread
From: Eugenio Perez Martin @ 2023-02-15 18:40 UTC (permalink / raw)
  To: Gautam Dawar
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella,
	si-wei.liu

On Fri, Feb 10, 2023 at 1:58 PM Gautam Dawar <gdawar@amd.com> wrote:
>
> Hi Eugenio,
>
> I've tested this patch series on Xilinx/AMD SN1022 device without
> control vq and VM Live Migration between two hosts worked fine.
>
> Tested-by: Gautam Dawar <gautam.dawar@amd.com>
>

Thanks for the testing!

>
> Here is some minor feedback:
>
> Pls fix the typo (Dynamycally -> Dynamically) in the Subject.
>
> On 2/8/23 15:12, Eugenio Pérez wrote:
> > CAUTION: This message has originated from an External Source. Please use proper judgment and caution when opening attachments, clicking links, or responding to this email.
> >
> >
> > It's possible to migrate vdpa net devices if they are shadowed from the
> >
> > start.  But to always shadow the dataplane is to effectively break its host
> >
> > passthrough, so its not convenient in vDPA scenarios.
> I believe you meant efficient instead of convenient.
> >
> >
> >
> > This series enables dynamically switching to shadow mode only at
> >
> > migration time.  This allows full data virtqueues passthrough all the
> >
> > time qemu is not migrating.
> >
> >
> >
> > In this series only net devices with no CVQ are migratable.  CVQ adds
> >
> > additional state that would make the series bigger and still had some
> >
> > controversy on previous RFC, so let's split it.
> >
> >
> >
> > The first patch delays the creation of the iova tree until it is really needed,
> >
> > and makes it easier to dynamically move from and to SVQ mode.
> It would help adding some detail on the iova tree being referred to here.
> >
> >
> >
> > Next patches from 02 to 05 handle the suspending and getting of vq state (base)
> >
> > of the device at the switch to SVQ mode.  The new _F_SUSPEND feature is
> >
> > negotiated and stop device flow is changed so the state can be fetched trusting
> >
> > the device will not modify it.
> >
> >
> >
> > Since vhost backend must offer VHOST_F_LOG_ALL to be migratable, last patches
> >
> > but the last one add the needed migration blockers so vhost-vdpa can offer it
>
> "last patches but the last one"?
>

I think I solved all of the above in v3, thanks for notifying them!

Would it be possible to test with v3 too?

> Thanks.
>
> >
> > safely.  They also add the handling of this feature.
> >
> >
> >
> > Finally, the last patch makes virtio vhost-vdpa backend to offer
> >
> > VHOST_F_LOG_ALL so qemu migrate the device as long as no other blocker has been
> >
> > added.
> >
> >
> >
> > Successfully tested with vdpa_sim_net with patch [1] applied and with the qemu
> >
> > emulated device with vp_vdpa with some restrictions:
> >
> > * No CVQ. No feature that didn't work with SVQ previously (packed, ...)
> >
> > * VIRTIO_RING_F_STATE patches implementing [2].
> >
> > * Expose _F_SUSPEND, but ignore it and suspend on ring state fetch like
> >
> >    DPDK.
> >
> >
> >
> > Comments are welcome.
> >
> >
> >
> > v2:
> >
> > - Check for SUSPEND in vhost_dev.backend_cap, as .backend_features is empty at
> >
> >    the check moment.
> >
> >
> >
> > v1:
> >
> > - Omit all code working with CVQ and block migration if the device supports
> >
> >    CVQ.
> >
> > - Remove spurious kick.
> Even with the spurious kick, datapath didn't resume at destination VM
> after LM as kick happened before DRIVER_OK. So IMO, it will be required
> that the vdpa parent driver simulates a kick after creating/starting HW
> rings.

Right, it did not solve the issue.

If I'm not wrong all vdpa drivers are moving to that model, checking
for new avail descriptors right after DRIVER_OK. Maybe it is better to
keep this discussion at patch 12/13 on RFC v2?

Thanks!

> >
> > - Move all possible checks for migration to vhost-vdpa instead of the net
> >
> >    backend. Move them to init code from start code.
> >
> > - Suspend on vhost_vdpa_dev_start(false) instead of in vhost-vdpa net backend.
> >
> > - Properly split suspend after geting base and adding of status_reset patches.
> >
> > - Add possible TODOs to points where this series can improve in the future.
> >
> > - Check the state of migration using migration_in_setup and
> >
> >    migration_has_failed instead of checking all the possible migration status in
> >
> >    a switch.
> >
> > - Add TODO with possible low hand fruit using RESUME ops.
> >
> > - Always offer _F_LOG from virtio/vhost-vdpa and let migration blockers do
> >
> >    their thing instead of adding a variable.
> >
> > - RFC v2 at https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg02574.html
> >
> >
> >
> > RFC v2:
> >
> > - Use a migration listener instead of a memory listener to know when
> >
> >    the migration starts.
> >
> > - Add stuff not picked with ASID patches, like enable rings after
> >
> >    driver_ok
> >
> > - Add rewinding on the migration src, not in dst
> >
> > - RFC v1 at https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg01664.html
> >
> >
> >
> > [1] https://lore.kernel.org/lkml/20230203142501.300125-1-eperezma@redhat.com/T/
> >
> > [2] https://lists.oasis-open.org/archives/virtio-comment/202103/msg00036.html
> >
> >
> >
> > Eugenio Pérez (13):
> >
> >    vdpa net: move iova tree creation from init to start
> >
> >    vdpa: Negotiate _F_SUSPEND feature
> >
> >    vdpa: add vhost_vdpa_suspend
> >
> >    vdpa: move vhost reset after get vring base
> >
> >    vdpa: rewind at get_base, not set_base
> >
> >    vdpa net: allow VHOST_F_LOG_ALL
> >
> >    vdpa: add vdpa net migration state notifier
> >
> >    vdpa: disable RAM block discard only for the first device
> >
> >    vdpa net: block migration if the device has CVQ
> >
> >    vdpa: block migration if device has unsupported features
> >
> >    vdpa: block migration if dev does not have _F_SUSPEND
> >
> >    vdpa: block migration if SVQ does not admit a feature
> >
> >    vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices
> >
> >
> >
> >   include/hw/virtio/vhost-backend.h |   4 +
> >
> >   hw/virtio/vhost-vdpa.c            | 126 +++++++++++++++-----
> >
> >   hw/virtio/vhost.c                 |   3 +
> >
> >   net/vhost-vdpa.c                  | 192 +++++++++++++++++++++++++-----
> >
> >   hw/virtio/trace-events            |   1 +
> >
> >   5 files changed, 267 insertions(+), 59 deletions(-)
> >
> >
> >
> > --
> >
> > 2.31.1
> >
> >
> >
> >
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 01/13] vdpa net: move iova tree creation from init to start
  2023-02-14 19:07         ` Eugenio Perez Martin
@ 2023-02-16  2:14             ` Si-Wei Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Si-Wei Liu @ 2023-02-16  2:14 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, virtualization, Harpreet Singh Anand,
	Lei Yang, Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong



On 2/14/2023 11:07 AM, Eugenio Perez Martin wrote:
> On Tue, Feb 14, 2023 at 2:45 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 2/13/2023 3:14 AM, Eugenio Perez Martin wrote:
>>> On Mon, Feb 13, 2023 at 7:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>
>>>> On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
>>>>> Only create iova_tree if and when it is needed.
>>>>>
>>>>> The cleanup keeps being responsible of last VQ but this change allows it
>>>>> to merge both cleanup functions.
>>>>>
>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>>> ---
>>>>>     net/vhost-vdpa.c | 99 ++++++++++++++++++++++++++++++++++--------------
>>>>>     1 file changed, 71 insertions(+), 28 deletions(-)
>>>>>
>>>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>>>>> index de5ed8ff22..a9e6c8f28e 100644
>>>>> --- a/net/vhost-vdpa.c
>>>>> +++ b/net/vhost-vdpa.c
>>>>> @@ -178,13 +178,9 @@ err_init:
>>>>>     static void vhost_vdpa_cleanup(NetClientState *nc)
>>>>>     {
>>>>>         VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>>>> -    struct vhost_dev *dev = &s->vhost_net->dev;
>>>>>
>>>>>         qemu_vfree(s->cvq_cmd_out_buffer);
>>>>>         qemu_vfree(s->status);
>>>>> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>>>>> -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>>>> -    }
>>>>>         if (s->vhost_net) {
>>>>>             vhost_net_cleanup(s->vhost_net);
>>>>>             g_free(s->vhost_net);
>>>>> @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
>>>>>         return size;
>>>>>     }
>>>>>
>>>>> +/** From any vdpa net client, get the netclient of first queue pair */
>>>>> +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
>>>>> +{
>>>>> +    NICState *nic = qemu_get_nic(s->nc.peer);
>>>>> +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
>>>>> +
>>>>> +    return DO_UPCAST(VhostVDPAState, nc, nc0);
>>>>> +}
>>>>> +
>>>>> +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>>>>> +{
>>>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
>>>>> +
>>>>> +    if (v->shadow_vqs_enabled) {
>>>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>>>>> +                                           v->iova_range.last);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +static int vhost_vdpa_net_data_start(NetClientState *nc)
>>>>> +{
>>>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
>>>>> +
>>>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>>>>> +
>>>>> +    if (v->index == 0) {
>>>>> +        vhost_vdpa_net_data_start_first(s);
>>>>> +        return 0;
>>>>> +    }
>>>>> +
>>>>> +    if (v->shadow_vqs_enabled) {
>>>>> +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
>>>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
>>>>> +    }
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static void vhost_vdpa_net_client_stop(NetClientState *nc)
>>>>> +{
>>>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>>>> +    struct vhost_dev *dev;
>>>>> +
>>>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>>>>> +
>>>>> +    dev = s->vhost_vdpa.dev;
>>>>> +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>>>>> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>>     static NetClientInfo net_vhost_vdpa_info = {
>>>>>             .type = NET_CLIENT_DRIVER_VHOST_VDPA,
>>>>>             .size = sizeof(VhostVDPAState),
>>>>>             .receive = vhost_vdpa_receive,
>>>>> +        .start = vhost_vdpa_net_data_start,
>>>>> +        .stop = vhost_vdpa_net_client_stop,
>>>>>             .cleanup = vhost_vdpa_cleanup,
>>>>>             .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
>>>>>             .has_ufo = vhost_vdpa_has_ufo,
>>>>> @@ -351,7 +401,7 @@ dma_map_err:
>>>>>
>>>>>     static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>>>>>     {
>>>>> -    VhostVDPAState *s;
>>>>> +    VhostVDPAState *s, *s0;
>>>>>         struct vhost_vdpa *v;
>>>>>         uint64_t backend_features;
>>>>>         int64_t cvq_group;
>>>>> @@ -425,6 +475,15 @@ out:
>>>>>             return 0;
>>>>>         }
>>>>>
>>>>> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
>>>>> +    if (s0->vhost_vdpa.iova_tree) {
>>>>> +        /* SVQ is already configured for all virtqueues */
>>>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
>>>>> +    } else {
>>>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>>>>> +                                           v->iova_range.last);
>>>> I wonder how this case could happen, vhost_vdpa_net_data_start_first()
>>>> should've allocated an iova tree on the first data vq. Is zero data vq
>>>> ever possible on net vhost-vdpa?
>>>>
>>> It's the case of the current qemu master when only CVQ is being
>>> shadowed. It's not that "there are no data vq": If that case were
>>> possible, CVQ vhost-vdpa state would be s0.
>>>
>>> The case is that since only CVQ vhost-vdpa is the one being migrated,
>>> only CVQ has an iova tree.
>> OK, so this corresponds to the case where live migration is not started
>> and CVQ starts in its own address space of VHOST_VDPA_NET_CVQ_ASID.
>> Thanks for explaining it!
>>
>>> With this series applied and with no migration running, the case is
>>> the same as before: only SVQ gets shadowed. When migration starts, all
>>> vqs are migrated, and share iova tree.
>> I wonder what is the reason to share the iova tree when migration
>> starts, I think CVQ may stay on its own VHOST_VDPA_NET_CVQ_ASID still?
>>
>> Actually there's discrepancy in vhost_vdpa_net_log_global_enable(), I
>> don't see explicit code to switch from VHOST_VDPA_NET_CVQ_ASID to
>> VHOST_VDPA_GUEST_PA_ASID for the CVQ. This is the address space I
>> collision I mentioned earlier:
>>
> There is no such change. This code only migrates devices with no CVQ,
> as they have their own difficulties.
>
> In the previous RFC there was no such change either. Since it's hard
> to modify passthrough devices IOVA tree, CVQ AS updates keep being
> VHOST_VDPA_NET_CVQ_ASID.
That's my understanding too, the current code doesn't support changing 
AS once it is set, although uAPI doesn't prohibit it.

> They both share the same IOVA tree though, just for simplicity.
It would be good to document this assumption somewhere in the code, it's 
not easy to infer userspace doesn't have the same view as that in the 
kernel in terms of the iova tree being used.

>   If
> address space exhaustion is a problem we can make them independent,
> but this complicates the code a little bit.
>
>> 9585@1676093788.259201:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16
>> msg_type: 2 asid: 0 iova: 0x1000 size: 0x2000 uaddr: 0x55a5a7ff3000
>> perm: 0x1 type: 2
>> 9585@1676093788.279923:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16
>> msg_type: 2 asid: 0 iova: 0x3000 size: 0x1000 uaddr: 0x55a5a7ff6000
>> perm: 0x3 type: 2
>> 9585@1676093788.290529:vhost_vdpa_set_vring_addr dev: 0x55a5a77cec20
>> index: 0 flags: 0x0 desc_user_addr: 0x1000 used_user_addr: 0x3000
>> avail_user_addr: 0x2000 log_guest_addr: 0x0
>> :
>> :
>> 9585@1676093788.543567:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16
>> msg_type: 2 asid: 0 iova: 0x16000 size: 0x2000 uaddr: 0x55a5a7959000
>> perm: 0x1 type: 2
>> 9585@1676093788.576923:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16
>> msg_type: 2 asid: 0 iova: 0x18000 size: 0x1000 uaddr: 0x55a5a795c000
>> perm: 0x3 type: 2
>> 9585@1676093788.593881:vhost_vdpa_set_vring_addr dev: 0x55a5a7580930
>> index: 7 flags: 0x0 desc_user_addr: 0x16000 used_user_addr: 0x18000
>> avail_user_addr: 0x17000 log_guest_addr: 0x0
>> 9585@1676093788.593904:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>> msg_type: 2 asid: 1 iova: 0x19000 size: 0x1000 uaddr: 0x55a5a77f8000
>> perm: 0x1 type: 2
>> 9585@1676093788.606448:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>> msg_type: 2 asid: 1 iova: 0x1a000 size: 0x1000 uaddr: 0x55a5a77fa000
>> perm: 0x3 type: 2
>> 9585@1676093788.616253:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>> msg_type: 2 asid: 1 iova: 0x1b000 size: 0x1000 uaddr: 0x55a5a795f000
>> perm: 0x1 type: 2
>> 9585@1676093788.625956:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>> msg_type: 2 asid: 1 iova: 0x1c000 size: 0x1000 uaddr: 0x55a5a7f4e000
>> perm: 0x3 type: 2
>> 9585@1676093788.635655:vhost_vdpa_set_vring_addr dev: 0x55a5a7580ec0
>> index: 8 flags: 0x0 desc_user_addr: 0x1b000 used_user_addr: 0x1c000
>> avail_user_addr: 0x1b400 log_guest_addr: 0x0
>> 9585@1676093788.635667:vhost_vdpa_listener_region_add vdpa:
>> 0x7ff13026d190 iova 0x0 llend 0xa0000 vaddr: 0x7fef1fe00000 read-only: 0
>> 9585@1676093788.635670:vhost_vdpa_listener_begin_batch
>> vdpa:0x7ff13026d190 fd: 16 msg_type: 2 type: 5
>> 9585@1676093788.635677:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>> msg_type: 2 asid: 0 iova: 0x0 size: 0xa0000 uaddr: 0x7fef1fe00000 perm:
>> 0x3 type: 2
>> 2023-02-11T05:36:28.635686Z qemu-system-x86_64: failed to write, fd=16,
>> errno=14 (Bad address)
>> 2023-02-11T05:36:28.635721Z qemu-system-x86_64: vhost vdpa map fail!
>> 2023-02-11T05:36:28.635744Z qemu-system-x86_64: vhost-vdpa: DMA mapping
>> failed, unable to continue
>>
> I'm not sure how you get to this. Maybe you were able to start the
> migration because the CVQ migration blocker was not effectively added?
It's something else, below line at the start of 
vhost_vdpa_net_cvq_start() would override the shadow_data on the CVQ.

     v->shadow_data = s->always_svq;

Which leads to my previous question why shadow_data needs to apply to 
the CVQ, and why the userspace iova is shared between data queues and CVQ.

-Siwei


>
> Thanks!
>
>
>> Regards,
>> -Siwei
>>> Thanks!
>>>
>>>> Thanks,
>>>> -Siwei
>>>>> +    }
>>>>> +
>>>>>         r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
>>>>>                                    vhost_vdpa_net_cvq_cmd_page_len(), false);
>>>>>         if (unlikely(r < 0)) {
>>>>> @@ -449,15 +508,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
>>>>>         if (s->vhost_vdpa.shadow_vqs_enabled) {
>>>>>             vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
>>>>>             vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
>>>>> -        if (!s->always_svq) {
>>>>> -            /*
>>>>> -             * If only the CVQ is shadowed we can delete this safely.
>>>>> -             * If all the VQs are shadows this will be needed by the time the
>>>>> -             * device is started again to register SVQ vrings and similar.
>>>>> -             */
>>>>> -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>>>> -        }
>>>>>         }
>>>>> +
>>>>> +    vhost_vdpa_net_client_stop(nc);
>>>>>     }
>>>>>
>>>>>     static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
>>>>> @@ -667,8 +720,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>>>                                            int nvqs,
>>>>>                                            bool is_datapath,
>>>>>                                            bool svq,
>>>>> -                                       struct vhost_vdpa_iova_range iova_range,
>>>>> -                                       VhostIOVATree *iova_tree)
>>>>> +                                       struct vhost_vdpa_iova_range iova_range)
>>>>>     {
>>>>>         NetClientState *nc = NULL;
>>>>>         VhostVDPAState *s;
>>>>> @@ -690,7 +742,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>>>         s->vhost_vdpa.shadow_vqs_enabled = svq;
>>>>>         s->vhost_vdpa.iova_range = iova_range;
>>>>>         s->vhost_vdpa.shadow_data = svq;
>>>>> -    s->vhost_vdpa.iova_tree = iova_tree;
>>>>>         if (!is_datapath) {
>>>>>             s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
>>>>>                                                 vhost_vdpa_net_cvq_cmd_page_len());
>>>>> @@ -760,7 +811,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>>         uint64_t features;
>>>>>         int vdpa_device_fd;
>>>>>         g_autofree NetClientState **ncs = NULL;
>>>>> -    g_autoptr(VhostIOVATree) iova_tree = NULL;
>>>>>         struct vhost_vdpa_iova_range iova_range;
>>>>>         NetClientState *nc;
>>>>>         int queue_pairs, r, i = 0, has_cvq = 0;
>>>>> @@ -812,12 +862,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>>             goto err;
>>>>>         }
>>>>>
>>>>> -    if (opts->x_svq) {
>>>>> -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
>>>>> -            goto err_svq;
>>>>> -        }
>>>>> -
>>>>> -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
>>>>> +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
>>>>> +        goto err;
>>>>>         }
>>>>>
>>>>>         ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
>>>>> @@ -825,7 +871,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>>         for (i = 0; i < queue_pairs; i++) {
>>>>>             ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>>>                                          vdpa_device_fd, i, 2, true, opts->x_svq,
>>>>> -                                     iova_range, iova_tree);
>>>>> +                                     iova_range);
>>>>>             if (!ncs[i])
>>>>>                 goto err;
>>>>>         }
>>>>> @@ -833,13 +879,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>>         if (has_cvq) {
>>>>>             nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>>>                                      vdpa_device_fd, i, 1, false,
>>>>> -                                 opts->x_svq, iova_range, iova_tree);
>>>>> +                                 opts->x_svq, iova_range);
>>>>>             if (!nc)
>>>>>                 goto err;
>>>>>         }
>>>>>
>>>>> -    /* iova_tree ownership belongs to last NetClientState */
>>>>> -    g_steal_pointer(&iova_tree);
>>>>>         return 0;
>>>>>
>>>>>     err:
>>>>> @@ -849,7 +893,6 @@ err:
>>>>>             }
>>>>>         }
>>>>>
>>>>> -err_svq:
>>>>>         qemu_close(vdpa_device_fd);
>>>>>
>>>>>         return -1;

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 01/13] vdpa net: move iova tree creation from init to start
@ 2023-02-16  2:14             ` Si-Wei Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Si-Wei Liu @ 2023-02-16  2:14 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella



On 2/14/2023 11:07 AM, Eugenio Perez Martin wrote:
> On Tue, Feb 14, 2023 at 2:45 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 2/13/2023 3:14 AM, Eugenio Perez Martin wrote:
>>> On Mon, Feb 13, 2023 at 7:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>
>>>> On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
>>>>> Only create iova_tree if and when it is needed.
>>>>>
>>>>> The cleanup keeps being responsible of last VQ but this change allows it
>>>>> to merge both cleanup functions.
>>>>>
>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>>> ---
>>>>>     net/vhost-vdpa.c | 99 ++++++++++++++++++++++++++++++++++--------------
>>>>>     1 file changed, 71 insertions(+), 28 deletions(-)
>>>>>
>>>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>>>>> index de5ed8ff22..a9e6c8f28e 100644
>>>>> --- a/net/vhost-vdpa.c
>>>>> +++ b/net/vhost-vdpa.c
>>>>> @@ -178,13 +178,9 @@ err_init:
>>>>>     static void vhost_vdpa_cleanup(NetClientState *nc)
>>>>>     {
>>>>>         VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>>>> -    struct vhost_dev *dev = &s->vhost_net->dev;
>>>>>
>>>>>         qemu_vfree(s->cvq_cmd_out_buffer);
>>>>>         qemu_vfree(s->status);
>>>>> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>>>>> -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>>>> -    }
>>>>>         if (s->vhost_net) {
>>>>>             vhost_net_cleanup(s->vhost_net);
>>>>>             g_free(s->vhost_net);
>>>>> @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
>>>>>         return size;
>>>>>     }
>>>>>
>>>>> +/** From any vdpa net client, get the netclient of first queue pair */
>>>>> +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
>>>>> +{
>>>>> +    NICState *nic = qemu_get_nic(s->nc.peer);
>>>>> +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
>>>>> +
>>>>> +    return DO_UPCAST(VhostVDPAState, nc, nc0);
>>>>> +}
>>>>> +
>>>>> +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>>>>> +{
>>>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
>>>>> +
>>>>> +    if (v->shadow_vqs_enabled) {
>>>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>>>>> +                                           v->iova_range.last);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +static int vhost_vdpa_net_data_start(NetClientState *nc)
>>>>> +{
>>>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
>>>>> +
>>>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>>>>> +
>>>>> +    if (v->index == 0) {
>>>>> +        vhost_vdpa_net_data_start_first(s);
>>>>> +        return 0;
>>>>> +    }
>>>>> +
>>>>> +    if (v->shadow_vqs_enabled) {
>>>>> +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
>>>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
>>>>> +    }
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static void vhost_vdpa_net_client_stop(NetClientState *nc)
>>>>> +{
>>>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>>>> +    struct vhost_dev *dev;
>>>>> +
>>>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>>>>> +
>>>>> +    dev = s->vhost_vdpa.dev;
>>>>> +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>>>>> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>>     static NetClientInfo net_vhost_vdpa_info = {
>>>>>             .type = NET_CLIENT_DRIVER_VHOST_VDPA,
>>>>>             .size = sizeof(VhostVDPAState),
>>>>>             .receive = vhost_vdpa_receive,
>>>>> +        .start = vhost_vdpa_net_data_start,
>>>>> +        .stop = vhost_vdpa_net_client_stop,
>>>>>             .cleanup = vhost_vdpa_cleanup,
>>>>>             .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
>>>>>             .has_ufo = vhost_vdpa_has_ufo,
>>>>> @@ -351,7 +401,7 @@ dma_map_err:
>>>>>
>>>>>     static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>>>>>     {
>>>>> -    VhostVDPAState *s;
>>>>> +    VhostVDPAState *s, *s0;
>>>>>         struct vhost_vdpa *v;
>>>>>         uint64_t backend_features;
>>>>>         int64_t cvq_group;
>>>>> @@ -425,6 +475,15 @@ out:
>>>>>             return 0;
>>>>>         }
>>>>>
>>>>> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
>>>>> +    if (s0->vhost_vdpa.iova_tree) {
>>>>> +        /* SVQ is already configured for all virtqueues */
>>>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
>>>>> +    } else {
>>>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>>>>> +                                           v->iova_range.last);
>>>> I wonder how this case could happen, vhost_vdpa_net_data_start_first()
>>>> should've allocated an iova tree on the first data vq. Is zero data vq
>>>> ever possible on net vhost-vdpa?
>>>>
>>> It's the case of the current qemu master when only CVQ is being
>>> shadowed. It's not that "there are no data vq": If that case were
>>> possible, CVQ vhost-vdpa state would be s0.
>>>
>>> The case is that since only CVQ vhost-vdpa is the one being migrated,
>>> only CVQ has an iova tree.
>> OK, so this corresponds to the case where live migration is not started
>> and CVQ starts in its own address space of VHOST_VDPA_NET_CVQ_ASID.
>> Thanks for explaining it!
>>
>>> With this series applied and with no migration running, the case is
>>> the same as before: only SVQ gets shadowed. When migration starts, all
>>> vqs are migrated, and share iova tree.
>> I wonder what is the reason to share the iova tree when migration
>> starts, I think CVQ may stay on its own VHOST_VDPA_NET_CVQ_ASID still?
>>
>> Actually there's discrepancy in vhost_vdpa_net_log_global_enable(), I
>> don't see explicit code to switch from VHOST_VDPA_NET_CVQ_ASID to
>> VHOST_VDPA_GUEST_PA_ASID for the CVQ. This is the address space I
>> collision I mentioned earlier:
>>
> There is no such change. This code only migrates devices with no CVQ,
> as they have their own difficulties.
>
> In the previous RFC there was no such change either. Since it's hard
> to modify passthrough devices IOVA tree, CVQ AS updates keep being
> VHOST_VDPA_NET_CVQ_ASID.
That's my understanding too, the current code doesn't support changing 
AS once it is set, although uAPI doesn't prohibit it.

> They both share the same IOVA tree though, just for simplicity.
It would be good to document this assumption somewhere in the code, it's 
not easy to infer userspace doesn't have the same view as that in the 
kernel in terms of the iova tree being used.

>   If
> address space exhaustion is a problem we can make them independent,
> but this complicates the code a little bit.
>
>> 9585@1676093788.259201:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16
>> msg_type: 2 asid: 0 iova: 0x1000 size: 0x2000 uaddr: 0x55a5a7ff3000
>> perm: 0x1 type: 2
>> 9585@1676093788.279923:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16
>> msg_type: 2 asid: 0 iova: 0x3000 size: 0x1000 uaddr: 0x55a5a7ff6000
>> perm: 0x3 type: 2
>> 9585@1676093788.290529:vhost_vdpa_set_vring_addr dev: 0x55a5a77cec20
>> index: 0 flags: 0x0 desc_user_addr: 0x1000 used_user_addr: 0x3000
>> avail_user_addr: 0x2000 log_guest_addr: 0x0
>> :
>> :
>> 9585@1676093788.543567:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16
>> msg_type: 2 asid: 0 iova: 0x16000 size: 0x2000 uaddr: 0x55a5a7959000
>> perm: 0x1 type: 2
>> 9585@1676093788.576923:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16
>> msg_type: 2 asid: 0 iova: 0x18000 size: 0x1000 uaddr: 0x55a5a795c000
>> perm: 0x3 type: 2
>> 9585@1676093788.593881:vhost_vdpa_set_vring_addr dev: 0x55a5a7580930
>> index: 7 flags: 0x0 desc_user_addr: 0x16000 used_user_addr: 0x18000
>> avail_user_addr: 0x17000 log_guest_addr: 0x0
>> 9585@1676093788.593904:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>> msg_type: 2 asid: 1 iova: 0x19000 size: 0x1000 uaddr: 0x55a5a77f8000
>> perm: 0x1 type: 2
>> 9585@1676093788.606448:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>> msg_type: 2 asid: 1 iova: 0x1a000 size: 0x1000 uaddr: 0x55a5a77fa000
>> perm: 0x3 type: 2
>> 9585@1676093788.616253:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>> msg_type: 2 asid: 1 iova: 0x1b000 size: 0x1000 uaddr: 0x55a5a795f000
>> perm: 0x1 type: 2
>> 9585@1676093788.625956:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>> msg_type: 2 asid: 1 iova: 0x1c000 size: 0x1000 uaddr: 0x55a5a7f4e000
>> perm: 0x3 type: 2
>> 9585@1676093788.635655:vhost_vdpa_set_vring_addr dev: 0x55a5a7580ec0
>> index: 8 flags: 0x0 desc_user_addr: 0x1b000 used_user_addr: 0x1c000
>> avail_user_addr: 0x1b400 log_guest_addr: 0x0
>> 9585@1676093788.635667:vhost_vdpa_listener_region_add vdpa:
>> 0x7ff13026d190 iova 0x0 llend 0xa0000 vaddr: 0x7fef1fe00000 read-only: 0
>> 9585@1676093788.635670:vhost_vdpa_listener_begin_batch
>> vdpa:0x7ff13026d190 fd: 16 msg_type: 2 type: 5
>> 9585@1676093788.635677:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>> msg_type: 2 asid: 0 iova: 0x0 size: 0xa0000 uaddr: 0x7fef1fe00000 perm:
>> 0x3 type: 2
>> 2023-02-11T05:36:28.635686Z qemu-system-x86_64: failed to write, fd=16,
>> errno=14 (Bad address)
>> 2023-02-11T05:36:28.635721Z qemu-system-x86_64: vhost vdpa map fail!
>> 2023-02-11T05:36:28.635744Z qemu-system-x86_64: vhost-vdpa: DMA mapping
>> failed, unable to continue
>>
> I'm not sure how you get to this. Maybe you were able to start the
> migration because the CVQ migration blocker was not effectively added?
It's something else, below line at the start of 
vhost_vdpa_net_cvq_start() would override the shadow_data on the CVQ.

     v->shadow_data = s->always_svq;

Which leads to my previous question why shadow_data needs to apply to 
the CVQ, and why the userspace iova is shared between data queues and CVQ.

-Siwei


>
> Thanks!
>
>
>> Regards,
>> -Siwei
>>> Thanks!
>>>
>>>> Thanks,
>>>> -Siwei
>>>>> +    }
>>>>> +
>>>>>         r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
>>>>>                                    vhost_vdpa_net_cvq_cmd_page_len(), false);
>>>>>         if (unlikely(r < 0)) {
>>>>> @@ -449,15 +508,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
>>>>>         if (s->vhost_vdpa.shadow_vqs_enabled) {
>>>>>             vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
>>>>>             vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
>>>>> -        if (!s->always_svq) {
>>>>> -            /*
>>>>> -             * If only the CVQ is shadowed we can delete this safely.
>>>>> -             * If all the VQs are shadows this will be needed by the time the
>>>>> -             * device is started again to register SVQ vrings and similar.
>>>>> -             */
>>>>> -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>>>> -        }
>>>>>         }
>>>>> +
>>>>> +    vhost_vdpa_net_client_stop(nc);
>>>>>     }
>>>>>
>>>>>     static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
>>>>> @@ -667,8 +720,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>>>                                            int nvqs,
>>>>>                                            bool is_datapath,
>>>>>                                            bool svq,
>>>>> -                                       struct vhost_vdpa_iova_range iova_range,
>>>>> -                                       VhostIOVATree *iova_tree)
>>>>> +                                       struct vhost_vdpa_iova_range iova_range)
>>>>>     {
>>>>>         NetClientState *nc = NULL;
>>>>>         VhostVDPAState *s;
>>>>> @@ -690,7 +742,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>>>         s->vhost_vdpa.shadow_vqs_enabled = svq;
>>>>>         s->vhost_vdpa.iova_range = iova_range;
>>>>>         s->vhost_vdpa.shadow_data = svq;
>>>>> -    s->vhost_vdpa.iova_tree = iova_tree;
>>>>>         if (!is_datapath) {
>>>>>             s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
>>>>>                                                 vhost_vdpa_net_cvq_cmd_page_len());
>>>>> @@ -760,7 +811,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>>         uint64_t features;
>>>>>         int vdpa_device_fd;
>>>>>         g_autofree NetClientState **ncs = NULL;
>>>>> -    g_autoptr(VhostIOVATree) iova_tree = NULL;
>>>>>         struct vhost_vdpa_iova_range iova_range;
>>>>>         NetClientState *nc;
>>>>>         int queue_pairs, r, i = 0, has_cvq = 0;
>>>>> @@ -812,12 +862,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>>             goto err;
>>>>>         }
>>>>>
>>>>> -    if (opts->x_svq) {
>>>>> -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
>>>>> -            goto err_svq;
>>>>> -        }
>>>>> -
>>>>> -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
>>>>> +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
>>>>> +        goto err;
>>>>>         }
>>>>>
>>>>>         ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
>>>>> @@ -825,7 +871,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>>         for (i = 0; i < queue_pairs; i++) {
>>>>>             ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>>>                                          vdpa_device_fd, i, 2, true, opts->x_svq,
>>>>> -                                     iova_range, iova_tree);
>>>>> +                                     iova_range);
>>>>>             if (!ncs[i])
>>>>>                 goto err;
>>>>>         }
>>>>> @@ -833,13 +879,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>>         if (has_cvq) {
>>>>>             nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>>>                                      vdpa_device_fd, i, 1, false,
>>>>> -                                 opts->x_svq, iova_range, iova_tree);
>>>>> +                                 opts->x_svq, iova_range);
>>>>>             if (!nc)
>>>>>                 goto err;
>>>>>         }
>>>>>
>>>>> -    /* iova_tree ownership belongs to last NetClientState */
>>>>> -    g_steal_pointer(&iova_tree);
>>>>>         return 0;
>>>>>
>>>>>     err:
>>>>> @@ -849,7 +893,6 @@ err:
>>>>>             }
>>>>>         }
>>>>>
>>>>> -err_svq:
>>>>>         qemu_close(vdpa_device_fd);
>>>>>
>>>>>         return -1;



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 01/13] vdpa net: move iova tree creation from init to start
  2023-02-16  2:14             ` Si-Wei Liu
  (?)
@ 2023-02-16  7:35             ` Eugenio Perez Martin
  2023-02-17  7:38                 ` Si-Wei Liu
  -1 siblings, 1 reply; 68+ messages in thread
From: Eugenio Perez Martin @ 2023-02-16  7:35 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella

On Thu, Feb 16, 2023 at 3:15 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 2/14/2023 11:07 AM, Eugenio Perez Martin wrote:
> > On Tue, Feb 14, 2023 at 2:45 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>
> >>
> >> On 2/13/2023 3:14 AM, Eugenio Perez Martin wrote:
> >>> On Mon, Feb 13, 2023 at 7:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>>
> >>>> On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
> >>>>> Only create iova_tree if and when it is needed.
> >>>>>
> >>>>> The cleanup keeps being responsible of last VQ but this change allows it
> >>>>> to merge both cleanup functions.
> >>>>>
> >>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>>>> Acked-by: Jason Wang <jasowang@redhat.com>
> >>>>> ---
> >>>>>     net/vhost-vdpa.c | 99 ++++++++++++++++++++++++++++++++++--------------
> >>>>>     1 file changed, 71 insertions(+), 28 deletions(-)
> >>>>>
> >>>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> >>>>> index de5ed8ff22..a9e6c8f28e 100644
> >>>>> --- a/net/vhost-vdpa.c
> >>>>> +++ b/net/vhost-vdpa.c
> >>>>> @@ -178,13 +178,9 @@ err_init:
> >>>>>     static void vhost_vdpa_cleanup(NetClientState *nc)
> >>>>>     {
> >>>>>         VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> >>>>> -    struct vhost_dev *dev = &s->vhost_net->dev;
> >>>>>
> >>>>>         qemu_vfree(s->cvq_cmd_out_buffer);
> >>>>>         qemu_vfree(s->status);
> >>>>> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> >>>>> -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> >>>>> -    }
> >>>>>         if (s->vhost_net) {
> >>>>>             vhost_net_cleanup(s->vhost_net);
> >>>>>             g_free(s->vhost_net);
> >>>>> @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
> >>>>>         return size;
> >>>>>     }
> >>>>>
> >>>>> +/** From any vdpa net client, get the netclient of first queue pair */
> >>>>> +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> >>>>> +{
> >>>>> +    NICState *nic = qemu_get_nic(s->nc.peer);
> >>>>> +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
> >>>>> +
> >>>>> +    return DO_UPCAST(VhostVDPAState, nc, nc0);
> >>>>> +}
> >>>>> +
> >>>>> +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> >>>>> +{
> >>>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> >>>>> +
> >>>>> +    if (v->shadow_vqs_enabled) {
> >>>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> >>>>> +                                           v->iova_range.last);
> >>>>> +    }
> >>>>> +}
> >>>>> +
> >>>>> +static int vhost_vdpa_net_data_start(NetClientState *nc)
> >>>>> +{
> >>>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> >>>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> >>>>> +
> >>>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >>>>> +
> >>>>> +    if (v->index == 0) {
> >>>>> +        vhost_vdpa_net_data_start_first(s);
> >>>>> +        return 0;
> >>>>> +    }
> >>>>> +
> >>>>> +    if (v->shadow_vqs_enabled) {
> >>>>> +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
> >>>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> >>>>> +    }
> >>>>> +
> >>>>> +    return 0;
> >>>>> +}
> >>>>> +
> >>>>> +static void vhost_vdpa_net_client_stop(NetClientState *nc)
> >>>>> +{
> >>>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> >>>>> +    struct vhost_dev *dev;
> >>>>> +
> >>>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >>>>> +
> >>>>> +    dev = s->vhost_vdpa.dev;
> >>>>> +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> >>>>> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> >>>>> +    }
> >>>>> +}
> >>>>> +
> >>>>>     static NetClientInfo net_vhost_vdpa_info = {
> >>>>>             .type = NET_CLIENT_DRIVER_VHOST_VDPA,
> >>>>>             .size = sizeof(VhostVDPAState),
> >>>>>             .receive = vhost_vdpa_receive,
> >>>>> +        .start = vhost_vdpa_net_data_start,
> >>>>> +        .stop = vhost_vdpa_net_client_stop,
> >>>>>             .cleanup = vhost_vdpa_cleanup,
> >>>>>             .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
> >>>>>             .has_ufo = vhost_vdpa_has_ufo,
> >>>>> @@ -351,7 +401,7 @@ dma_map_err:
> >>>>>
> >>>>>     static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> >>>>>     {
> >>>>> -    VhostVDPAState *s;
> >>>>> +    VhostVDPAState *s, *s0;
> >>>>>         struct vhost_vdpa *v;
> >>>>>         uint64_t backend_features;
> >>>>>         int64_t cvq_group;
> >>>>> @@ -425,6 +475,15 @@ out:
> >>>>>             return 0;
> >>>>>         }
> >>>>>
> >>>>> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
> >>>>> +    if (s0->vhost_vdpa.iova_tree) {
> >>>>> +        /* SVQ is already configured for all virtqueues */
> >>>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> >>>>> +    } else {
> >>>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> >>>>> +                                           v->iova_range.last);
> >>>> I wonder how this case could happen, vhost_vdpa_net_data_start_first()
> >>>> should've allocated an iova tree on the first data vq. Is zero data vq
> >>>> ever possible on net vhost-vdpa?
> >>>>
> >>> It's the case of the current qemu master when only CVQ is being
> >>> shadowed. It's not that "there are no data vq": If that case were
> >>> possible, CVQ vhost-vdpa state would be s0.
> >>>
> >>> The case is that since only CVQ vhost-vdpa is the one being migrated,
> >>> only CVQ has an iova tree.
> >> OK, so this corresponds to the case where live migration is not started
> >> and CVQ starts in its own address space of VHOST_VDPA_NET_CVQ_ASID.
> >> Thanks for explaining it!
> >>
> >>> With this series applied and with no migration running, the case is
> >>> the same as before: only SVQ gets shadowed. When migration starts, all
> >>> vqs are migrated, and share iova tree.
> >> I wonder what is the reason to share the iova tree when migration
> >> starts, I think CVQ may stay on its own VHOST_VDPA_NET_CVQ_ASID still?
> >>
> >> Actually there's discrepancy in vhost_vdpa_net_log_global_enable(), I
> >> don't see explicit code to switch from VHOST_VDPA_NET_CVQ_ASID to
> >> VHOST_VDPA_GUEST_PA_ASID for the CVQ. This is the address space I
> >> collision I mentioned earlier:
> >>
> > There is no such change. This code only migrates devices with no CVQ,
> > as they have their own difficulties.
> >
> > In the previous RFC there was no such change either. Since it's hard
> > to modify passthrough devices IOVA tree, CVQ AS updates keep being
> > VHOST_VDPA_NET_CVQ_ASID.
> That's my understanding too, the current code doesn't support changing
> AS once it is set, although uAPI doesn't prohibit it.
>
> > They both share the same IOVA tree though, just for simplicity.
> It would be good to document this assumption somewhere in the code, it's
> not easy to infer userspace doesn't have the same view as that in the
> kernel in terms of the iova tree being used.
>
> >   If
> > address space exhaustion is a problem we can make them independent,
> > but this complicates the code a little bit.
> >
> >> 9585@1676093788.259201:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16
> >> msg_type: 2 asid: 0 iova: 0x1000 size: 0x2000 uaddr: 0x55a5a7ff3000
> >> perm: 0x1 type: 2
> >> 9585@1676093788.279923:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16
> >> msg_type: 2 asid: 0 iova: 0x3000 size: 0x1000 uaddr: 0x55a5a7ff6000
> >> perm: 0x3 type: 2
> >> 9585@1676093788.290529:vhost_vdpa_set_vring_addr dev: 0x55a5a77cec20
> >> index: 0 flags: 0x0 desc_user_addr: 0x1000 used_user_addr: 0x3000
> >> avail_user_addr: 0x2000 log_guest_addr: 0x0
> >> :
> >> :
> >> 9585@1676093788.543567:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16
> >> msg_type: 2 asid: 0 iova: 0x16000 size: 0x2000 uaddr: 0x55a5a7959000
> >> perm: 0x1 type: 2
> >> 9585@1676093788.576923:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16
> >> msg_type: 2 asid: 0 iova: 0x18000 size: 0x1000 uaddr: 0x55a5a795c000
> >> perm: 0x3 type: 2
> >> 9585@1676093788.593881:vhost_vdpa_set_vring_addr dev: 0x55a5a7580930
> >> index: 7 flags: 0x0 desc_user_addr: 0x16000 used_user_addr: 0x18000
> >> avail_user_addr: 0x17000 log_guest_addr: 0x0
> >> 9585@1676093788.593904:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
> >> msg_type: 2 asid: 1 iova: 0x19000 size: 0x1000 uaddr: 0x55a5a77f8000
> >> perm: 0x1 type: 2
> >> 9585@1676093788.606448:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
> >> msg_type: 2 asid: 1 iova: 0x1a000 size: 0x1000 uaddr: 0x55a5a77fa000
> >> perm: 0x3 type: 2
> >> 9585@1676093788.616253:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
> >> msg_type: 2 asid: 1 iova: 0x1b000 size: 0x1000 uaddr: 0x55a5a795f000
> >> perm: 0x1 type: 2
> >> 9585@1676093788.625956:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
> >> msg_type: 2 asid: 1 iova: 0x1c000 size: 0x1000 uaddr: 0x55a5a7f4e000
> >> perm: 0x3 type: 2
> >> 9585@1676093788.635655:vhost_vdpa_set_vring_addr dev: 0x55a5a7580ec0
> >> index: 8 flags: 0x0 desc_user_addr: 0x1b000 used_user_addr: 0x1c000
> >> avail_user_addr: 0x1b400 log_guest_addr: 0x0
> >> 9585@1676093788.635667:vhost_vdpa_listener_region_add vdpa:
> >> 0x7ff13026d190 iova 0x0 llend 0xa0000 vaddr: 0x7fef1fe00000 read-only: 0
> >> 9585@1676093788.635670:vhost_vdpa_listener_begin_batch
> >> vdpa:0x7ff13026d190 fd: 16 msg_type: 2 type: 5
> >> 9585@1676093788.635677:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
> >> msg_type: 2 asid: 0 iova: 0x0 size: 0xa0000 uaddr: 0x7fef1fe00000 perm:
> >> 0x3 type: 2
> >> 2023-02-11T05:36:28.635686Z qemu-system-x86_64: failed to write, fd=16,
> >> errno=14 (Bad address)
> >> 2023-02-11T05:36:28.635721Z qemu-system-x86_64: vhost vdpa map fail!
> >> 2023-02-11T05:36:28.635744Z qemu-system-x86_64: vhost-vdpa: DMA mapping
> >> failed, unable to continue
> >>
> > I'm not sure how you get to this. Maybe you were able to start the
> > migration because the CVQ migration blocker was not effectively added?
> It's something else, below line at the start of
> vhost_vdpa_net_cvq_start() would override the shadow_data on the CVQ.
>
>      v->shadow_data = s->always_svq;
>
> Which leads to my previous question why shadow_data needs to apply to
> the CVQ
>

Ok, I'm proposing some documentation here. I'll send a new patch
adding it to the sources if you think it is complete.

Shadow_data needs to apply to CVQ because memory_listener is
registered against CVQ, and memory listener needs to know if data vqs
are passthrough or shadowed. We could apply a memory register to a
different vhost_vdpa but then its lifecycle gets complicated.
---

For completion, the original discussion was [1].

> and why the userspace iova is shared between data queues and CVQ.

It's not shared unless the device does not support ASID. They only
share the iova tree because iova tree itself is not used for tracking
memory itself but only translations, so its lifecycle is easier. Each
piece of memory's lifecycle is tracked differently:
* Guest's memory is tracked by the memory listener itself, so we got
all the regions at register / unregister and in its own updates.
* SVQ vrings are tracked in vhost_vdpa->shadow_vqs[i].
* CVQ shadow buffers are tracked in net VhostVDPAState.
---

I'll send a new series adding the two pieces of doc if you think they
are complete. Please let me know if you'd add or remove something.

Note that this code is already on qemu master so this doc should not
block this series, correct?

Thanks!

[1] https://mail.gnu.org/archive/html/qemu-devel/2022-11/msg02033.html

> -Siwei
>
>
> >
> > Thanks!
> >
> >
> >> Regards,
> >> -Siwei
> >>> Thanks!
> >>>
> >>>> Thanks,
> >>>> -Siwei
> >>>>> +    }
> >>>>> +
> >>>>>         r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
> >>>>>                                    vhost_vdpa_net_cvq_cmd_page_len(), false);
> >>>>>         if (unlikely(r < 0)) {
> >>>>> @@ -449,15 +508,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
> >>>>>         if (s->vhost_vdpa.shadow_vqs_enabled) {
> >>>>>             vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
> >>>>>             vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
> >>>>> -        if (!s->always_svq) {
> >>>>> -            /*
> >>>>> -             * If only the CVQ is shadowed we can delete this safely.
> >>>>> -             * If all the VQs are shadows this will be needed by the time the
> >>>>> -             * device is started again to register SVQ vrings and similar.
> >>>>> -             */
> >>>>> -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> >>>>> -        }
> >>>>>         }
> >>>>> +
> >>>>> +    vhost_vdpa_net_client_stop(nc);
> >>>>>     }
> >>>>>
> >>>>>     static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
> >>>>> @@ -667,8 +720,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >>>>>                                            int nvqs,
> >>>>>                                            bool is_datapath,
> >>>>>                                            bool svq,
> >>>>> -                                       struct vhost_vdpa_iova_range iova_range,
> >>>>> -                                       VhostIOVATree *iova_tree)
> >>>>> +                                       struct vhost_vdpa_iova_range iova_range)
> >>>>>     {
> >>>>>         NetClientState *nc = NULL;
> >>>>>         VhostVDPAState *s;
> >>>>> @@ -690,7 +742,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >>>>>         s->vhost_vdpa.shadow_vqs_enabled = svq;
> >>>>>         s->vhost_vdpa.iova_range = iova_range;
> >>>>>         s->vhost_vdpa.shadow_data = svq;
> >>>>> -    s->vhost_vdpa.iova_tree = iova_tree;
> >>>>>         if (!is_datapath) {
> >>>>>             s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
> >>>>>                                                 vhost_vdpa_net_cvq_cmd_page_len());
> >>>>> @@ -760,7 +811,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>>>         uint64_t features;
> >>>>>         int vdpa_device_fd;
> >>>>>         g_autofree NetClientState **ncs = NULL;
> >>>>> -    g_autoptr(VhostIOVATree) iova_tree = NULL;
> >>>>>         struct vhost_vdpa_iova_range iova_range;
> >>>>>         NetClientState *nc;
> >>>>>         int queue_pairs, r, i = 0, has_cvq = 0;
> >>>>> @@ -812,12 +862,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>>>             goto err;
> >>>>>         }
> >>>>>
> >>>>> -    if (opts->x_svq) {
> >>>>> -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
> >>>>> -            goto err_svq;
> >>>>> -        }
> >>>>> -
> >>>>> -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> >>>>> +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
> >>>>> +        goto err;
> >>>>>         }
> >>>>>
> >>>>>         ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> >>>>> @@ -825,7 +871,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>>>         for (i = 0; i < queue_pairs; i++) {
> >>>>>             ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >>>>>                                          vdpa_device_fd, i, 2, true, opts->x_svq,
> >>>>> -                                     iova_range, iova_tree);
> >>>>> +                                     iova_range);
> >>>>>             if (!ncs[i])
> >>>>>                 goto err;
> >>>>>         }
> >>>>> @@ -833,13 +879,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>>>         if (has_cvq) {
> >>>>>             nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >>>>>                                      vdpa_device_fd, i, 1, false,
> >>>>> -                                 opts->x_svq, iova_range, iova_tree);
> >>>>> +                                 opts->x_svq, iova_range);
> >>>>>             if (!nc)
> >>>>>                 goto err;
> >>>>>         }
> >>>>>
> >>>>> -    /* iova_tree ownership belongs to last NetClientState */
> >>>>> -    g_steal_pointer(&iova_tree);
> >>>>>         return 0;
> >>>>>
> >>>>>     err:
> >>>>> @@ -849,7 +893,6 @@ err:
> >>>>>             }
> >>>>>         }
> >>>>>
> >>>>> -err_svq:
> >>>>>         qemu_close(vdpa_device_fd);
> >>>>>
> >>>>>         return -1;
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration
  2023-02-15 18:40   ` Eugenio Perez Martin
@ 2023-02-16 13:50     ` Lei Yang
  0 siblings, 0 replies; 68+ messages in thread
From: Lei Yang @ 2023-02-16 13:50 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Gautam Dawar, qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu

QE tested this series's v3 again. Creating two vdpa_sim devices, and
boot two VMs without shadow virtqueues. The migration was successful and
everything worked fine.

Tested-by: Lei Yang <leiyang@redhat.com>

Eugenio Perez Martin <eperezma@redhat.com> 于2023年2月16日周四 02:41写道:
>
> On Fri, Feb 10, 2023 at 1:58 PM Gautam Dawar <gdawar@amd.com> wrote:
> >
> > Hi Eugenio,
> >
> > I've tested this patch series on Xilinx/AMD SN1022 device without
> > control vq and VM Live Migration between two hosts worked fine.
> >
> > Tested-by: Gautam Dawar <gautam.dawar@amd.com>
> >
>
> Thanks for the testing!
>
> >
> > Here is some minor feedback:
> >
> > Pls fix the typo (Dynamycally -> Dynamically) in the Subject.
> >
> > On 2/8/23 15:12, Eugenio Pérez wrote:
> > > CAUTION: This message has originated from an External Source. Please use proper judgment and caution when opening attachments, clicking links, or responding to this email.
> > >
> > >
> > > It's possible to migrate vdpa net devices if they are shadowed from the
> > >
> > > start.  But to always shadow the dataplane is to effectively break its host
> > >
> > > passthrough, so its not convenient in vDPA scenarios.
> > I believe you meant efficient instead of convenient.
> > >
> > >
> > >
> > > This series enables dynamically switching to shadow mode only at
> > >
> > > migration time.  This allows full data virtqueues passthrough all the
> > >
> > > time qemu is not migrating.
> > >
> > >
> > >
> > > In this series only net devices with no CVQ are migratable.  CVQ adds
> > >
> > > additional state that would make the series bigger and still had some
> > >
> > > controversy on previous RFC, so let's split it.
> > >
> > >
> > >
> > > The first patch delays the creation of the iova tree until it is really needed,
> > >
> > > and makes it easier to dynamically move from and to SVQ mode.
> > It would help adding some detail on the iova tree being referred to here.
> > >
> > >
> > >
> > > Next patches from 02 to 05 handle the suspending and getting of vq state (base)
> > >
> > > of the device at the switch to SVQ mode.  The new _F_SUSPEND feature is
> > >
> > > negotiated and stop device flow is changed so the state can be fetched trusting
> > >
> > > the device will not modify it.
> > >
> > >
> > >
> > > Since vhost backend must offer VHOST_F_LOG_ALL to be migratable, last patches
> > >
> > > but the last one add the needed migration blockers so vhost-vdpa can offer it
> >
> > "last patches but the last one"?
> >
>
> I think I solved all of the above in v3, thanks for notifying them!
>
> Would it be possible to test with v3 too?
>
> > Thanks.
> >
> > >
> > > safely.  They also add the handling of this feature.
> > >
> > >
> > >
> > > Finally, the last patch makes virtio vhost-vdpa backend to offer
> > >
> > > VHOST_F_LOG_ALL so qemu migrate the device as long as no other blocker has been
> > >
> > > added.
> > >
> > >
> > >
> > > Successfully tested with vdpa_sim_net with patch [1] applied and with the qemu
> > >
> > > emulated device with vp_vdpa with some restrictions:
> > >
> > > * No CVQ. No feature that didn't work with SVQ previously (packed, ...)
> > >
> > > * VIRTIO_RING_F_STATE patches implementing [2].
> > >
> > > * Expose _F_SUSPEND, but ignore it and suspend on ring state fetch like
> > >
> > >    DPDK.
> > >
> > >
> > >
> > > Comments are welcome.
> > >
> > >
> > >
> > > v2:
> > >
> > > - Check for SUSPEND in vhost_dev.backend_cap, as .backend_features is empty at
> > >
> > >    the check moment.
> > >
> > >
> > >
> > > v1:
> > >
> > > - Omit all code working with CVQ and block migration if the device supports
> > >
> > >    CVQ.
> > >
> > > - Remove spurious kick.
> > Even with the spurious kick, datapath didn't resume at destination VM
> > after LM as kick happened before DRIVER_OK. So IMO, it will be required
> > that the vdpa parent driver simulates a kick after creating/starting HW
> > rings.
>
> Right, it did not solve the issue.
>
> If I'm not wrong all vdpa drivers are moving to that model, checking
> for new avail descriptors right after DRIVER_OK. Maybe it is better to
> keep this discussion at patch 12/13 on RFC v2?
>
> Thanks!
>
> > >
> > > - Move all possible checks for migration to vhost-vdpa instead of the net
> > >
> > >    backend. Move them to init code from start code.
> > >
> > > - Suspend on vhost_vdpa_dev_start(false) instead of in vhost-vdpa net backend.
> > >
> > > - Properly split suspend after geting base and adding of status_reset patches.
> > >
> > > - Add possible TODOs to points where this series can improve in the future.
> > >
> > > - Check the state of migration using migration_in_setup and
> > >
> > >    migration_has_failed instead of checking all the possible migration status in
> > >
> > >    a switch.
> > >
> > > - Add TODO with possible low hand fruit using RESUME ops.
> > >
> > > - Always offer _F_LOG from virtio/vhost-vdpa and let migration blockers do
> > >
> > >    their thing instead of adding a variable.
> > >
> > > - RFC v2 at https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg02574.html
> > >
> > >
> > >
> > > RFC v2:
> > >
> > > - Use a migration listener instead of a memory listener to know when
> > >
> > >    the migration starts.
> > >
> > > - Add stuff not picked with ASID patches, like enable rings after
> > >
> > >    driver_ok
> > >
> > > - Add rewinding on the migration src, not in dst
> > >
> > > - RFC v1 at https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg01664.html
> > >
> > >
> > >
> > > [1] https://lore.kernel.org/lkml/20230203142501.300125-1-eperezma@redhat.com/T/
> > >
> > > [2] https://lists.oasis-open.org/archives/virtio-comment/202103/msg00036.html
> > >
> > >
> > >
> > > Eugenio Pérez (13):
> > >
> > >    vdpa net: move iova tree creation from init to start
> > >
> > >    vdpa: Negotiate _F_SUSPEND feature
> > >
> > >    vdpa: add vhost_vdpa_suspend
> > >
> > >    vdpa: move vhost reset after get vring base
> > >
> > >    vdpa: rewind at get_base, not set_base
> > >
> > >    vdpa net: allow VHOST_F_LOG_ALL
> > >
> > >    vdpa: add vdpa net migration state notifier
> > >
> > >    vdpa: disable RAM block discard only for the first device
> > >
> > >    vdpa net: block migration if the device has CVQ
> > >
> > >    vdpa: block migration if device has unsupported features
> > >
> > >    vdpa: block migration if dev does not have _F_SUSPEND
> > >
> > >    vdpa: block migration if SVQ does not admit a feature
> > >
> > >    vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices
> > >
> > >
> > >
> > >   include/hw/virtio/vhost-backend.h |   4 +
> > >
> > >   hw/virtio/vhost-vdpa.c            | 126 +++++++++++++++-----
> > >
> > >   hw/virtio/vhost.c                 |   3 +
> > >
> > >   net/vhost-vdpa.c                  | 192 +++++++++++++++++++++++++-----
> > >
> > >   hw/virtio/trace-events            |   1 +
> > >
> > >   5 files changed, 267 insertions(+), 59 deletions(-)
> > >
> > >
> > >
> > > --
> > >
> > > 2.31.1
> > >
> > >
> > >
> > >
> >
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 01/13] vdpa net: move iova tree creation from init to start
  2023-02-16  7:35             ` Eugenio Perez Martin
@ 2023-02-17  7:38                 ` Si-Wei Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Si-Wei Liu @ 2023-02-17  7:38 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella



On 2/15/2023 11:35 PM, Eugenio Perez Martin wrote:
> On Thu, Feb 16, 2023 at 3:15 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 2/14/2023 11:07 AM, Eugenio Perez Martin wrote:
>>> On Tue, Feb 14, 2023 at 2:45 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>
>>>> On 2/13/2023 3:14 AM, Eugenio Perez Martin wrote:
>>>>> On Mon, Feb 13, 2023 at 7:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>>> On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
>>>>>>> Only create iova_tree if and when it is needed.
>>>>>>>
>>>>>>> The cleanup keeps being responsible of last VQ but this change allows it
>>>>>>> to merge both cleanup functions.
>>>>>>>
>>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>>>>> ---
>>>>>>>      net/vhost-vdpa.c | 99 ++++++++++++++++++++++++++++++++++--------------
>>>>>>>      1 file changed, 71 insertions(+), 28 deletions(-)
>>>>>>>
>>>>>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>>>>>>> index de5ed8ff22..a9e6c8f28e 100644
>>>>>>> --- a/net/vhost-vdpa.c
>>>>>>> +++ b/net/vhost-vdpa.c
>>>>>>> @@ -178,13 +178,9 @@ err_init:
>>>>>>>      static void vhost_vdpa_cleanup(NetClientState *nc)
>>>>>>>      {
>>>>>>>          VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>>>>>> -    struct vhost_dev *dev = &s->vhost_net->dev;
>>>>>>>
>>>>>>>          qemu_vfree(s->cvq_cmd_out_buffer);
>>>>>>>          qemu_vfree(s->status);
>>>>>>> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>>>>>>> -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>>>>>> -    }
>>>>>>>          if (s->vhost_net) {
>>>>>>>              vhost_net_cleanup(s->vhost_net);
>>>>>>>              g_free(s->vhost_net);
>>>>>>> @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
>>>>>>>          return size;
>>>>>>>      }
>>>>>>>
>>>>>>> +/** From any vdpa net client, get the netclient of first queue pair */
>>>>>>> +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
>>>>>>> +{
>>>>>>> +    NICState *nic = qemu_get_nic(s->nc.peer);
>>>>>>> +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
>>>>>>> +
>>>>>>> +    return DO_UPCAST(VhostVDPAState, nc, nc0);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>>>>>>> +{
>>>>>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
>>>>>>> +
>>>>>>> +    if (v->shadow_vqs_enabled) {
>>>>>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>>>>>>> +                                           v->iova_range.last);
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int vhost_vdpa_net_data_start(NetClientState *nc)
>>>>>>> +{
>>>>>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>>>>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
>>>>>>> +
>>>>>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>>>>>>> +
>>>>>>> +    if (v->index == 0) {
>>>>>>> +        vhost_vdpa_net_data_start_first(s);
>>>>>>> +        return 0;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    if (v->shadow_vqs_enabled) {
>>>>>>> +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
>>>>>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void vhost_vdpa_net_client_stop(NetClientState *nc)
>>>>>>> +{
>>>>>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>>>>>> +    struct vhost_dev *dev;
>>>>>>> +
>>>>>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>>>>>>> +
>>>>>>> +    dev = s->vhost_vdpa.dev;
>>>>>>> +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>>>>>>> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>>      static NetClientInfo net_vhost_vdpa_info = {
>>>>>>>              .type = NET_CLIENT_DRIVER_VHOST_VDPA,
>>>>>>>              .size = sizeof(VhostVDPAState),
>>>>>>>              .receive = vhost_vdpa_receive,
>>>>>>> +        .start = vhost_vdpa_net_data_start,
>>>>>>> +        .stop = vhost_vdpa_net_client_stop,
>>>>>>>              .cleanup = vhost_vdpa_cleanup,
>>>>>>>              .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
>>>>>>>              .has_ufo = vhost_vdpa_has_ufo,
>>>>>>> @@ -351,7 +401,7 @@ dma_map_err:
>>>>>>>
>>>>>>>      static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>>>>>>>      {
>>>>>>> -    VhostVDPAState *s;
>>>>>>> +    VhostVDPAState *s, *s0;
>>>>>>>          struct vhost_vdpa *v;
>>>>>>>          uint64_t backend_features;
>>>>>>>          int64_t cvq_group;
>>>>>>> @@ -425,6 +475,15 @@ out:
>>>>>>>              return 0;
>>>>>>>          }
>>>>>>>
>>>>>>> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
>>>>>>> +    if (s0->vhost_vdpa.iova_tree) {
>>>>>>> +        /* SVQ is already configured for all virtqueues */
>>>>>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
>>>>>>> +    } else {
>>>>>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>>>>>>> +                                           v->iova_range.last);
>>>>>> I wonder how this case could happen, vhost_vdpa_net_data_start_first()
>>>>>> should've allocated an iova tree on the first data vq. Is zero data vq
>>>>>> ever possible on net vhost-vdpa?
>>>>>>
>>>>> It's the case of the current qemu master when only CVQ is being
>>>>> shadowed. It's not that "there are no data vq": If that case were
>>>>> possible, CVQ vhost-vdpa state would be s0.
>>>>>
>>>>> The case is that since only CVQ vhost-vdpa is the one being migrated,
>>>>> only CVQ has an iova tree.
>>>> OK, so this corresponds to the case where live migration is not started
>>>> and CVQ starts in its own address space of VHOST_VDPA_NET_CVQ_ASID.
>>>> Thanks for explaining it!
>>>>
>>>>> With this series applied and with no migration running, the case is
>>>>> the same as before: only SVQ gets shadowed. When migration starts, all
>>>>> vqs are migrated, and share iova tree.
>>>> I wonder what is the reason to share the iova tree when migration
>>>> starts, I think CVQ may stay on its own VHOST_VDPA_NET_CVQ_ASID still?
>>>>
>>>> Actually there's discrepancy in vhost_vdpa_net_log_global_enable(), I
>>>> don't see explicit code to switch from VHOST_VDPA_NET_CVQ_ASID to
>>>> VHOST_VDPA_GUEST_PA_ASID for the CVQ. This is the address space I
>>>> collision I mentioned earlier:
>>>>
>>> There is no such change. This code only migrates devices with no CVQ,
>>> as they have their own difficulties.
>>>
>>> In the previous RFC there was no such change either. Since it's hard
>>> to modify passthrough devices IOVA tree, CVQ AS updates keep being
>>> VHOST_VDPA_NET_CVQ_ASID.
>> That's my understanding too, the current code doesn't support changing
>> AS once it is set, although uAPI doesn't prohibit it.
>>
>>> They both share the same IOVA tree though, just for simplicity.
>> It would be good to document this assumption somewhere in the code, it's
>> not easy to infer userspace doesn't have the same view as that in the
>> kernel in terms of the iova tree being used.
>>
>>>    If
>>> address space exhaustion is a problem we can make them independent,
>>> but this complicates the code a little bit.
>>>
>>>> 9585@1676093788.259201:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16
>>>> msg_type: 2 asid: 0 iova: 0x1000 size: 0x2000 uaddr: 0x55a5a7ff3000
>>>> perm: 0x1 type: 2
>>>> 9585@1676093788.279923:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16
>>>> msg_type: 2 asid: 0 iova: 0x3000 size: 0x1000 uaddr: 0x55a5a7ff6000
>>>> perm: 0x3 type: 2
>>>> 9585@1676093788.290529:vhost_vdpa_set_vring_addr dev: 0x55a5a77cec20
>>>> index: 0 flags: 0x0 desc_user_addr: 0x1000 used_user_addr: 0x3000
>>>> avail_user_addr: 0x2000 log_guest_addr: 0x0
>>>> :
>>>> :
>>>> 9585@1676093788.543567:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16
>>>> msg_type: 2 asid: 0 iova: 0x16000 size: 0x2000 uaddr: 0x55a5a7959000
>>>> perm: 0x1 type: 2
>>>> 9585@1676093788.576923:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16
>>>> msg_type: 2 asid: 0 iova: 0x18000 size: 0x1000 uaddr: 0x55a5a795c000
>>>> perm: 0x3 type: 2
>>>> 9585@1676093788.593881:vhost_vdpa_set_vring_addr dev: 0x55a5a7580930
>>>> index: 7 flags: 0x0 desc_user_addr: 0x16000 used_user_addr: 0x18000
>>>> avail_user_addr: 0x17000 log_guest_addr: 0x0
>>>> 9585@1676093788.593904:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>>>> msg_type: 2 asid: 1 iova: 0x19000 size: 0x1000 uaddr: 0x55a5a77f8000
>>>> perm: 0x1 type: 2
>>>> 9585@1676093788.606448:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>>>> msg_type: 2 asid: 1 iova: 0x1a000 size: 0x1000 uaddr: 0x55a5a77fa000
>>>> perm: 0x3 type: 2
>>>> 9585@1676093788.616253:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>>>> msg_type: 2 asid: 1 iova: 0x1b000 size: 0x1000 uaddr: 0x55a5a795f000
>>>> perm: 0x1 type: 2
>>>> 9585@1676093788.625956:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>>>> msg_type: 2 asid: 1 iova: 0x1c000 size: 0x1000 uaddr: 0x55a5a7f4e000
>>>> perm: 0x3 type: 2
>>>> 9585@1676093788.635655:vhost_vdpa_set_vring_addr dev: 0x55a5a7580ec0
>>>> index: 8 flags: 0x0 desc_user_addr: 0x1b000 used_user_addr: 0x1c000
>>>> avail_user_addr: 0x1b400 log_guest_addr: 0x0
>>>> 9585@1676093788.635667:vhost_vdpa_listener_region_add vdpa:
>>>> 0x7ff13026d190 iova 0x0 llend 0xa0000 vaddr: 0x7fef1fe00000 read-only: 0
>>>> 9585@1676093788.635670:vhost_vdpa_listener_begin_batch
>>>> vdpa:0x7ff13026d190 fd: 16 msg_type: 2 type: 5
>>>> 9585@1676093788.635677:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>>>> msg_type: 2 asid: 0 iova: 0x0 size: 0xa0000 uaddr: 0x7fef1fe00000 perm:
>>>> 0x3 type: 2
>>>> 2023-02-11T05:36:28.635686Z qemu-system-x86_64: failed to write, fd=16,
>>>> errno=14 (Bad address)
>>>> 2023-02-11T05:36:28.635721Z qemu-system-x86_64: vhost vdpa map fail!
>>>> 2023-02-11T05:36:28.635744Z qemu-system-x86_64: vhost-vdpa: DMA mapping
>>>> failed, unable to continue
>>>>
>>> I'm not sure how you get to this. Maybe you were able to start the
>>> migration because the CVQ migration blocker was not effectively added?
>> It's something else, below line at the start of
>> vhost_vdpa_net_cvq_start() would override the shadow_data on the CVQ.
>>
>>       v->shadow_data = s->always_svq;
>>
>> Which leads to my previous question why shadow_data needs to apply to
>> the CVQ
>>
> Ok, I'm proposing some documentation here. I'll send a new patch
> adding it to the sources if you think it is complete.
It's fine, I don't intend to block on this. But what I really meant is 
that there is a bug in the line I pointed out earlier. shadow_data is 
already set by net_vhost_vdpa_init() at init time (for the x-svq=on 
case). For the x-svq=off case, vhost_vdpa_net_log_global_enable() sets 
shadow_data to true on the CVQ within the migration notifier, that's 
correct and expected; however, the subsequent vhost_net_start() function 
right after would call into vhost_vdpa_net_cvq_start(). The latter 
inadvertently sets the CVQ's shadow_data back to false, which defeats 
the purpose of using shadow_data to indicate translating iova on 
shadowed CVQ using the *shared* iova tree. You can say migration with 
CVQ is blocked anyway so this code path doesn't get exposed for now, but 
that somehow causes conflict and confusions for readers to understand 
what the code attempts to achieve. Maybe remove this line or move this 
line to vhost_vdpa_net_cvq_stop()?

> Shadow_data needs to apply to CVQ because memory_listener is
> registered against CVQ,
It's bound to the last virtqueue pair which is not necessarily a CVQ.
>   and memory listener needs to know if data vqs
> are passthrough or shadowed. We could apply a memory register to a
> different vhost_vdpa but then its lifecycle gets complicated.
The lifecycle can remain same but the code will be a lot messier for 
sure. :)

> ---
>
> For completion, the original discussion was [1].
>
>> and why the userspace iova is shared between data queues and CVQ.
> It's not shared unless the device does not support ASID. They only
> share the iova tree because iova tree itself is not used for tracking
> memory itself but only translations, so its lifecycle is easier. Each
> piece of memory's lifecycle is tracked differently:
> * Guest's memory is tracked by the memory listener itself, so we got
> all the regions at register / unregister and in its own updates.
> * SVQ vrings are tracked in vhost_vdpa->shadow_vqs[i].
> * CVQ shadow buffers are tracked in net VhostVDPAState.
> ---
>
> I'll send a new series adding the two pieces of doc if you think they
> are complete. Please let me know if you'd add or remove something.
No you don't have to. Just leave it as-is.

What I thought about making two iova trees independent was not just 
meant for translation but also keep sync with kernel's IOVA address 
space, so that it causes less fluctuations by sending down thinner iova 
update for the unmap and map cycle when switching mode. For now sharing 
the iova tree is fine. I'll see if there's other alternative to keep 
guest memory identity mapped 1:1 on the iova tree across the mode 
switch. Future work you don't have to worry about now.

Thanks,
-Siwei

>
> Note that this code is already on qemu master so this doc should not
> block this series, correct?
>
> Thanks!
>
> [1] https://mail.gnu.org/archive/html/qemu-devel/2022-11/msg02033.html
>
>> -Siwei
>>
>>
>>> Thanks!
>>>
>>>
>>>> Regards,
>>>> -Siwei
>>>>> Thanks!
>>>>>
>>>>>> Thanks,
>>>>>> -Siwei
>>>>>>> +    }
>>>>>>> +
>>>>>>>          r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
>>>>>>>                                     vhost_vdpa_net_cvq_cmd_page_len(), false);
>>>>>>>          if (unlikely(r < 0)) {
>>>>>>> @@ -449,15 +508,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
>>>>>>>          if (s->vhost_vdpa.shadow_vqs_enabled) {
>>>>>>>              vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
>>>>>>>              vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
>>>>>>> -        if (!s->always_svq) {
>>>>>>> -            /*
>>>>>>> -             * If only the CVQ is shadowed we can delete this safely.
>>>>>>> -             * If all the VQs are shadows this will be needed by the time the
>>>>>>> -             * device is started again to register SVQ vrings and similar.
>>>>>>> -             */
>>>>>>> -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>>>>>> -        }
>>>>>>>          }
>>>>>>> +
>>>>>>> +    vhost_vdpa_net_client_stop(nc);
>>>>>>>      }
>>>>>>>
>>>>>>>      static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
>>>>>>> @@ -667,8 +720,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>>>>>                                             int nvqs,
>>>>>>>                                             bool is_datapath,
>>>>>>>                                             bool svq,
>>>>>>> -                                       struct vhost_vdpa_iova_range iova_range,
>>>>>>> -                                       VhostIOVATree *iova_tree)
>>>>>>> +                                       struct vhost_vdpa_iova_range iova_range)
>>>>>>>      {
>>>>>>>          NetClientState *nc = NULL;
>>>>>>>          VhostVDPAState *s;
>>>>>>> @@ -690,7 +742,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>>>>>          s->vhost_vdpa.shadow_vqs_enabled = svq;
>>>>>>>          s->vhost_vdpa.iova_range = iova_range;
>>>>>>>          s->vhost_vdpa.shadow_data = svq;
>>>>>>> -    s->vhost_vdpa.iova_tree = iova_tree;
>>>>>>>          if (!is_datapath) {
>>>>>>>              s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
>>>>>>>                                                  vhost_vdpa_net_cvq_cmd_page_len());
>>>>>>> @@ -760,7 +811,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>>>>          uint64_t features;
>>>>>>>          int vdpa_device_fd;
>>>>>>>          g_autofree NetClientState **ncs = NULL;
>>>>>>> -    g_autoptr(VhostIOVATree) iova_tree = NULL;
>>>>>>>          struct vhost_vdpa_iova_range iova_range;
>>>>>>>          NetClientState *nc;
>>>>>>>          int queue_pairs, r, i = 0, has_cvq = 0;
>>>>>>> @@ -812,12 +862,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>>>>              goto err;
>>>>>>>          }
>>>>>>>
>>>>>>> -    if (opts->x_svq) {
>>>>>>> -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
>>>>>>> -            goto err_svq;
>>>>>>> -        }
>>>>>>> -
>>>>>>> -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
>>>>>>> +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
>>>>>>> +        goto err;
>>>>>>>          }
>>>>>>>
>>>>>>>          ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
>>>>>>> @@ -825,7 +871,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>>>>          for (i = 0; i < queue_pairs; i++) {
>>>>>>>              ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>>>>>                                           vdpa_device_fd, i, 2, true, opts->x_svq,
>>>>>>> -                                     iova_range, iova_tree);
>>>>>>> +                                     iova_range);
>>>>>>>              if (!ncs[i])
>>>>>>>                  goto err;
>>>>>>>          }
>>>>>>> @@ -833,13 +879,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>>>>          if (has_cvq) {
>>>>>>>              nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>>>>>                                       vdpa_device_fd, i, 1, false,
>>>>>>> -                                 opts->x_svq, iova_range, iova_tree);
>>>>>>> +                                 opts->x_svq, iova_range);
>>>>>>>              if (!nc)
>>>>>>>                  goto err;
>>>>>>>          }
>>>>>>>
>>>>>>> -    /* iova_tree ownership belongs to last NetClientState */
>>>>>>> -    g_steal_pointer(&iova_tree);
>>>>>>>          return 0;
>>>>>>>
>>>>>>>      err:
>>>>>>> @@ -849,7 +893,6 @@ err:
>>>>>>>              }
>>>>>>>          }
>>>>>>>
>>>>>>> -err_svq:
>>>>>>>          qemu_close(vdpa_device_fd);
>>>>>>>
>>>>>>>          return -1;



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 01/13] vdpa net: move iova tree creation from init to start
@ 2023-02-17  7:38                 ` Si-Wei Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Si-Wei Liu @ 2023-02-17  7:38 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, virtualization, Harpreet Singh Anand,
	Lei Yang, Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong



On 2/15/2023 11:35 PM, Eugenio Perez Martin wrote:
> On Thu, Feb 16, 2023 at 3:15 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 2/14/2023 11:07 AM, Eugenio Perez Martin wrote:
>>> On Tue, Feb 14, 2023 at 2:45 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>
>>>> On 2/13/2023 3:14 AM, Eugenio Perez Martin wrote:
>>>>> On Mon, Feb 13, 2023 at 7:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>>> On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
>>>>>>> Only create iova_tree if and when it is needed.
>>>>>>>
>>>>>>> The cleanup keeps being responsible of last VQ but this change allows it
>>>>>>> to merge both cleanup functions.
>>>>>>>
>>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>>>>> ---
>>>>>>>      net/vhost-vdpa.c | 99 ++++++++++++++++++++++++++++++++++--------------
>>>>>>>      1 file changed, 71 insertions(+), 28 deletions(-)
>>>>>>>
>>>>>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>>>>>>> index de5ed8ff22..a9e6c8f28e 100644
>>>>>>> --- a/net/vhost-vdpa.c
>>>>>>> +++ b/net/vhost-vdpa.c
>>>>>>> @@ -178,13 +178,9 @@ err_init:
>>>>>>>      static void vhost_vdpa_cleanup(NetClientState *nc)
>>>>>>>      {
>>>>>>>          VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>>>>>> -    struct vhost_dev *dev = &s->vhost_net->dev;
>>>>>>>
>>>>>>>          qemu_vfree(s->cvq_cmd_out_buffer);
>>>>>>>          qemu_vfree(s->status);
>>>>>>> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>>>>>>> -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>>>>>> -    }
>>>>>>>          if (s->vhost_net) {
>>>>>>>              vhost_net_cleanup(s->vhost_net);
>>>>>>>              g_free(s->vhost_net);
>>>>>>> @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
>>>>>>>          return size;
>>>>>>>      }
>>>>>>>
>>>>>>> +/** From any vdpa net client, get the netclient of first queue pair */
>>>>>>> +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
>>>>>>> +{
>>>>>>> +    NICState *nic = qemu_get_nic(s->nc.peer);
>>>>>>> +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
>>>>>>> +
>>>>>>> +    return DO_UPCAST(VhostVDPAState, nc, nc0);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>>>>>>> +{
>>>>>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
>>>>>>> +
>>>>>>> +    if (v->shadow_vqs_enabled) {
>>>>>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>>>>>>> +                                           v->iova_range.last);
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int vhost_vdpa_net_data_start(NetClientState *nc)
>>>>>>> +{
>>>>>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>>>>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
>>>>>>> +
>>>>>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>>>>>>> +
>>>>>>> +    if (v->index == 0) {
>>>>>>> +        vhost_vdpa_net_data_start_first(s);
>>>>>>> +        return 0;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    if (v->shadow_vqs_enabled) {
>>>>>>> +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
>>>>>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void vhost_vdpa_net_client_stop(NetClientState *nc)
>>>>>>> +{
>>>>>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>>>>>>> +    struct vhost_dev *dev;
>>>>>>> +
>>>>>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>>>>>>> +
>>>>>>> +    dev = s->vhost_vdpa.dev;
>>>>>>> +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>>>>>>> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>>      static NetClientInfo net_vhost_vdpa_info = {
>>>>>>>              .type = NET_CLIENT_DRIVER_VHOST_VDPA,
>>>>>>>              .size = sizeof(VhostVDPAState),
>>>>>>>              .receive = vhost_vdpa_receive,
>>>>>>> +        .start = vhost_vdpa_net_data_start,
>>>>>>> +        .stop = vhost_vdpa_net_client_stop,
>>>>>>>              .cleanup = vhost_vdpa_cleanup,
>>>>>>>              .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
>>>>>>>              .has_ufo = vhost_vdpa_has_ufo,
>>>>>>> @@ -351,7 +401,7 @@ dma_map_err:
>>>>>>>
>>>>>>>      static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>>>>>>>      {
>>>>>>> -    VhostVDPAState *s;
>>>>>>> +    VhostVDPAState *s, *s0;
>>>>>>>          struct vhost_vdpa *v;
>>>>>>>          uint64_t backend_features;
>>>>>>>          int64_t cvq_group;
>>>>>>> @@ -425,6 +475,15 @@ out:
>>>>>>>              return 0;
>>>>>>>          }
>>>>>>>
>>>>>>> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
>>>>>>> +    if (s0->vhost_vdpa.iova_tree) {
>>>>>>> +        /* SVQ is already configured for all virtqueues */
>>>>>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
>>>>>>> +    } else {
>>>>>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>>>>>>> +                                           v->iova_range.last);
>>>>>> I wonder how this case could happen, vhost_vdpa_net_data_start_first()
>>>>>> should've allocated an iova tree on the first data vq. Is zero data vq
>>>>>> ever possible on net vhost-vdpa?
>>>>>>
>>>>> It's the case of the current qemu master when only CVQ is being
>>>>> shadowed. It's not that "there are no data vq": If that case were
>>>>> possible, CVQ vhost-vdpa state would be s0.
>>>>>
>>>>> The case is that since only CVQ vhost-vdpa is the one being migrated,
>>>>> only CVQ has an iova tree.
>>>> OK, so this corresponds to the case where live migration is not started
>>>> and CVQ starts in its own address space of VHOST_VDPA_NET_CVQ_ASID.
>>>> Thanks for explaining it!
>>>>
>>>>> With this series applied and with no migration running, the case is
>>>>> the same as before: only SVQ gets shadowed. When migration starts, all
>>>>> vqs are migrated, and share iova tree.
>>>> I wonder what is the reason to share the iova tree when migration
>>>> starts, I think CVQ may stay on its own VHOST_VDPA_NET_CVQ_ASID still?
>>>>
>>>> Actually there's discrepancy in vhost_vdpa_net_log_global_enable(), I
>>>> don't see explicit code to switch from VHOST_VDPA_NET_CVQ_ASID to
>>>> VHOST_VDPA_GUEST_PA_ASID for the CVQ. This is the address space I
>>>> collision I mentioned earlier:
>>>>
>>> There is no such change. This code only migrates devices with no CVQ,
>>> as they have their own difficulties.
>>>
>>> In the previous RFC there was no such change either. Since it's hard
>>> to modify passthrough devices IOVA tree, CVQ AS updates keep being
>>> VHOST_VDPA_NET_CVQ_ASID.
>> That's my understanding too, the current code doesn't support changing
>> AS once it is set, although uAPI doesn't prohibit it.
>>
>>> They both share the same IOVA tree though, just for simplicity.
>> It would be good to document this assumption somewhere in the code, it's
>> not easy to infer userspace doesn't have the same view as that in the
>> kernel in terms of the iova tree being used.
>>
>>>    If
>>> address space exhaustion is a problem we can make them independent,
>>> but this complicates the code a little bit.
>>>
>>>> 9585@1676093788.259201:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16
>>>> msg_type: 2 asid: 0 iova: 0x1000 size: 0x2000 uaddr: 0x55a5a7ff3000
>>>> perm: 0x1 type: 2
>>>> 9585@1676093788.279923:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16
>>>> msg_type: 2 asid: 0 iova: 0x3000 size: 0x1000 uaddr: 0x55a5a7ff6000
>>>> perm: 0x3 type: 2
>>>> 9585@1676093788.290529:vhost_vdpa_set_vring_addr dev: 0x55a5a77cec20
>>>> index: 0 flags: 0x0 desc_user_addr: 0x1000 used_user_addr: 0x3000
>>>> avail_user_addr: 0x2000 log_guest_addr: 0x0
>>>> :
>>>> :
>>>> 9585@1676093788.543567:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16
>>>> msg_type: 2 asid: 0 iova: 0x16000 size: 0x2000 uaddr: 0x55a5a7959000
>>>> perm: 0x1 type: 2
>>>> 9585@1676093788.576923:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16
>>>> msg_type: 2 asid: 0 iova: 0x18000 size: 0x1000 uaddr: 0x55a5a795c000
>>>> perm: 0x3 type: 2
>>>> 9585@1676093788.593881:vhost_vdpa_set_vring_addr dev: 0x55a5a7580930
>>>> index: 7 flags: 0x0 desc_user_addr: 0x16000 used_user_addr: 0x18000
>>>> avail_user_addr: 0x17000 log_guest_addr: 0x0
>>>> 9585@1676093788.593904:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>>>> msg_type: 2 asid: 1 iova: 0x19000 size: 0x1000 uaddr: 0x55a5a77f8000
>>>> perm: 0x1 type: 2
>>>> 9585@1676093788.606448:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>>>> msg_type: 2 asid: 1 iova: 0x1a000 size: 0x1000 uaddr: 0x55a5a77fa000
>>>> perm: 0x3 type: 2
>>>> 9585@1676093788.616253:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>>>> msg_type: 2 asid: 1 iova: 0x1b000 size: 0x1000 uaddr: 0x55a5a795f000
>>>> perm: 0x1 type: 2
>>>> 9585@1676093788.625956:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>>>> msg_type: 2 asid: 1 iova: 0x1c000 size: 0x1000 uaddr: 0x55a5a7f4e000
>>>> perm: 0x3 type: 2
>>>> 9585@1676093788.635655:vhost_vdpa_set_vring_addr dev: 0x55a5a7580ec0
>>>> index: 8 flags: 0x0 desc_user_addr: 0x1b000 used_user_addr: 0x1c000
>>>> avail_user_addr: 0x1b400 log_guest_addr: 0x0
>>>> 9585@1676093788.635667:vhost_vdpa_listener_region_add vdpa:
>>>> 0x7ff13026d190 iova 0x0 llend 0xa0000 vaddr: 0x7fef1fe00000 read-only: 0
>>>> 9585@1676093788.635670:vhost_vdpa_listener_begin_batch
>>>> vdpa:0x7ff13026d190 fd: 16 msg_type: 2 type: 5
>>>> 9585@1676093788.635677:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
>>>> msg_type: 2 asid: 0 iova: 0x0 size: 0xa0000 uaddr: 0x7fef1fe00000 perm:
>>>> 0x3 type: 2
>>>> 2023-02-11T05:36:28.635686Z qemu-system-x86_64: failed to write, fd=16,
>>>> errno=14 (Bad address)
>>>> 2023-02-11T05:36:28.635721Z qemu-system-x86_64: vhost vdpa map fail!
>>>> 2023-02-11T05:36:28.635744Z qemu-system-x86_64: vhost-vdpa: DMA mapping
>>>> failed, unable to continue
>>>>
>>> I'm not sure how you get to this. Maybe you were able to start the
>>> migration because the CVQ migration blocker was not effectively added?
>> It's something else, below line at the start of
>> vhost_vdpa_net_cvq_start() would override the shadow_data on the CVQ.
>>
>>       v->shadow_data = s->always_svq;
>>
>> Which leads to my previous question why shadow_data needs to apply to
>> the CVQ
>>
> Ok, I'm proposing some documentation here. I'll send a new patch
> adding it to the sources if you think it is complete.
It's fine, I don't intend to block on this. But what I really meant is 
that there is a bug in the line I pointed out earlier. shadow_data is 
already set by net_vhost_vdpa_init() at init time (for the x-svq=on 
case). For the x-svq=off case, vhost_vdpa_net_log_global_enable() sets 
shadow_data to true on the CVQ within the migration notifier, that's 
correct and expected; however, the subsequent vhost_net_start() function 
right after would call into vhost_vdpa_net_cvq_start(). The latter 
inadvertently sets the CVQ's shadow_data back to false, which defeats 
the purpose of using shadow_data to indicate translating iova on 
shadowed CVQ using the *shared* iova tree. You can say migration with 
CVQ is blocked anyway so this code path doesn't get exposed for now, but 
that somehow causes conflict and confusions for readers to understand 
what the code attempts to achieve. Maybe remove this line or move this 
line to vhost_vdpa_net_cvq_stop()?

> Shadow_data needs to apply to CVQ because memory_listener is
> registered against CVQ,
It's bound to the last virtqueue pair which is not necessarily a CVQ.
>   and memory listener needs to know if data vqs
> are passthrough or shadowed. We could apply a memory register to a
> different vhost_vdpa but then its lifecycle gets complicated.
The lifecycle can remain same but the code will be a lot messier for 
sure. :)

> ---
>
> For completion, the original discussion was [1].
>
>> and why the userspace iova is shared between data queues and CVQ.
> It's not shared unless the device does not support ASID. They only
> share the iova tree because iova tree itself is not used for tracking
> memory itself but only translations, so its lifecycle is easier. Each
> piece of memory's lifecycle is tracked differently:
> * Guest's memory is tracked by the memory listener itself, so we got
> all the regions at register / unregister and in its own updates.
> * SVQ vrings are tracked in vhost_vdpa->shadow_vqs[i].
> * CVQ shadow buffers are tracked in net VhostVDPAState.
> ---
>
> I'll send a new series adding the two pieces of doc if you think they
> are complete. Please let me know if you'd add or remove something.
No you don't have to. Just leave it as-is.

What I thought about making two iova trees independent was not just 
meant for translation but also keep sync with kernel's IOVA address 
space, so that it causes less fluctuations by sending down thinner iova 
update for the unmap and map cycle when switching mode. For now sharing 
the iova tree is fine. I'll see if there's other alternative to keep 
guest memory identity mapped 1:1 on the iova tree across the mode 
switch. Future work you don't have to worry about now.

Thanks,
-Siwei

>
> Note that this code is already on qemu master so this doc should not
> block this series, correct?
>
> Thanks!
>
> [1] https://mail.gnu.org/archive/html/qemu-devel/2022-11/msg02033.html
>
>> -Siwei
>>
>>
>>> Thanks!
>>>
>>>
>>>> Regards,
>>>> -Siwei
>>>>> Thanks!
>>>>>
>>>>>> Thanks,
>>>>>> -Siwei
>>>>>>> +    }
>>>>>>> +
>>>>>>>          r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
>>>>>>>                                     vhost_vdpa_net_cvq_cmd_page_len(), false);
>>>>>>>          if (unlikely(r < 0)) {
>>>>>>> @@ -449,15 +508,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
>>>>>>>          if (s->vhost_vdpa.shadow_vqs_enabled) {
>>>>>>>              vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
>>>>>>>              vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
>>>>>>> -        if (!s->always_svq) {
>>>>>>> -            /*
>>>>>>> -             * If only the CVQ is shadowed we can delete this safely.
>>>>>>> -             * If all the VQs are shadows this will be needed by the time the
>>>>>>> -             * device is started again to register SVQ vrings and similar.
>>>>>>> -             */
>>>>>>> -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
>>>>>>> -        }
>>>>>>>          }
>>>>>>> +
>>>>>>> +    vhost_vdpa_net_client_stop(nc);
>>>>>>>      }
>>>>>>>
>>>>>>>      static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
>>>>>>> @@ -667,8 +720,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>>>>>                                             int nvqs,
>>>>>>>                                             bool is_datapath,
>>>>>>>                                             bool svq,
>>>>>>> -                                       struct vhost_vdpa_iova_range iova_range,
>>>>>>> -                                       VhostIOVATree *iova_tree)
>>>>>>> +                                       struct vhost_vdpa_iova_range iova_range)
>>>>>>>      {
>>>>>>>          NetClientState *nc = NULL;
>>>>>>>          VhostVDPAState *s;
>>>>>>> @@ -690,7 +742,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>>>>>>>          s->vhost_vdpa.shadow_vqs_enabled = svq;
>>>>>>>          s->vhost_vdpa.iova_range = iova_range;
>>>>>>>          s->vhost_vdpa.shadow_data = svq;
>>>>>>> -    s->vhost_vdpa.iova_tree = iova_tree;
>>>>>>>          if (!is_datapath) {
>>>>>>>              s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
>>>>>>>                                                  vhost_vdpa_net_cvq_cmd_page_len());
>>>>>>> @@ -760,7 +811,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>>>>          uint64_t features;
>>>>>>>          int vdpa_device_fd;
>>>>>>>          g_autofree NetClientState **ncs = NULL;
>>>>>>> -    g_autoptr(VhostIOVATree) iova_tree = NULL;
>>>>>>>          struct vhost_vdpa_iova_range iova_range;
>>>>>>>          NetClientState *nc;
>>>>>>>          int queue_pairs, r, i = 0, has_cvq = 0;
>>>>>>> @@ -812,12 +862,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>>>>              goto err;
>>>>>>>          }
>>>>>>>
>>>>>>> -    if (opts->x_svq) {
>>>>>>> -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
>>>>>>> -            goto err_svq;
>>>>>>> -        }
>>>>>>> -
>>>>>>> -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
>>>>>>> +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
>>>>>>> +        goto err;
>>>>>>>          }
>>>>>>>
>>>>>>>          ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
>>>>>>> @@ -825,7 +871,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>>>>          for (i = 0; i < queue_pairs; i++) {
>>>>>>>              ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>>>>>                                           vdpa_device_fd, i, 2, true, opts->x_svq,
>>>>>>> -                                     iova_range, iova_tree);
>>>>>>> +                                     iova_range);
>>>>>>>              if (!ncs[i])
>>>>>>>                  goto err;
>>>>>>>          }
>>>>>>> @@ -833,13 +879,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>>>>>          if (has_cvq) {
>>>>>>>              nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>>>>>                                       vdpa_device_fd, i, 1, false,
>>>>>>> -                                 opts->x_svq, iova_range, iova_tree);
>>>>>>> +                                 opts->x_svq, iova_range);
>>>>>>>              if (!nc)
>>>>>>>                  goto err;
>>>>>>>          }
>>>>>>>
>>>>>>> -    /* iova_tree ownership belongs to last NetClientState */
>>>>>>> -    g_steal_pointer(&iova_tree);
>>>>>>>          return 0;
>>>>>>>
>>>>>>>      err:
>>>>>>> @@ -849,7 +893,6 @@ err:
>>>>>>>              }
>>>>>>>          }
>>>>>>>
>>>>>>> -err_svq:
>>>>>>>          qemu_close(vdpa_device_fd);
>>>>>>>
>>>>>>>          return -1;

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 01/13] vdpa net: move iova tree creation from init to start
  2023-02-17  7:38                 ` Si-Wei Liu
  (?)
@ 2023-02-17 13:55                 ` Eugenio Perez Martin
  -1 siblings, 0 replies; 68+ messages in thread
From: Eugenio Perez Martin @ 2023-02-17 13:55 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Jason Wang, Cindy Lu, alvaro.karsz,
	Zhu Lingshan, Lei Yang, Liuxiangdong, Shannon Nelson,
	Parav Pandit, Gautam Dawar, Eli Cohen, Stefan Hajnoczi,
	Laurent Vivier, longpeng2, virtualization, Stefano Garzarella

On Fri, Feb 17, 2023 at 8:39 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 2/15/2023 11:35 PM, Eugenio Perez Martin wrote:
> > On Thu, Feb 16, 2023 at 3:15 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>
> >>
> >> On 2/14/2023 11:07 AM, Eugenio Perez Martin wrote:
> >>> On Tue, Feb 14, 2023 at 2:45 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>>
> >>>> On 2/13/2023 3:14 AM, Eugenio Perez Martin wrote:
> >>>>> On Mon, Feb 13, 2023 at 7:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>>>> On 2/8/2023 1:42 AM, Eugenio Pérez wrote:
> >>>>>>> Only create iova_tree if and when it is needed.
> >>>>>>>
> >>>>>>> The cleanup keeps being responsible of last VQ but this change allows it
> >>>>>>> to merge both cleanup functions.
> >>>>>>>
> >>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>>>>>> Acked-by: Jason Wang <jasowang@redhat.com>
> >>>>>>> ---
> >>>>>>>      net/vhost-vdpa.c | 99 ++++++++++++++++++++++++++++++++++--------------
> >>>>>>>      1 file changed, 71 insertions(+), 28 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> >>>>>>> index de5ed8ff22..a9e6c8f28e 100644
> >>>>>>> --- a/net/vhost-vdpa.c
> >>>>>>> +++ b/net/vhost-vdpa.c
> >>>>>>> @@ -178,13 +178,9 @@ err_init:
> >>>>>>>      static void vhost_vdpa_cleanup(NetClientState *nc)
> >>>>>>>      {
> >>>>>>>          VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> >>>>>>> -    struct vhost_dev *dev = &s->vhost_net->dev;
> >>>>>>>
> >>>>>>>          qemu_vfree(s->cvq_cmd_out_buffer);
> >>>>>>>          qemu_vfree(s->status);
> >>>>>>> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> >>>>>>> -        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> >>>>>>> -    }
> >>>>>>>          if (s->vhost_net) {
> >>>>>>>              vhost_net_cleanup(s->vhost_net);
> >>>>>>>              g_free(s->vhost_net);
> >>>>>>> @@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
> >>>>>>>          return size;
> >>>>>>>      }
> >>>>>>>
> >>>>>>> +/** From any vdpa net client, get the netclient of first queue pair */
> >>>>>>> +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> >>>>>>> +{
> >>>>>>> +    NICState *nic = qemu_get_nic(s->nc.peer);
> >>>>>>> +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
> >>>>>>> +
> >>>>>>> +    return DO_UPCAST(VhostVDPAState, nc, nc0);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> >>>>>>> +{
> >>>>>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> >>>>>>> +
> >>>>>>> +    if (v->shadow_vqs_enabled) {
> >>>>>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> >>>>>>> +                                           v->iova_range.last);
> >>>>>>> +    }
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +static int vhost_vdpa_net_data_start(NetClientState *nc)
> >>>>>>> +{
> >>>>>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> >>>>>>> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> >>>>>>> +
> >>>>>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >>>>>>> +
> >>>>>>> +    if (v->index == 0) {
> >>>>>>> +        vhost_vdpa_net_data_start_first(s);
> >>>>>>> +        return 0;
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    if (v->shadow_vqs_enabled) {
> >>>>>>> +        VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
> >>>>>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    return 0;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +static void vhost_vdpa_net_client_stop(NetClientState *nc)
> >>>>>>> +{
> >>>>>>> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> >>>>>>> +    struct vhost_dev *dev;
> >>>>>>> +
> >>>>>>> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >>>>>>> +
> >>>>>>> +    dev = s->vhost_vdpa.dev;
> >>>>>>> +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> >>>>>>> +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> >>>>>>> +    }
> >>>>>>> +}
> >>>>>>> +
> >>>>>>>      static NetClientInfo net_vhost_vdpa_info = {
> >>>>>>>              .type = NET_CLIENT_DRIVER_VHOST_VDPA,
> >>>>>>>              .size = sizeof(VhostVDPAState),
> >>>>>>>              .receive = vhost_vdpa_receive,
> >>>>>>> +        .start = vhost_vdpa_net_data_start,
> >>>>>>> +        .stop = vhost_vdpa_net_client_stop,
> >>>>>>>              .cleanup = vhost_vdpa_cleanup,
> >>>>>>>              .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
> >>>>>>>              .has_ufo = vhost_vdpa_has_ufo,
> >>>>>>> @@ -351,7 +401,7 @@ dma_map_err:
> >>>>>>>
> >>>>>>>      static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> >>>>>>>      {
> >>>>>>> -    VhostVDPAState *s;
> >>>>>>> +    VhostVDPAState *s, *s0;
> >>>>>>>          struct vhost_vdpa *v;
> >>>>>>>          uint64_t backend_features;
> >>>>>>>          int64_t cvq_group;
> >>>>>>> @@ -425,6 +475,15 @@ out:
> >>>>>>>              return 0;
> >>>>>>>          }
> >>>>>>>
> >>>>>>> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
> >>>>>>> +    if (s0->vhost_vdpa.iova_tree) {
> >>>>>>> +        /* SVQ is already configured for all virtqueues */
> >>>>>>> +        v->iova_tree = s0->vhost_vdpa.iova_tree;
> >>>>>>> +    } else {
> >>>>>>> +        v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> >>>>>>> +                                           v->iova_range.last);
> >>>>>> I wonder how this case could happen, vhost_vdpa_net_data_start_first()
> >>>>>> should've allocated an iova tree on the first data vq. Is zero data vq
> >>>>>> ever possible on net vhost-vdpa?
> >>>>>>
> >>>>> It's the case of the current qemu master when only CVQ is being
> >>>>> shadowed. It's not that "there are no data vq": If that case were
> >>>>> possible, CVQ vhost-vdpa state would be s0.
> >>>>>
> >>>>> The case is that since only CVQ vhost-vdpa is the one being migrated,
> >>>>> only CVQ has an iova tree.
> >>>> OK, so this corresponds to the case where live migration is not started
> >>>> and CVQ starts in its own address space of VHOST_VDPA_NET_CVQ_ASID.
> >>>> Thanks for explaining it!
> >>>>
> >>>>> With this series applied and with no migration running, the case is
> >>>>> the same as before: only SVQ gets shadowed. When migration starts, all
> >>>>> vqs are migrated, and share iova tree.
> >>>> I wonder what is the reason to share the iova tree when migration
> >>>> starts, I think CVQ may stay on its own VHOST_VDPA_NET_CVQ_ASID still?
> >>>>
> >>>> Actually there's discrepancy in vhost_vdpa_net_log_global_enable(), I
> >>>> don't see explicit code to switch from VHOST_VDPA_NET_CVQ_ASID to
> >>>> VHOST_VDPA_GUEST_PA_ASID for the CVQ. This is the address space I
> >>>> collision I mentioned earlier:
> >>>>
> >>> There is no such change. This code only migrates devices with no CVQ,
> >>> as they have their own difficulties.
> >>>
> >>> In the previous RFC there was no such change either. Since it's hard
> >>> to modify passthrough devices IOVA tree, CVQ AS updates keep being
> >>> VHOST_VDPA_NET_CVQ_ASID.
> >> That's my understanding too, the current code doesn't support changing
> >> AS once it is set, although uAPI doesn't prohibit it.
> >>
> >>> They both share the same IOVA tree though, just for simplicity.
> >> It would be good to document this assumption somewhere in the code, it's
> >> not easy to infer userspace doesn't have the same view as that in the
> >> kernel in terms of the iova tree being used.
> >>
> >>>    If
> >>> address space exhaustion is a problem we can make them independent,
> >>> but this complicates the code a little bit.
> >>>
> >>>> 9585@1676093788.259201:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16
> >>>> msg_type: 2 asid: 0 iova: 0x1000 size: 0x2000 uaddr: 0x55a5a7ff3000
> >>>> perm: 0x1 type: 2
> >>>> 9585@1676093788.279923:vhost_vdpa_dma_map vdpa:0x7ff13088a190 fd: 16
> >>>> msg_type: 2 asid: 0 iova: 0x3000 size: 0x1000 uaddr: 0x55a5a7ff6000
> >>>> perm: 0x3 type: 2
> >>>> 9585@1676093788.290529:vhost_vdpa_set_vring_addr dev: 0x55a5a77cec20
> >>>> index: 0 flags: 0x0 desc_user_addr: 0x1000 used_user_addr: 0x3000
> >>>> avail_user_addr: 0x2000 log_guest_addr: 0x0
> >>>> :
> >>>> :
> >>>> 9585@1676093788.543567:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16
> >>>> msg_type: 2 asid: 0 iova: 0x16000 size: 0x2000 uaddr: 0x55a5a7959000
> >>>> perm: 0x1 type: 2
> >>>> 9585@1676093788.576923:vhost_vdpa_dma_map vdpa:0x7ff1302b6190 fd: 16
> >>>> msg_type: 2 asid: 0 iova: 0x18000 size: 0x1000 uaddr: 0x55a5a795c000
> >>>> perm: 0x3 type: 2
> >>>> 9585@1676093788.593881:vhost_vdpa_set_vring_addr dev: 0x55a5a7580930
> >>>> index: 7 flags: 0x0 desc_user_addr: 0x16000 used_user_addr: 0x18000
> >>>> avail_user_addr: 0x17000 log_guest_addr: 0x0
> >>>> 9585@1676093788.593904:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
> >>>> msg_type: 2 asid: 1 iova: 0x19000 size: 0x1000 uaddr: 0x55a5a77f8000
> >>>> perm: 0x1 type: 2
> >>>> 9585@1676093788.606448:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
> >>>> msg_type: 2 asid: 1 iova: 0x1a000 size: 0x1000 uaddr: 0x55a5a77fa000
> >>>> perm: 0x3 type: 2
> >>>> 9585@1676093788.616253:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
> >>>> msg_type: 2 asid: 1 iova: 0x1b000 size: 0x1000 uaddr: 0x55a5a795f000
> >>>> perm: 0x1 type: 2
> >>>> 9585@1676093788.625956:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
> >>>> msg_type: 2 asid: 1 iova: 0x1c000 size: 0x1000 uaddr: 0x55a5a7f4e000
> >>>> perm: 0x3 type: 2
> >>>> 9585@1676093788.635655:vhost_vdpa_set_vring_addr dev: 0x55a5a7580ec0
> >>>> index: 8 flags: 0x0 desc_user_addr: 0x1b000 used_user_addr: 0x1c000
> >>>> avail_user_addr: 0x1b400 log_guest_addr: 0x0
> >>>> 9585@1676093788.635667:vhost_vdpa_listener_region_add vdpa:
> >>>> 0x7ff13026d190 iova 0x0 llend 0xa0000 vaddr: 0x7fef1fe00000 read-only: 0
> >>>> 9585@1676093788.635670:vhost_vdpa_listener_begin_batch
> >>>> vdpa:0x7ff13026d190 fd: 16 msg_type: 2 type: 5
> >>>> 9585@1676093788.635677:vhost_vdpa_dma_map vdpa:0x7ff13026d190 fd: 16
> >>>> msg_type: 2 asid: 0 iova: 0x0 size: 0xa0000 uaddr: 0x7fef1fe00000 perm:
> >>>> 0x3 type: 2
> >>>> 2023-02-11T05:36:28.635686Z qemu-system-x86_64: failed to write, fd=16,
> >>>> errno=14 (Bad address)
> >>>> 2023-02-11T05:36:28.635721Z qemu-system-x86_64: vhost vdpa map fail!
> >>>> 2023-02-11T05:36:28.635744Z qemu-system-x86_64: vhost-vdpa: DMA mapping
> >>>> failed, unable to continue
> >>>>
> >>> I'm not sure how you get to this. Maybe you were able to start the
> >>> migration because the CVQ migration blocker was not effectively added?
> >> It's something else, below line at the start of
> >> vhost_vdpa_net_cvq_start() would override the shadow_data on the CVQ.
> >>
> >>       v->shadow_data = s->always_svq;
> >>
> >> Which leads to my previous question why shadow_data needs to apply to
> >> the CVQ
> >>
> > Ok, I'm proposing some documentation here. I'll send a new patch
> > adding it to the sources if you think it is complete.
> It's fine, I don't intend to block on this. But what I really meant is
> that there is a bug in the line I pointed out earlier. shadow_data is
> already set by net_vhost_vdpa_init() at init time (for the x-svq=on
> case). For the x-svq=off case, vhost_vdpa_net_log_global_enable() sets
> shadow_data to true on the CVQ within the migration notifier, that's
> correct and expected; however, the subsequent vhost_net_start() function
> right after would call into vhost_vdpa_net_cvq_start(). The latter
> inadvertently sets the CVQ's shadow_data back to false, which defeats
> the purpose of using shadow_data to indicate translating iova on
> shadowed CVQ using the *shared* iova tree. You can say migration with
> CVQ is blocked anyway so this code path doesn't get exposed for now, but
> that somehow causes conflict and confusions for readers to understand
> what the code attempts to achieve. Maybe remove this line or move this
> line to vhost_vdpa_net_cvq_stop()?
>

Ok now I get you. Thank you very much for the catches and
explanations! I'll remove that and those CVQ leftovers for the next
version.

> > Shadow_data needs to apply to CVQ because memory_listener is
> > registered against CVQ,
> It's bound to the last virtqueue pair which is not necessarily a CVQ.
> >   and memory listener needs to know if data vqs
> > are passthrough or shadowed. We could apply a memory register to a
> > different vhost_vdpa but then its lifecycle gets complicated.
> The lifecycle can remain same but the code will be a lot messier for
> sure. :)
>
> > ---
> >
> > For completion, the original discussion was [1].
> >
> >> and why the userspace iova is shared between data queues and CVQ.
> > It's not shared unless the device does not support ASID. They only
> > share the iova tree because iova tree itself is not used for tracking
> > memory itself but only translations, so its lifecycle is easier. Each
> > piece of memory's lifecycle is tracked differently:
> > * Guest's memory is tracked by the memory listener itself, so we got
> > all the regions at register / unregister and in its own updates.
> > * SVQ vrings are tracked in vhost_vdpa->shadow_vqs[i].
> > * CVQ shadow buffers are tracked in net VhostVDPAState.
> > ---
> >
> > I'll send a new series adding the two pieces of doc if you think they
> > are complete. Please let me know if you'd add or remove something.
> No you don't have to. Just leave it as-is.
>
> What I thought about making two iova trees independent was not just
> meant for translation but also keep sync with kernel's IOVA address
> space, so that it causes less fluctuations by sending down thinner iova
> update for the unmap and map cycle when switching mode. For now sharing
> the iova tree is fine. I'll see if there's other alternative to keep
> guest memory identity mapped 1:1 on the iova tree across the mode
> switch. Future work you don't have to worry about now.
>

Got it.

Thanks!

> Thanks,
> -Siwei
>
> >
> > Note that this code is already on qemu master so this doc should not
> > block this series, correct?
> >
> > Thanks!
> >
> > [1] https://mail.gnu.org/archive/html/qemu-devel/2022-11/msg02033.html
> >
> >> -Siwei
> >>
> >>
> >>> Thanks!
> >>>
> >>>
> >>>> Regards,
> >>>> -Siwei
> >>>>> Thanks!
> >>>>>
> >>>>>> Thanks,
> >>>>>> -Siwei
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>>          r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
> >>>>>>>                                     vhost_vdpa_net_cvq_cmd_page_len(), false);
> >>>>>>>          if (unlikely(r < 0)) {
> >>>>>>> @@ -449,15 +508,9 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
> >>>>>>>          if (s->vhost_vdpa.shadow_vqs_enabled) {
> >>>>>>>              vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
> >>>>>>>              vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
> >>>>>>> -        if (!s->always_svq) {
> >>>>>>> -            /*
> >>>>>>> -             * If only the CVQ is shadowed we can delete this safely.
> >>>>>>> -             * If all the VQs are shadows this will be needed by the time the
> >>>>>>> -             * device is started again to register SVQ vrings and similar.
> >>>>>>> -             */
> >>>>>>> -            g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> >>>>>>> -        }
> >>>>>>>          }
> >>>>>>> +
> >>>>>>> +    vhost_vdpa_net_client_stop(nc);
> >>>>>>>      }
> >>>>>>>
> >>>>>>>      static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
> >>>>>>> @@ -667,8 +720,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >>>>>>>                                             int nvqs,
> >>>>>>>                                             bool is_datapath,
> >>>>>>>                                             bool svq,
> >>>>>>> -                                       struct vhost_vdpa_iova_range iova_range,
> >>>>>>> -                                       VhostIOVATree *iova_tree)
> >>>>>>> +                                       struct vhost_vdpa_iova_range iova_range)
> >>>>>>>      {
> >>>>>>>          NetClientState *nc = NULL;
> >>>>>>>          VhostVDPAState *s;
> >>>>>>> @@ -690,7 +742,6 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >>>>>>>          s->vhost_vdpa.shadow_vqs_enabled = svq;
> >>>>>>>          s->vhost_vdpa.iova_range = iova_range;
> >>>>>>>          s->vhost_vdpa.shadow_data = svq;
> >>>>>>> -    s->vhost_vdpa.iova_tree = iova_tree;
> >>>>>>>          if (!is_datapath) {
> >>>>>>>              s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
> >>>>>>>                                                  vhost_vdpa_net_cvq_cmd_page_len());
> >>>>>>> @@ -760,7 +811,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>>>>>          uint64_t features;
> >>>>>>>          int vdpa_device_fd;
> >>>>>>>          g_autofree NetClientState **ncs = NULL;
> >>>>>>> -    g_autoptr(VhostIOVATree) iova_tree = NULL;
> >>>>>>>          struct vhost_vdpa_iova_range iova_range;
> >>>>>>>          NetClientState *nc;
> >>>>>>>          int queue_pairs, r, i = 0, has_cvq = 0;
> >>>>>>> @@ -812,12 +862,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>>>>>              goto err;
> >>>>>>>          }
> >>>>>>>
> >>>>>>> -    if (opts->x_svq) {
> >>>>>>> -        if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
> >>>>>>> -            goto err_svq;
> >>>>>>> -        }
> >>>>>>> -
> >>>>>>> -        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> >>>>>>> +    if (opts->x_svq && !vhost_vdpa_net_valid_svq_features(features, errp)) {
> >>>>>>> +        goto err;
> >>>>>>>          }
> >>>>>>>
> >>>>>>>          ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> >>>>>>> @@ -825,7 +871,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>>>>>          for (i = 0; i < queue_pairs; i++) {
> >>>>>>>              ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >>>>>>>                                           vdpa_device_fd, i, 2, true, opts->x_svq,
> >>>>>>> -                                     iova_range, iova_tree);
> >>>>>>> +                                     iova_range);
> >>>>>>>              if (!ncs[i])
> >>>>>>>                  goto err;
> >>>>>>>          }
> >>>>>>> @@ -833,13 +879,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >>>>>>>          if (has_cvq) {
> >>>>>>>              nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >>>>>>>                                       vdpa_device_fd, i, 1, false,
> >>>>>>> -                                 opts->x_svq, iova_range, iova_tree);
> >>>>>>> +                                 opts->x_svq, iova_range);
> >>>>>>>              if (!nc)
> >>>>>>>                  goto err;
> >>>>>>>          }
> >>>>>>>
> >>>>>>> -    /* iova_tree ownership belongs to last NetClientState */
> >>>>>>> -    g_steal_pointer(&iova_tree);
> >>>>>>>          return 0;
> >>>>>>>
> >>>>>>>      err:
> >>>>>>> @@ -849,7 +893,6 @@ err:
> >>>>>>>              }
> >>>>>>>          }
> >>>>>>>
> >>>>>>> -err_svq:
> >>>>>>>          qemu_close(vdpa_device_fd);
> >>>>>>>
> >>>>>>>          return -1;
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 03/13] vdpa: add vhost_vdpa_suspend
  2023-02-08  9:42 ` [PATCH v2 03/13] vdpa: add vhost_vdpa_suspend Eugenio Pérez
@ 2023-02-21  5:27     ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-21  5:27 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, virtualization, Harpreet Singh Anand, Lei Yang,
	Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong


在 2023/2/8 17:42, Eugenio Pérez 写道:
> The function vhost.c:vhost_dev_stop fetches the vring base so the vq
> state can be migrated to other devices.  However, this is unreliable in
> vdpa, since we didn't signal the device to suspend the queues, making
> the value fetched useless.
>
> Suspend the device if possible before fetching first and subsequent
> vring bases.
>
> Moreover, vdpa totally reset and wipes the device at the last device
> before fetch its vrings base, making that operation useless in the last
> device. This will be fixed in later patches of this series.


It would be better not introduce a bug first and fix it in the following 
patch.


>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-vdpa.c | 19 +++++++++++++++++++
>   hw/virtio/trace-events |  1 +
>   2 files changed, 20 insertions(+)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 2e79fbe4b2..cbbe92ffe8 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1108,6 +1108,24 @@ static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
>       }
>   }
>   
> +static void vhost_vdpa_suspend(struct vhost_dev *dev)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    int r;
> +
> +    if (!vhost_vdpa_first_dev(dev) ||


Any reason we need to use vhost_vdpa_first_dev() instead of replacing the

if (started) {
} else {
     vhost_vdpa_reset_device(dev);
     ....
}


We check

if (dev->vq_index + dev->nvqs != dev->vq_index_end) in 
vhost_vdpa_dev_start() but vhost_vdpa_first_dev() inside 
vhost_vdpa_suspend(). This will result code that is hard to maintain.

Thanks


> +        !(dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
> +        return;
> +    }
> +
> +    trace_vhost_vdpa_suspend(dev);
> +    r = ioctl(v->device_fd, VHOST_VDPA_SUSPEND);
> +    if (unlikely(r)) {
> +        error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno);
> +        /* Not aborting since we're called from stop context */
> +    }
> +}
> +
>   static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>   {
>       struct vhost_vdpa *v = dev->opaque;
> @@ -1122,6 +1140,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>           }
>           vhost_vdpa_set_vring_ready(dev);
>       } else {
> +        vhost_vdpa_suspend(dev);
>           vhost_vdpa_svqs_stop(dev);
>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>       }
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index a87c5f39a2..8f8d05cf9b 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -50,6 +50,7 @@ vhost_vdpa_set_vring_ready(void *dev) "dev: %p"
>   vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
>   vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
>   vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
> +vhost_vdpa_suspend(void *dev) "dev: %p"
>   vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
>   vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
>   vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 03/13] vdpa: add vhost_vdpa_suspend
@ 2023-02-21  5:27     ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-21  5:27 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu


在 2023/2/8 17:42, Eugenio Pérez 写道:
> The function vhost.c:vhost_dev_stop fetches the vring base so the vq
> state can be migrated to other devices.  However, this is unreliable in
> vdpa, since we didn't signal the device to suspend the queues, making
> the value fetched useless.
>
> Suspend the device if possible before fetching first and subsequent
> vring bases.
>
> Moreover, vdpa totally reset and wipes the device at the last device
> before fetch its vrings base, making that operation useless in the last
> device. This will be fixed in later patches of this series.


It would be better not introduce a bug first and fix it in the following 
patch.


>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-vdpa.c | 19 +++++++++++++++++++
>   hw/virtio/trace-events |  1 +
>   2 files changed, 20 insertions(+)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 2e79fbe4b2..cbbe92ffe8 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1108,6 +1108,24 @@ static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
>       }
>   }
>   
> +static void vhost_vdpa_suspend(struct vhost_dev *dev)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    int r;
> +
> +    if (!vhost_vdpa_first_dev(dev) ||


Any reason we need to use vhost_vdpa_first_dev() instead of replacing the

if (started) {
} else {
     vhost_vdpa_reset_device(dev);
     ....
}


We check

if (dev->vq_index + dev->nvqs != dev->vq_index_end) in 
vhost_vdpa_dev_start() but vhost_vdpa_first_dev() inside 
vhost_vdpa_suspend(). This will result code that is hard to maintain.

Thanks


> +        !(dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
> +        return;
> +    }
> +
> +    trace_vhost_vdpa_suspend(dev);
> +    r = ioctl(v->device_fd, VHOST_VDPA_SUSPEND);
> +    if (unlikely(r)) {
> +        error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno);
> +        /* Not aborting since we're called from stop context */
> +    }
> +}
> +
>   static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>   {
>       struct vhost_vdpa *v = dev->opaque;
> @@ -1122,6 +1140,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>           }
>           vhost_vdpa_set_vring_ready(dev);
>       } else {
> +        vhost_vdpa_suspend(dev);
>           vhost_vdpa_svqs_stop(dev);
>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>       }
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index a87c5f39a2..8f8d05cf9b 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -50,6 +50,7 @@ vhost_vdpa_set_vring_ready(void *dev) "dev: %p"
>   vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
>   vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
>   vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
> +vhost_vdpa_suspend(void *dev) "dev: %p"
>   vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
>   vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
>   vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 03/13] vdpa: add vhost_vdpa_suspend
  2023-02-21  5:27     ` Jason Wang
@ 2023-02-21  5:33       ` Jason Wang
  -1 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-21  5:33 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu


在 2023/2/21 13:27, Jason Wang 写道:
>
> 在 2023/2/8 17:42, Eugenio Pérez 写道:
>> The function vhost.c:vhost_dev_stop fetches the vring base so the vq
>> state can be migrated to other devices.  However, this is unreliable in
>> vdpa, since we didn't signal the device to suspend the queues, making
>> the value fetched useless.
>>
>> Suspend the device if possible before fetching first and subsequent
>> vring bases.
>>
>> Moreover, vdpa totally reset and wipes the device at the last device
>> before fetch its vrings base, making that operation useless in the last
>> device. This will be fixed in later patches of this series.
>
>
> It would be better not introduce a bug first and fix it in the 
> following patch.
>
>
>>
>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>> ---
>>   hw/virtio/vhost-vdpa.c | 19 +++++++++++++++++++
>>   hw/virtio/trace-events |  1 +
>>   2 files changed, 20 insertions(+)
>>
>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>> index 2e79fbe4b2..cbbe92ffe8 100644
>> --- a/hw/virtio/vhost-vdpa.c
>> +++ b/hw/virtio/vhost-vdpa.c
>> @@ -1108,6 +1108,24 @@ static void vhost_vdpa_svqs_stop(struct 
>> vhost_dev *dev)
>>       }
>>   }
>>   +static void vhost_vdpa_suspend(struct vhost_dev *dev)
>> +{
>> +    struct vhost_vdpa *v = dev->opaque;
>> +    int r;
>> +
>> +    if (!vhost_vdpa_first_dev(dev) ||
>
>
> Any reason we need to use vhost_vdpa_first_dev() instead of replacing the
>
> if (started) {
> } else {
>     vhost_vdpa_reset_device(dev);
>     ....
> }


Ok, I think I kind of understand, so I think we need re-order the 
patches, at least patch 4 should come before this patch?

Thanks


>
>
> We check
>
> if (dev->vq_index + dev->nvqs != dev->vq_index_end) in 
> vhost_vdpa_dev_start() but vhost_vdpa_first_dev() inside 
> vhost_vdpa_suspend(). This will result code that is hard to maintain.
>
> Thanks
>
>
>> +        !(dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
>> +        return;
>> +    }
>> +
>> +    trace_vhost_vdpa_suspend(dev);
>> +    r = ioctl(v->device_fd, VHOST_VDPA_SUSPEND);
>> +    if (unlikely(r)) {
>> +        error_report("Cannot suspend: %s(%d)", g_strerror(errno), 
>> errno);
>> +        /* Not aborting since we're called from stop context */
>> +    }
>> +}
>> +
>>   static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>   {
>>       struct vhost_vdpa *v = dev->opaque;
>> @@ -1122,6 +1140,7 @@ static int vhost_vdpa_dev_start(struct 
>> vhost_dev *dev, bool started)
>>           }
>>           vhost_vdpa_set_vring_ready(dev);
>>       } else {
>> +        vhost_vdpa_suspend(dev);
>>           vhost_vdpa_svqs_stop(dev);
>>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>       }
>> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
>> index a87c5f39a2..8f8d05cf9b 100644
>> --- a/hw/virtio/trace-events
>> +++ b/hw/virtio/trace-events
>> @@ -50,6 +50,7 @@ vhost_vdpa_set_vring_ready(void *dev) "dev: %p"
>>   vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
>>   vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, 
>> uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 
>> 0x%"PRIx32
>>   vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) 
>> "dev: %p config: %p config_len: %"PRIu32
>> +vhost_vdpa_suspend(void *dev) "dev: %p"
>>   vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
>>   vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long 
>> long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" 
>> size: %llu refcnt: %d fd: %d log: %p"
>>   vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned 
>> int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t 
>> avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 
>> 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" 
>> avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 03/13] vdpa: add vhost_vdpa_suspend
@ 2023-02-21  5:33       ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-21  5:33 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, virtualization, Harpreet Singh Anand, Lei Yang,
	Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong


在 2023/2/21 13:27, Jason Wang 写道:
>
> 在 2023/2/8 17:42, Eugenio Pérez 写道:
>> The function vhost.c:vhost_dev_stop fetches the vring base so the vq
>> state can be migrated to other devices.  However, this is unreliable in
>> vdpa, since we didn't signal the device to suspend the queues, making
>> the value fetched useless.
>>
>> Suspend the device if possible before fetching first and subsequent
>> vring bases.
>>
>> Moreover, vdpa totally reset and wipes the device at the last device
>> before fetch its vrings base, making that operation useless in the last
>> device. This will be fixed in later patches of this series.
>
>
> It would be better not introduce a bug first and fix it in the 
> following patch.
>
>
>>
>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>> ---
>>   hw/virtio/vhost-vdpa.c | 19 +++++++++++++++++++
>>   hw/virtio/trace-events |  1 +
>>   2 files changed, 20 insertions(+)
>>
>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>> index 2e79fbe4b2..cbbe92ffe8 100644
>> --- a/hw/virtio/vhost-vdpa.c
>> +++ b/hw/virtio/vhost-vdpa.c
>> @@ -1108,6 +1108,24 @@ static void vhost_vdpa_svqs_stop(struct 
>> vhost_dev *dev)
>>       }
>>   }
>>   +static void vhost_vdpa_suspend(struct vhost_dev *dev)
>> +{
>> +    struct vhost_vdpa *v = dev->opaque;
>> +    int r;
>> +
>> +    if (!vhost_vdpa_first_dev(dev) ||
>
>
> Any reason we need to use vhost_vdpa_first_dev() instead of replacing the
>
> if (started) {
> } else {
>     vhost_vdpa_reset_device(dev);
>     ....
> }


Ok, I think I kind of understand, so I think we need re-order the 
patches, at least patch 4 should come before this patch?

Thanks


>
>
> We check
>
> if (dev->vq_index + dev->nvqs != dev->vq_index_end) in 
> vhost_vdpa_dev_start() but vhost_vdpa_first_dev() inside 
> vhost_vdpa_suspend(). This will result code that is hard to maintain.
>
> Thanks
>
>
>> +        !(dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
>> +        return;
>> +    }
>> +
>> +    trace_vhost_vdpa_suspend(dev);
>> +    r = ioctl(v->device_fd, VHOST_VDPA_SUSPEND);
>> +    if (unlikely(r)) {
>> +        error_report("Cannot suspend: %s(%d)", g_strerror(errno), 
>> errno);
>> +        /* Not aborting since we're called from stop context */
>> +    }
>> +}
>> +
>>   static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>   {
>>       struct vhost_vdpa *v = dev->opaque;
>> @@ -1122,6 +1140,7 @@ static int vhost_vdpa_dev_start(struct 
>> vhost_dev *dev, bool started)
>>           }
>>           vhost_vdpa_set_vring_ready(dev);
>>       } else {
>> +        vhost_vdpa_suspend(dev);
>>           vhost_vdpa_svqs_stop(dev);
>>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>       }
>> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
>> index a87c5f39a2..8f8d05cf9b 100644
>> --- a/hw/virtio/trace-events
>> +++ b/hw/virtio/trace-events
>> @@ -50,6 +50,7 @@ vhost_vdpa_set_vring_ready(void *dev) "dev: %p"
>>   vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
>>   vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, 
>> uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 
>> 0x%"PRIx32
>>   vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) 
>> "dev: %p config: %p config_len: %"PRIu32
>> +vhost_vdpa_suspend(void *dev) "dev: %p"
>>   vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
>>   vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long 
>> long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" 
>> size: %llu refcnt: %d fd: %d log: %p"
>>   vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned 
>> int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t 
>> avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 
>> 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" 
>> avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 04/13] vdpa: move vhost reset after get vring base
  2023-02-08  9:42 ` [PATCH v2 04/13] vdpa: move vhost reset after get vring base Eugenio Pérez
@ 2023-02-21  5:36     ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-21  5:36 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, virtualization, Harpreet Singh Anand, Lei Yang,
	Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong


在 2023/2/8 17:42, Eugenio Pérez 写道:
> The function vhost.c:vhost_dev_stop calls vhost operation
> vhost_dev_start(false). In the case of vdpa it totally reset and wipes
> the device, making the fetching of the vring base (virtqueue state) totally
> useless.
>
> The kernel backend does not use vhost_dev_start vhost op callback, but
> vhost-user do. A patch to make vhost_user_dev_start more similar to vdpa
> is desirable, but it can be added on top.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   include/hw/virtio/vhost-backend.h |  4 ++++
>   hw/virtio/vhost-vdpa.c            | 22 ++++++++++++++++------
>   hw/virtio/vhost.c                 |  3 +++
>   3 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index c5ab49051e..ec3fbae58d 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -130,6 +130,9 @@ typedef bool (*vhost_force_iommu_op)(struct vhost_dev *dev);
>   
>   typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
>                                          int fd);
> +
> +typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
> +
>   typedef struct VhostOps {
>       VhostBackendType backend_type;
>       vhost_backend_init vhost_backend_init;
> @@ -177,6 +180,7 @@ typedef struct VhostOps {
>       vhost_get_device_id_op vhost_get_device_id;
>       vhost_force_iommu_op vhost_force_iommu;
>       vhost_set_config_call_op vhost_set_config_call;
> +    vhost_reset_status_op vhost_reset_status;
>   } VhostOps;
>   
>   int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index cbbe92ffe8..26e38a6aab 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1152,14 +1152,23 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>       if (started) {
>           memory_listener_register(&v->listener, &address_space_memory);
>           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> -    } else {
> -        vhost_vdpa_reset_device(dev);
> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> -                                   VIRTIO_CONFIG_S_DRIVER);
> -        memory_listener_unregister(&v->listener);
> +    }
>   
> -        return 0;
> +    return 0;
> +}
> +
> +static void vhost_vdpa_reset_status(struct vhost_dev *dev)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +
> +    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> +        return;
>       }
> +
> +    vhost_vdpa_reset_device(dev);
> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> +                                VIRTIO_CONFIG_S_DRIVER);
> +    memory_listener_unregister(&v->listener);
>   }
>   
>   static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> @@ -1346,4 +1355,5 @@ const VhostOps vdpa_ops = {
>           .vhost_vq_get_addr = vhost_vdpa_vq_get_addr,
>           .vhost_force_iommu = vhost_vdpa_force_iommu,
>           .vhost_set_config_call = vhost_vdpa_set_config_call,
> +        .vhost_reset_status = vhost_vdpa_reset_status,
>   };
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index eb8c4c378c..a266396576 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -2049,6 +2049,9 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
>                                hdev->vqs + i,
>                                hdev->vq_index + i);
>       }
> +    if (hdev->vhost_ops->vhost_reset_status) {
> +        hdev->vhost_ops->vhost_reset_status(hdev);
> +    }


This looks racy, if we don't suspend/reset the device, device can move 
last_avail_idx even after get_vring_base()?

Instead of doing things like this, should we fallback to 
virtio_queue_restore_last_avail_idx() in this case?

Thanks


>   
>       if (vhost_dev_has_iommu(hdev)) {
>           if (hdev->vhost_ops->vhost_set_iotlb_callback) {

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 04/13] vdpa: move vhost reset after get vring base
@ 2023-02-21  5:36     ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-21  5:36 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu


在 2023/2/8 17:42, Eugenio Pérez 写道:
> The function vhost.c:vhost_dev_stop calls vhost operation
> vhost_dev_start(false). In the case of vdpa it totally reset and wipes
> the device, making the fetching of the vring base (virtqueue state) totally
> useless.
>
> The kernel backend does not use vhost_dev_start vhost op callback, but
> vhost-user do. A patch to make vhost_user_dev_start more similar to vdpa
> is desirable, but it can be added on top.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   include/hw/virtio/vhost-backend.h |  4 ++++
>   hw/virtio/vhost-vdpa.c            | 22 ++++++++++++++++------
>   hw/virtio/vhost.c                 |  3 +++
>   3 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index c5ab49051e..ec3fbae58d 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -130,6 +130,9 @@ typedef bool (*vhost_force_iommu_op)(struct vhost_dev *dev);
>   
>   typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
>                                          int fd);
> +
> +typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
> +
>   typedef struct VhostOps {
>       VhostBackendType backend_type;
>       vhost_backend_init vhost_backend_init;
> @@ -177,6 +180,7 @@ typedef struct VhostOps {
>       vhost_get_device_id_op vhost_get_device_id;
>       vhost_force_iommu_op vhost_force_iommu;
>       vhost_set_config_call_op vhost_set_config_call;
> +    vhost_reset_status_op vhost_reset_status;
>   } VhostOps;
>   
>   int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index cbbe92ffe8..26e38a6aab 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1152,14 +1152,23 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>       if (started) {
>           memory_listener_register(&v->listener, &address_space_memory);
>           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> -    } else {
> -        vhost_vdpa_reset_device(dev);
> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> -                                   VIRTIO_CONFIG_S_DRIVER);
> -        memory_listener_unregister(&v->listener);
> +    }
>   
> -        return 0;
> +    return 0;
> +}
> +
> +static void vhost_vdpa_reset_status(struct vhost_dev *dev)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +
> +    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> +        return;
>       }
> +
> +    vhost_vdpa_reset_device(dev);
> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> +                                VIRTIO_CONFIG_S_DRIVER);
> +    memory_listener_unregister(&v->listener);
>   }
>   
>   static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> @@ -1346,4 +1355,5 @@ const VhostOps vdpa_ops = {
>           .vhost_vq_get_addr = vhost_vdpa_vq_get_addr,
>           .vhost_force_iommu = vhost_vdpa_force_iommu,
>           .vhost_set_config_call = vhost_vdpa_set_config_call,
> +        .vhost_reset_status = vhost_vdpa_reset_status,
>   };
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index eb8c4c378c..a266396576 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -2049,6 +2049,9 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
>                                hdev->vqs + i,
>                                hdev->vq_index + i);
>       }
> +    if (hdev->vhost_ops->vhost_reset_status) {
> +        hdev->vhost_ops->vhost_reset_status(hdev);
> +    }


This looks racy, if we don't suspend/reset the device, device can move 
last_avail_idx even after get_vring_base()?

Instead of doing things like this, should we fallback to 
virtio_queue_restore_last_avail_idx() in this case?

Thanks


>   
>       if (vhost_dev_has_iommu(hdev)) {
>           if (hdev->vhost_ops->vhost_set_iotlb_callback) {



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 05/13] vdpa: rewind at get_base, not set_base
  2023-02-08  9:42 ` [PATCH v2 05/13] vdpa: rewind at get_base, not set_base Eugenio Pérez
@ 2023-02-21  5:40     ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-21  5:40 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu


在 2023/2/8 17:42, Eugenio Pérez 写道:
> At this moment it is only possible to migrate to a vdpa device running
> with x-svq=on. As a protective measure, the rewind of the inflight
> descriptors was done at the destination. That way if the source sent a
> virtqueue with inuse descriptors they are always discarded.
>
> Since this series allows to migrate also to passthrough devices with no
> SVQ, the right thing to do is to rewind at the source so the base of
> vrings are correct.
>
> Support for inflight descriptors may be added in the future.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>


Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


> ---
>   hw/virtio/vhost-vdpa.c | 24 +++++++++++++-----------
>   1 file changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 26e38a6aab..d99db0bd03 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1211,18 +1211,7 @@ static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
>                                          struct vhost_vring_state *ring)
>   {
>       struct vhost_vdpa *v = dev->opaque;
> -    VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
>   
> -    /*
> -     * vhost-vdpa devices does not support in-flight requests. Set all of them
> -     * as available.
> -     *
> -     * TODO: This is ok for networking, but other kinds of devices might
> -     * have problems with these retransmissions.
> -     */
> -    while (virtqueue_rewind(vq, 1)) {
> -        continue;
> -    }
>       if (v->shadow_vqs_enabled) {
>           /*
>            * Device vring base was set at device start. SVQ base is handled by
> @@ -1241,6 +1230,19 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>       int ret;
>   
>       if (v->shadow_vqs_enabled) {
> +        VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
> +
> +        /*
> +         * vhost-vdpa devices does not support in-flight requests. Set all of
> +         * them as available.
> +         *
> +         * TODO: This is ok for networking, but other kinds of devices might
> +         * have problems with these retransmissions.
> +         */
> +        while (virtqueue_rewind(vq, 1)) {
> +            continue;
> +        }
> +
>           ring->num = virtio_queue_get_last_avail_idx(dev->vdev, ring->index);
>           return 0;
>       }



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 05/13] vdpa: rewind at get_base, not set_base
@ 2023-02-21  5:40     ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-21  5:40 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, virtualization, Harpreet Singh Anand, Lei Yang,
	Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong


在 2023/2/8 17:42, Eugenio Pérez 写道:
> At this moment it is only possible to migrate to a vdpa device running
> with x-svq=on. As a protective measure, the rewind of the inflight
> descriptors was done at the destination. That way if the source sent a
> virtqueue with inuse descriptors they are always discarded.
>
> Since this series allows to migrate also to passthrough devices with no
> SVQ, the right thing to do is to rewind at the source so the base of
> vrings are correct.
>
> Support for inflight descriptors may be added in the future.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>


Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


> ---
>   hw/virtio/vhost-vdpa.c | 24 +++++++++++++-----------
>   1 file changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 26e38a6aab..d99db0bd03 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1211,18 +1211,7 @@ static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
>                                          struct vhost_vring_state *ring)
>   {
>       struct vhost_vdpa *v = dev->opaque;
> -    VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
>   
> -    /*
> -     * vhost-vdpa devices does not support in-flight requests. Set all of them
> -     * as available.
> -     *
> -     * TODO: This is ok for networking, but other kinds of devices might
> -     * have problems with these retransmissions.
> -     */
> -    while (virtqueue_rewind(vq, 1)) {
> -        continue;
> -    }
>       if (v->shadow_vqs_enabled) {
>           /*
>            * Device vring base was set at device start. SVQ base is handled by
> @@ -1241,6 +1230,19 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>       int ret;
>   
>       if (v->shadow_vqs_enabled) {
> +        VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
> +
> +        /*
> +         * vhost-vdpa devices does not support in-flight requests. Set all of
> +         * them as available.
> +         *
> +         * TODO: This is ok for networking, but other kinds of devices might
> +         * have problems with these retransmissions.
> +         */
> +        while (virtqueue_rewind(vq, 1)) {
> +            continue;
> +        }
> +
>           ring->num = virtio_queue_get_last_avail_idx(dev->vdev, ring->index);
>           return 0;
>       }

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 03/13] vdpa: add vhost_vdpa_suspend
  2023-02-21  5:33       ` Jason Wang
  (?)
@ 2023-02-21  7:05       ` Eugenio Perez Martin
  -1 siblings, 0 replies; 68+ messages in thread
From: Eugenio Perez Martin @ 2023-02-21  7:05 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu

On Tue, Feb 21, 2023 at 6:33 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/2/21 13:27, Jason Wang 写道:
> >
> > 在 2023/2/8 17:42, Eugenio Pérez 写道:
> >> The function vhost.c:vhost_dev_stop fetches the vring base so the vq
> >> state can be migrated to other devices.  However, this is unreliable in
> >> vdpa, since we didn't signal the device to suspend the queues, making
> >> the value fetched useless.
> >>
> >> Suspend the device if possible before fetching first and subsequent
> >> vring bases.
> >>
> >> Moreover, vdpa totally reset and wipes the device at the last device
> >> before fetch its vrings base, making that operation useless in the last
> >> device. This will be fixed in later patches of this series.
> >
> >
> > It would be better not introduce a bug first and fix it in the
> > following patch.
> >
> >
> >>
> >> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >> ---
> >>   hw/virtio/vhost-vdpa.c | 19 +++++++++++++++++++
> >>   hw/virtio/trace-events |  1 +
> >>   2 files changed, 20 insertions(+)
> >>
> >> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >> index 2e79fbe4b2..cbbe92ffe8 100644
> >> --- a/hw/virtio/vhost-vdpa.c
> >> +++ b/hw/virtio/vhost-vdpa.c
> >> @@ -1108,6 +1108,24 @@ static void vhost_vdpa_svqs_stop(struct
> >> vhost_dev *dev)
> >>       }
> >>   }
> >>   +static void vhost_vdpa_suspend(struct vhost_dev *dev)
> >> +{
> >> +    struct vhost_vdpa *v = dev->opaque;
> >> +    int r;
> >> +
> >> +    if (!vhost_vdpa_first_dev(dev) ||
> >
> >
> > Any reason we need to use vhost_vdpa_first_dev() instead of replacing the
> >
> > if (started) {
> > } else {
> >     vhost_vdpa_reset_device(dev);
> >     ....
> > }

I can also move the check to vhost_vdpa_dev_start, for sure.

>
> Ok, I think I kind of understand, so I think we need re-order the
> patches, at least patch 4 should come before this patch?
>

I think it is doable, yes. I'll check and come back to you.

Thanks!

> Thanks
>
>
> >
> >
> > We check
> >
> > if (dev->vq_index + dev->nvqs != dev->vq_index_end) in
> > vhost_vdpa_dev_start() but vhost_vdpa_first_dev() inside
> > vhost_vdpa_suspend(). This will result code that is hard to maintain.
> >
> > Thanks
> >
> >
> >> +        !(dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
> >> +        return;
> >> +    }
> >> +
> >> +    trace_vhost_vdpa_suspend(dev);
> >> +    r = ioctl(v->device_fd, VHOST_VDPA_SUSPEND);
> >> +    if (unlikely(r)) {
> >> +        error_report("Cannot suspend: %s(%d)", g_strerror(errno),
> >> errno);
> >> +        /* Not aborting since we're called from stop context */
> >> +    }
> >> +}
> >> +
> >>   static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >>   {
> >>       struct vhost_vdpa *v = dev->opaque;
> >> @@ -1122,6 +1140,7 @@ static int vhost_vdpa_dev_start(struct
> >> vhost_dev *dev, bool started)
> >>           }
> >>           vhost_vdpa_set_vring_ready(dev);
> >>       } else {
> >> +        vhost_vdpa_suspend(dev);
> >>           vhost_vdpa_svqs_stop(dev);
> >>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> >>       }
> >> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> >> index a87c5f39a2..8f8d05cf9b 100644
> >> --- a/hw/virtio/trace-events
> >> +++ b/hw/virtio/trace-events
> >> @@ -50,6 +50,7 @@ vhost_vdpa_set_vring_ready(void *dev) "dev: %p"
> >>   vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
> >>   vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size,
> >> uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags:
> >> 0x%"PRIx32
> >>   vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len)
> >> "dev: %p config: %p config_len: %"PRIu32
> >> +vhost_vdpa_suspend(void *dev) "dev: %p"
> >>   vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
> >>   vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long
> >> long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64"
> >> size: %llu refcnt: %d fd: %d log: %p"
> >>   vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned
> >> int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t
> >> avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags:
> >> 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64"
> >> avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 04/13] vdpa: move vhost reset after get vring base
  2023-02-21  5:36     ` Jason Wang
  (?)
@ 2023-02-21  7:07     ` Eugenio Perez Martin
  2023-02-22  3:43         ` Jason Wang
  -1 siblings, 1 reply; 68+ messages in thread
From: Eugenio Perez Martin @ 2023-02-21  7:07 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu

On Tue, Feb 21, 2023 at 6:36 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/2/8 17:42, Eugenio Pérez 写道:
> > The function vhost.c:vhost_dev_stop calls vhost operation
> > vhost_dev_start(false). In the case of vdpa it totally reset and wipes
> > the device, making the fetching of the vring base (virtqueue state) totally
> > useless.
> >
> > The kernel backend does not use vhost_dev_start vhost op callback, but
> > vhost-user do. A patch to make vhost_user_dev_start more similar to vdpa
> > is desirable, but it can be added on top.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   include/hw/virtio/vhost-backend.h |  4 ++++
> >   hw/virtio/vhost-vdpa.c            | 22 ++++++++++++++++------
> >   hw/virtio/vhost.c                 |  3 +++
> >   3 files changed, 23 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> > index c5ab49051e..ec3fbae58d 100644
> > --- a/include/hw/virtio/vhost-backend.h
> > +++ b/include/hw/virtio/vhost-backend.h
> > @@ -130,6 +130,9 @@ typedef bool (*vhost_force_iommu_op)(struct vhost_dev *dev);
> >
> >   typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
> >                                          int fd);
> > +
> > +typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
> > +
> >   typedef struct VhostOps {
> >       VhostBackendType backend_type;
> >       vhost_backend_init vhost_backend_init;
> > @@ -177,6 +180,7 @@ typedef struct VhostOps {
> >       vhost_get_device_id_op vhost_get_device_id;
> >       vhost_force_iommu_op vhost_force_iommu;
> >       vhost_set_config_call_op vhost_set_config_call;
> > +    vhost_reset_status_op vhost_reset_status;
> >   } VhostOps;
> >
> >   int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index cbbe92ffe8..26e38a6aab 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -1152,14 +1152,23 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >       if (started) {
> >           memory_listener_register(&v->listener, &address_space_memory);
> >           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > -    } else {
> > -        vhost_vdpa_reset_device(dev);
> > -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> > -                                   VIRTIO_CONFIG_S_DRIVER);
> > -        memory_listener_unregister(&v->listener);
> > +    }
> >
> > -        return 0;
> > +    return 0;
> > +}
> > +
> > +static void vhost_vdpa_reset_status(struct vhost_dev *dev)
> > +{
> > +    struct vhost_vdpa *v = dev->opaque;
> > +
> > +    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> > +        return;
> >       }
> > +
> > +    vhost_vdpa_reset_device(dev);
> > +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> > +                                VIRTIO_CONFIG_S_DRIVER);
> > +    memory_listener_unregister(&v->listener);
> >   }
> >
> >   static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> > @@ -1346,4 +1355,5 @@ const VhostOps vdpa_ops = {
> >           .vhost_vq_get_addr = vhost_vdpa_vq_get_addr,
> >           .vhost_force_iommu = vhost_vdpa_force_iommu,
> >           .vhost_set_config_call = vhost_vdpa_set_config_call,
> > +        .vhost_reset_status = vhost_vdpa_reset_status,
> >   };
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index eb8c4c378c..a266396576 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -2049,6 +2049,9 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
> >                                hdev->vqs + i,
> >                                hdev->vq_index + i);
> >       }
> > +    if (hdev->vhost_ops->vhost_reset_status) {
> > +        hdev->vhost_ops->vhost_reset_status(hdev);
> > +    }
>
>
> This looks racy, if we don't suspend/reset the device, device can move
> last_avail_idx even after get_vring_base()?
>
> Instead of doing things like this, should we fallback to
> virtio_queue_restore_last_avail_idx() in this case?
>

Right, we can track if the device is suspended / SVQ and then return
an error in vring_get_base if it is not. Would that work?

Thanks!

> Thanks
>
>
> >
> >       if (vhost_dev_has_iommu(hdev)) {
> >           if (hdev->vhost_ops->vhost_set_iotlb_callback) {
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 04/13] vdpa: move vhost reset after get vring base
  2023-02-21  7:07     ` Eugenio Perez Martin
@ 2023-02-22  3:43         ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-22  3:43 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, virtualization, Harpreet Singh Anand,
	Lei Yang, Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong

On Tue, Feb 21, 2023 at 3:08 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Feb 21, 2023 at 6:36 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2023/2/8 17:42, Eugenio Pérez 写道:
> > > The function vhost.c:vhost_dev_stop calls vhost operation
> > > vhost_dev_start(false). In the case of vdpa it totally reset and wipes
> > > the device, making the fetching of the vring base (virtqueue state) totally
> > > useless.
> > >
> > > The kernel backend does not use vhost_dev_start vhost op callback, but
> > > vhost-user do. A patch to make vhost_user_dev_start more similar to vdpa
> > > is desirable, but it can be added on top.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >   include/hw/virtio/vhost-backend.h |  4 ++++
> > >   hw/virtio/vhost-vdpa.c            | 22 ++++++++++++++++------
> > >   hw/virtio/vhost.c                 |  3 +++
> > >   3 files changed, 23 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> > > index c5ab49051e..ec3fbae58d 100644
> > > --- a/include/hw/virtio/vhost-backend.h
> > > +++ b/include/hw/virtio/vhost-backend.h
> > > @@ -130,6 +130,9 @@ typedef bool (*vhost_force_iommu_op)(struct vhost_dev *dev);
> > >
> > >   typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
> > >                                          int fd);
> > > +
> > > +typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
> > > +
> > >   typedef struct VhostOps {
> > >       VhostBackendType backend_type;
> > >       vhost_backend_init vhost_backend_init;
> > > @@ -177,6 +180,7 @@ typedef struct VhostOps {
> > >       vhost_get_device_id_op vhost_get_device_id;
> > >       vhost_force_iommu_op vhost_force_iommu;
> > >       vhost_set_config_call_op vhost_set_config_call;
> > > +    vhost_reset_status_op vhost_reset_status;
> > >   } VhostOps;
> > >
> > >   int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
> > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > index cbbe92ffe8..26e38a6aab 100644
> > > --- a/hw/virtio/vhost-vdpa.c
> > > +++ b/hw/virtio/vhost-vdpa.c
> > > @@ -1152,14 +1152,23 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > >       if (started) {
> > >           memory_listener_register(&v->listener, &address_space_memory);
> > >           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > > -    } else {
> > > -        vhost_vdpa_reset_device(dev);
> > > -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> > > -                                   VIRTIO_CONFIG_S_DRIVER);
> > > -        memory_listener_unregister(&v->listener);
> > > +    }
> > >
> > > -        return 0;
> > > +    return 0;
> > > +}
> > > +
> > > +static void vhost_vdpa_reset_status(struct vhost_dev *dev)
> > > +{
> > > +    struct vhost_vdpa *v = dev->opaque;
> > > +
> > > +    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> > > +        return;
> > >       }
> > > +
> > > +    vhost_vdpa_reset_device(dev);
> > > +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> > > +                                VIRTIO_CONFIG_S_DRIVER);
> > > +    memory_listener_unregister(&v->listener);
> > >   }
> > >
> > >   static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> > > @@ -1346,4 +1355,5 @@ const VhostOps vdpa_ops = {
> > >           .vhost_vq_get_addr = vhost_vdpa_vq_get_addr,
> > >           .vhost_force_iommu = vhost_vdpa_force_iommu,
> > >           .vhost_set_config_call = vhost_vdpa_set_config_call,
> > > +        .vhost_reset_status = vhost_vdpa_reset_status,
> > >   };
> > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > index eb8c4c378c..a266396576 100644
> > > --- a/hw/virtio/vhost.c
> > > +++ b/hw/virtio/vhost.c
> > > @@ -2049,6 +2049,9 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
> > >                                hdev->vqs + i,
> > >                                hdev->vq_index + i);
> > >       }
> > > +    if (hdev->vhost_ops->vhost_reset_status) {
> > > +        hdev->vhost_ops->vhost_reset_status(hdev);
> > > +    }
> >
> >
> > This looks racy, if we don't suspend/reset the device, device can move
> > last_avail_idx even after get_vring_base()?
> >
> > Instead of doing things like this, should we fallback to
> > virtio_queue_restore_last_avail_idx() in this case?
> >
>
> Right, we can track if the device is suspended / SVQ and then return
> an error in vring_get_base if it is not. Would that work?

When we don't support suspend, yes.

Thanks

>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > >       if (vhost_dev_has_iommu(hdev)) {
> > >           if (hdev->vhost_ops->vhost_set_iotlb_callback) {
> >
>
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 04/13] vdpa: move vhost reset after get vring base
@ 2023-02-22  3:43         ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-22  3:43 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu

On Tue, Feb 21, 2023 at 3:08 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Feb 21, 2023 at 6:36 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2023/2/8 17:42, Eugenio Pérez 写道:
> > > The function vhost.c:vhost_dev_stop calls vhost operation
> > > vhost_dev_start(false). In the case of vdpa it totally reset and wipes
> > > the device, making the fetching of the vring base (virtqueue state) totally
> > > useless.
> > >
> > > The kernel backend does not use vhost_dev_start vhost op callback, but
> > > vhost-user do. A patch to make vhost_user_dev_start more similar to vdpa
> > > is desirable, but it can be added on top.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >   include/hw/virtio/vhost-backend.h |  4 ++++
> > >   hw/virtio/vhost-vdpa.c            | 22 ++++++++++++++++------
> > >   hw/virtio/vhost.c                 |  3 +++
> > >   3 files changed, 23 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> > > index c5ab49051e..ec3fbae58d 100644
> > > --- a/include/hw/virtio/vhost-backend.h
> > > +++ b/include/hw/virtio/vhost-backend.h
> > > @@ -130,6 +130,9 @@ typedef bool (*vhost_force_iommu_op)(struct vhost_dev *dev);
> > >
> > >   typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
> > >                                          int fd);
> > > +
> > > +typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
> > > +
> > >   typedef struct VhostOps {
> > >       VhostBackendType backend_type;
> > >       vhost_backend_init vhost_backend_init;
> > > @@ -177,6 +180,7 @@ typedef struct VhostOps {
> > >       vhost_get_device_id_op vhost_get_device_id;
> > >       vhost_force_iommu_op vhost_force_iommu;
> > >       vhost_set_config_call_op vhost_set_config_call;
> > > +    vhost_reset_status_op vhost_reset_status;
> > >   } VhostOps;
> > >
> > >   int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
> > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > index cbbe92ffe8..26e38a6aab 100644
> > > --- a/hw/virtio/vhost-vdpa.c
> > > +++ b/hw/virtio/vhost-vdpa.c
> > > @@ -1152,14 +1152,23 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > >       if (started) {
> > >           memory_listener_register(&v->listener, &address_space_memory);
> > >           return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > > -    } else {
> > > -        vhost_vdpa_reset_device(dev);
> > > -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> > > -                                   VIRTIO_CONFIG_S_DRIVER);
> > > -        memory_listener_unregister(&v->listener);
> > > +    }
> > >
> > > -        return 0;
> > > +    return 0;
> > > +}
> > > +
> > > +static void vhost_vdpa_reset_status(struct vhost_dev *dev)
> > > +{
> > > +    struct vhost_vdpa *v = dev->opaque;
> > > +
> > > +    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> > > +        return;
> > >       }
> > > +
> > > +    vhost_vdpa_reset_device(dev);
> > > +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> > > +                                VIRTIO_CONFIG_S_DRIVER);
> > > +    memory_listener_unregister(&v->listener);
> > >   }
> > >
> > >   static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> > > @@ -1346,4 +1355,5 @@ const VhostOps vdpa_ops = {
> > >           .vhost_vq_get_addr = vhost_vdpa_vq_get_addr,
> > >           .vhost_force_iommu = vhost_vdpa_force_iommu,
> > >           .vhost_set_config_call = vhost_vdpa_set_config_call,
> > > +        .vhost_reset_status = vhost_vdpa_reset_status,
> > >   };
> > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > index eb8c4c378c..a266396576 100644
> > > --- a/hw/virtio/vhost.c
> > > +++ b/hw/virtio/vhost.c
> > > @@ -2049,6 +2049,9 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
> > >                                hdev->vqs + i,
> > >                                hdev->vq_index + i);
> > >       }
> > > +    if (hdev->vhost_ops->vhost_reset_status) {
> > > +        hdev->vhost_ops->vhost_reset_status(hdev);
> > > +    }
> >
> >
> > This looks racy, if we don't suspend/reset the device, device can move
> > last_avail_idx even after get_vring_base()?
> >
> > Instead of doing things like this, should we fallback to
> > virtio_queue_restore_last_avail_idx() in this case?
> >
>
> Right, we can track if the device is suspended / SVQ and then return
> an error in vring_get_base if it is not. Would that work?

When we don't support suspend, yes.

Thanks

>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > >       if (vhost_dev_has_iommu(hdev)) {
> > >           if (hdev->vhost_ops->vhost_set_iotlb_callback) {
> >
>
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 07/13] vdpa: add vdpa net migration state notifier
  2023-02-08  9:42 ` [PATCH v2 07/13] vdpa: add vdpa net migration state notifier Eugenio Pérez
@ 2023-02-22  3:55     ` Jason Wang
  2023-02-22  3:55     ` Jason Wang
  1 sibling, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-22  3:55 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, virtualization, Harpreet Singh Anand, Lei Yang,
	Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong


在 2023/2/8 17:42, Eugenio Pérez 写道:
> This allows net to restart the device backend to configure SVQ on it.
>
> Ideally, these changes should not be net specific. However, the vdpa net
> backend is the one with enough knowledge to configure everything because
> of some reasons:
> * Queues might need to be shadowed or not depending on its kind (control
>    vs data).
> * Queues need to share the same map translations (iova tree).
>
> Because of that it is cleaner to restart the whole net backend and
> configure again as expected, similar to how vhost-kernel moves between
> userspace and passthrough.
>
> If more kinds of devices need dynamic switching to SVQ we can create a
> callback struct like VhostOps and move most of the code there.
> VhostOps cannot be reused since all vdpa backend share them, and to
> personalize just for networking would be too heavy.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v3:
> * Add TODO to use the resume operation in the future.
> * Use migration_in_setup and migration_has_failed instead of a
>    complicated switch case.
> ---
>   net/vhost-vdpa.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 76 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index dd686b4514..bca13f97fd 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -26,12 +26,14 @@
>   #include <err.h>
>   #include "standard-headers/linux/virtio_net.h"
>   #include "monitor/monitor.h"
> +#include "migration/misc.h"
>   #include "hw/virtio/vhost.h"
>   
>   /* Todo:need to add the multiqueue support here */
>   typedef struct VhostVDPAState {
>       NetClientState nc;
>       struct vhost_vdpa vhost_vdpa;
> +    Notifier migration_state;
>       VHostNetState *vhost_net;
>   
>       /* Control commands shadow buffers */
> @@ -241,10 +243,79 @@ static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
>       return DO_UPCAST(VhostVDPAState, nc, nc0);
>   }
>   
> +static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
> +{
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +    VirtIONet *n;
> +    VirtIODevice *vdev;
> +    int data_queue_pairs, cvq, r;
> +    NetClientState *peer;
> +
> +    /* We are only called on the first data vqs and only if x-svq is not set */
> +    if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
> +        return;
> +    }
> +
> +    vdev = v->dev->vdev;
> +    n = VIRTIO_NET(vdev);


Let's tweak the code to move those initialization to the beginning of 
the function.


> +    if (!n->vhost_started) {
> +        return;
> +    }


What happens if the vhost is started during the live migration?


> +
> +    data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> +    cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
> +                                  n->max_ncs - n->max_queue_pairs : 0;
> +    /*
> +     * TODO: vhost_net_stop does suspend, get_base and reset. We can be smarter
> +     * in the future and resume the device if read-only operations between
> +     * suspend and reset goes wrong.
> +     */
> +    vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
> +
> +    peer = s->nc.peer;
> +    for (int i = 0; i < data_queue_pairs + cvq; i++) {
> +        VhostVDPAState *vdpa_state;
> +        NetClientState *nc;
> +
> +        if (i < data_queue_pairs) {
> +            nc = qemu_get_peer(peer, i);
> +        } else {
> +            nc = qemu_get_peer(peer, n->max_queue_pairs);
> +        }
> +
> +        vdpa_state = DO_UPCAST(VhostVDPAState, nc, nc);
> +        vdpa_state->vhost_vdpa.shadow_data = enable;
> +
> +        if (i < data_queue_pairs) {
> +            /* Do not override CVQ shadow_vqs_enabled */
> +            vdpa_state->vhost_vdpa.shadow_vqs_enabled = enable;
> +        }


I wonder what happens if the number of queue pairs is changed during 
live migration? Should we assign all qps in this case?

Thanks


> +    }
> +
> +    r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);
> +    if (unlikely(r < 0)) {
> +        error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
> +    }
> +}
> +
> +static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
> +{
> +    MigrationState *migration = data;
> +    VhostVDPAState *s = container_of(notifier, VhostVDPAState,
> +                                     migration_state);
> +
> +    if (migration_in_setup(migration)) {
> +        vhost_vdpa_net_log_global_enable(s, true);
> +    } else if (migration_has_failed(migration)) {
> +        vhost_vdpa_net_log_global_enable(s, false);
> +    }
> +}
> +
>   static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>   {
>       struct vhost_vdpa *v = &s->vhost_vdpa;
>   
> +    add_migration_state_change_notifier(&s->migration_state);
>       if (v->shadow_vqs_enabled) {
>           v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>                                              v->iova_range.last);
> @@ -278,6 +349,10 @@ static void vhost_vdpa_net_client_stop(NetClientState *nc)
>   
>       assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>   
> +    if (s->vhost_vdpa.index == 0) {
> +        remove_migration_state_change_notifier(&s->migration_state);
> +    }
> +
>       dev = s->vhost_vdpa.dev;
>       if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>           g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> @@ -741,6 +816,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>       s->vhost_vdpa.device_fd = vdpa_device_fd;
>       s->vhost_vdpa.index = queue_pair_index;
>       s->always_svq = svq;
> +    s->migration_state.notify = vdpa_net_migration_state_notifier;
>       s->vhost_vdpa.shadow_vqs_enabled = svq;
>       s->vhost_vdpa.iova_range = iova_range;
>       s->vhost_vdpa.shadow_data = svq;

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 07/13] vdpa: add vdpa net migration state notifier
@ 2023-02-22  3:55     ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-22  3:55 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu


在 2023/2/8 17:42, Eugenio Pérez 写道:
> This allows net to restart the device backend to configure SVQ on it.
>
> Ideally, these changes should not be net specific. However, the vdpa net
> backend is the one with enough knowledge to configure everything because
> of some reasons:
> * Queues might need to be shadowed or not depending on its kind (control
>    vs data).
> * Queues need to share the same map translations (iova tree).
>
> Because of that it is cleaner to restart the whole net backend and
> configure again as expected, similar to how vhost-kernel moves between
> userspace and passthrough.
>
> If more kinds of devices need dynamic switching to SVQ we can create a
> callback struct like VhostOps and move most of the code there.
> VhostOps cannot be reused since all vdpa backend share them, and to
> personalize just for networking would be too heavy.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> v3:
> * Add TODO to use the resume operation in the future.
> * Use migration_in_setup and migration_has_failed instead of a
>    complicated switch case.
> ---
>   net/vhost-vdpa.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 76 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index dd686b4514..bca13f97fd 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -26,12 +26,14 @@
>   #include <err.h>
>   #include "standard-headers/linux/virtio_net.h"
>   #include "monitor/monitor.h"
> +#include "migration/misc.h"
>   #include "hw/virtio/vhost.h"
>   
>   /* Todo:need to add the multiqueue support here */
>   typedef struct VhostVDPAState {
>       NetClientState nc;
>       struct vhost_vdpa vhost_vdpa;
> +    Notifier migration_state;
>       VHostNetState *vhost_net;
>   
>       /* Control commands shadow buffers */
> @@ -241,10 +243,79 @@ static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
>       return DO_UPCAST(VhostVDPAState, nc, nc0);
>   }
>   
> +static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
> +{
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +    VirtIONet *n;
> +    VirtIODevice *vdev;
> +    int data_queue_pairs, cvq, r;
> +    NetClientState *peer;
> +
> +    /* We are only called on the first data vqs and only if x-svq is not set */
> +    if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
> +        return;
> +    }
> +
> +    vdev = v->dev->vdev;
> +    n = VIRTIO_NET(vdev);


Let's tweak the code to move those initialization to the beginning of 
the function.


> +    if (!n->vhost_started) {
> +        return;
> +    }


What happens if the vhost is started during the live migration?


> +
> +    data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> +    cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
> +                                  n->max_ncs - n->max_queue_pairs : 0;
> +    /*
> +     * TODO: vhost_net_stop does suspend, get_base and reset. We can be smarter
> +     * in the future and resume the device if read-only operations between
> +     * suspend and reset goes wrong.
> +     */
> +    vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
> +
> +    peer = s->nc.peer;
> +    for (int i = 0; i < data_queue_pairs + cvq; i++) {
> +        VhostVDPAState *vdpa_state;
> +        NetClientState *nc;
> +
> +        if (i < data_queue_pairs) {
> +            nc = qemu_get_peer(peer, i);
> +        } else {
> +            nc = qemu_get_peer(peer, n->max_queue_pairs);
> +        }
> +
> +        vdpa_state = DO_UPCAST(VhostVDPAState, nc, nc);
> +        vdpa_state->vhost_vdpa.shadow_data = enable;
> +
> +        if (i < data_queue_pairs) {
> +            /* Do not override CVQ shadow_vqs_enabled */
> +            vdpa_state->vhost_vdpa.shadow_vqs_enabled = enable;
> +        }


I wonder what happens if the number of queue pairs is changed during 
live migration? Should we assign all qps in this case?

Thanks


> +    }
> +
> +    r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);
> +    if (unlikely(r < 0)) {
> +        error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
> +    }
> +}
> +
> +static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
> +{
> +    MigrationState *migration = data;
> +    VhostVDPAState *s = container_of(notifier, VhostVDPAState,
> +                                     migration_state);
> +
> +    if (migration_in_setup(migration)) {
> +        vhost_vdpa_net_log_global_enable(s, true);
> +    } else if (migration_has_failed(migration)) {
> +        vhost_vdpa_net_log_global_enable(s, false);
> +    }
> +}
> +
>   static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>   {
>       struct vhost_vdpa *v = &s->vhost_vdpa;
>   
> +    add_migration_state_change_notifier(&s->migration_state);
>       if (v->shadow_vqs_enabled) {
>           v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
>                                              v->iova_range.last);
> @@ -278,6 +349,10 @@ static void vhost_vdpa_net_client_stop(NetClientState *nc)
>   
>       assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>   
> +    if (s->vhost_vdpa.index == 0) {
> +        remove_migration_state_change_notifier(&s->migration_state);
> +    }
> +
>       dev = s->vhost_vdpa.dev;
>       if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
>           g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> @@ -741,6 +816,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>       s->vhost_vdpa.device_fd = vdpa_device_fd;
>       s->vhost_vdpa.index = queue_pair_index;
>       s->always_svq = svq;
> +    s->migration_state.notify = vdpa_net_migration_state_notifier;
>       s->vhost_vdpa.shadow_vqs_enabled = svq;
>       s->vhost_vdpa.iova_range = iova_range;
>       s->vhost_vdpa.shadow_data = svq;



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 09/13] vdpa net: block migration if the device has CVQ
  2023-02-08  9:42 ` [PATCH v2 09/13] vdpa net: block migration if the device has CVQ Eugenio Pérez
@ 2023-02-22  4:00     ` Jason Wang
  2023-02-22  4:00     ` Jason Wang
  1 sibling, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-22  4:00 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, virtualization, Harpreet Singh Anand, Lei Yang,
	Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong


在 2023/2/8 17:42, Eugenio Pérez 写道:
> Devices with CVQ needs to migrate state beyond vq state.  Leaving this
> to future series.


I may miss something but what is missed to support CVQ/MQ?

Thanks


>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   net/vhost-vdpa.c | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index bca13f97fd..309861e56c 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -955,11 +955,17 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       }
>   
>       if (has_cvq) {
> +        VhostVDPAState *s;
> +
>           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>                                    vdpa_device_fd, i, 1, false,
>                                    opts->x_svq, iova_range);
>           if (!nc)
>               goto err;
> +
> +        s = DO_UPCAST(VhostVDPAState, nc, nc);
> +        error_setg(&s->vhost_vdpa.dev->migration_blocker,
> +                   "net vdpa cannot migrate with MQ feature");
>       }
>   
>       return 0;

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 09/13] vdpa net: block migration if the device has CVQ
@ 2023-02-22  4:00     ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-22  4:00 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu


在 2023/2/8 17:42, Eugenio Pérez 写道:
> Devices with CVQ needs to migrate state beyond vq state.  Leaving this
> to future series.


I may miss something but what is missed to support CVQ/MQ?

Thanks


>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   net/vhost-vdpa.c | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index bca13f97fd..309861e56c 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -955,11 +955,17 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       }
>   
>       if (has_cvq) {
> +        VhostVDPAState *s;
> +
>           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>                                    vdpa_device_fd, i, 1, false,
>                                    opts->x_svq, iova_range);
>           if (!nc)
>               goto err;
> +
> +        s = DO_UPCAST(VhostVDPAState, nc, nc);
> +        error_setg(&s->vhost_vdpa.dev->migration_blocker,
> +                   "net vdpa cannot migrate with MQ feature");
>       }
>   
>       return 0;



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 11/13] vdpa: block migration if dev does not have _F_SUSPEND
  2023-02-08  9:42 ` [PATCH v2 11/13] vdpa: block migration if dev does not have _F_SUSPEND Eugenio Pérez
@ 2023-02-22  4:05     ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-22  4:05 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu


在 2023/2/8 17:42, Eugenio Pérez 写道:
> Next patches enable devices to be migrated even if vdpa netdev has not
> been started with x-svq. However, not all devices are migratable, so we
> need to block migration if we detect that.
>
> Block vhost-vdpa device migration if it does not offer _F_SUSPEND and it
> has not been started with x-svq.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-vdpa.c | 21 +++++++++++++++++++++
>   1 file changed, 21 insertions(+)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 84a6b9690b..9d30cf9b3c 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -442,6 +442,27 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>           return 0;
>       }
>   
> +    /*
> +     * If dev->shadow_vqs_enabled at initialization that means the device has
> +     * been started with x-svq=on, so don't block migration
> +     */
> +    if (dev->migration_blocker == NULL && !v->shadow_vqs_enabled) {
> +        uint64_t backend_features;
> +
> +        /* We don't have dev->backend_features yet */
> +        ret = vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES,
> +                              &backend_features);
> +        if (unlikely(ret)) {
> +            error_setg_errno(errp, -ret, "Could not get backend features");
> +            return ret;
> +        }
> +
> +        if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
> +            error_setg(&dev->migration_blocker,
> +                "vhost-vdpa backend lacks VHOST_BACKEND_F_SUSPEND feature.");
> +        }


I wonder why not let the device to decide? For networking device, we can 
live without suspend probably.

Thanks


> +    }
> +
>       /*
>        * Similar to VFIO, we end up pinning all guest memory and have to
>        * disable discarding of RAM.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 11/13] vdpa: block migration if dev does not have _F_SUSPEND
@ 2023-02-22  4:05     ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-22  4:05 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, virtualization, Harpreet Singh Anand, Lei Yang,
	Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong


在 2023/2/8 17:42, Eugenio Pérez 写道:
> Next patches enable devices to be migrated even if vdpa netdev has not
> been started with x-svq. However, not all devices are migratable, so we
> need to block migration if we detect that.
>
> Block vhost-vdpa device migration if it does not offer _F_SUSPEND and it
> has not been started with x-svq.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-vdpa.c | 21 +++++++++++++++++++++
>   1 file changed, 21 insertions(+)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 84a6b9690b..9d30cf9b3c 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -442,6 +442,27 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>           return 0;
>       }
>   
> +    /*
> +     * If dev->shadow_vqs_enabled at initialization that means the device has
> +     * been started with x-svq=on, so don't block migration
> +     */
> +    if (dev->migration_blocker == NULL && !v->shadow_vqs_enabled) {
> +        uint64_t backend_features;
> +
> +        /* We don't have dev->backend_features yet */
> +        ret = vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES,
> +                              &backend_features);
> +        if (unlikely(ret)) {
> +            error_setg_errno(errp, -ret, "Could not get backend features");
> +            return ret;
> +        }
> +
> +        if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
> +            error_setg(&dev->migration_blocker,
> +                "vhost-vdpa backend lacks VHOST_BACKEND_F_SUSPEND feature.");
> +        }


I wonder why not let the device to decide? For networking device, we can 
live without suspend probably.

Thanks


> +    }
> +
>       /*
>        * Similar to VFIO, we end up pinning all guest memory and have to
>        * disable discarding of RAM.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 13/13] vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices
  2023-02-08  9:42 ` [PATCH v2 13/13] vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices Eugenio Pérez
@ 2023-02-22  4:07     ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-22  4:07 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu


在 2023/2/8 17:42, Eugenio Pérez 写道:
> vhost-vdpa devices can return this features now that blockers have been
> set in case some features are not met.
>
> Expose VHOST_F_LOG_ALL only in that case.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---


Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


>   hw/virtio/vhost-vdpa.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 13a86a2bb1..5fddc77c5c 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1319,10 +1319,9 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>   static int vhost_vdpa_get_features(struct vhost_dev *dev,
>                                        uint64_t *features)
>   {
> -    struct vhost_vdpa *v = dev->opaque;
>       int ret = vhost_vdpa_get_dev_features(dev, features);
>   
> -    if (ret == 0 && v->shadow_vqs_enabled) {
> +    if (ret == 0) {
>           /* Add SVQ logging capabilities */
>           *features |= BIT_ULL(VHOST_F_LOG_ALL);
>       }



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 13/13] vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices
@ 2023-02-22  4:07     ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-22  4:07 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, virtualization, Harpreet Singh Anand, Lei Yang,
	Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong


在 2023/2/8 17:42, Eugenio Pérez 写道:
> vhost-vdpa devices can return this features now that blockers have been
> set in case some features are not met.
>
> Expose VHOST_F_LOG_ALL only in that case.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---


Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


>   hw/virtio/vhost-vdpa.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 13a86a2bb1..5fddc77c5c 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1319,10 +1319,9 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>   static int vhost_vdpa_get_features(struct vhost_dev *dev,
>                                        uint64_t *features)
>   {
> -    struct vhost_vdpa *v = dev->opaque;
>       int ret = vhost_vdpa_get_dev_features(dev, features);
>   
> -    if (ret == 0 && v->shadow_vqs_enabled) {
> +    if (ret == 0) {
>           /* Add SVQ logging capabilities */
>           *features |= BIT_ULL(VHOST_F_LOG_ALL);
>       }

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 07/13] vdpa: add vdpa net migration state notifier
  2023-02-22  3:55     ` Jason Wang
  (?)
@ 2023-02-22  7:23     ` Eugenio Perez Martin
  -1 siblings, 0 replies; 68+ messages in thread
From: Eugenio Perez Martin @ 2023-02-22  7:23 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu

On Wed, Feb 22, 2023 at 4:56 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/2/8 17:42, Eugenio Pérez 写道:
> > This allows net to restart the device backend to configure SVQ on it.
> >
> > Ideally, these changes should not be net specific. However, the vdpa net
> > backend is the one with enough knowledge to configure everything because
> > of some reasons:
> > * Queues might need to be shadowed or not depending on its kind (control
> >    vs data).
> > * Queues need to share the same map translations (iova tree).
> >
> > Because of that it is cleaner to restart the whole net backend and
> > configure again as expected, similar to how vhost-kernel moves between
> > userspace and passthrough.
> >
> > If more kinds of devices need dynamic switching to SVQ we can create a
> > callback struct like VhostOps and move most of the code there.
> > VhostOps cannot be reused since all vdpa backend share them, and to
> > personalize just for networking would be too heavy.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> > v3:
> > * Add TODO to use the resume operation in the future.
> > * Use migration_in_setup and migration_has_failed instead of a
> >    complicated switch case.
> > ---
> >   net/vhost-vdpa.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 76 insertions(+)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index dd686b4514..bca13f97fd 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -26,12 +26,14 @@
> >   #include <err.h>
> >   #include "standard-headers/linux/virtio_net.h"
> >   #include "monitor/monitor.h"
> > +#include "migration/misc.h"
> >   #include "hw/virtio/vhost.h"
> >
> >   /* Todo:need to add the multiqueue support here */
> >   typedef struct VhostVDPAState {
> >       NetClientState nc;
> >       struct vhost_vdpa vhost_vdpa;
> > +    Notifier migration_state;
> >       VHostNetState *vhost_net;
> >
> >       /* Control commands shadow buffers */
> > @@ -241,10 +243,79 @@ static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> >       return DO_UPCAST(VhostVDPAState, nc, nc0);
> >   }
> >
> > +static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
> > +{
> > +    struct vhost_vdpa *v = &s->vhost_vdpa;
> > +    VirtIONet *n;
> > +    VirtIODevice *vdev;
> > +    int data_queue_pairs, cvq, r;
> > +    NetClientState *peer;
> > +
> > +    /* We are only called on the first data vqs and only if x-svq is not set */
> > +    if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
> > +        return;
> > +    }
> > +
> > +    vdev = v->dev->vdev;
> > +    n = VIRTIO_NET(vdev);
>
>
> Let's tweak the code to move those initialization to the beginning of
> the function.
>

Sure.

>
> > +    if (!n->vhost_started) {
> > +        return;
> > +    }
>
>
> What happens if the vhost is started during the live migration?
>

This is solved at v3, checking the migrate state at
vhost_vdpa_net_data_start_first too [1]. However, this created another
few complications / complex code as Si-Wei points out.

Recent changes due to virtio reset makes it easier to move all this
code to hw/virtio/vhost-vdpa.c, where different kinds of vDPA devices
can share the code. I'll send a new version that way.

>
> > +
> > +    data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> > +    cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
> > +                                  n->max_ncs - n->max_queue_pairs : 0;
> > +    /*
> > +     * TODO: vhost_net_stop does suspend, get_base and reset. We can be smarter
> > +     * in the future and resume the device if read-only operations between
> > +     * suspend and reset goes wrong.
> > +     */
> > +    vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
> > +
> > +    peer = s->nc.peer;
> > +    for (int i = 0; i < data_queue_pairs + cvq; i++) {
> > +        VhostVDPAState *vdpa_state;
> > +        NetClientState *nc;
> > +
> > +        if (i < data_queue_pairs) {
> > +            nc = qemu_get_peer(peer, i);
> > +        } else {
> > +            nc = qemu_get_peer(peer, n->max_queue_pairs);
> > +        }
> > +
> > +        vdpa_state = DO_UPCAST(VhostVDPAState, nc, nc);
> > +        vdpa_state->vhost_vdpa.shadow_data = enable;
> > +
> > +        if (i < data_queue_pairs) {
> > +            /* Do not override CVQ shadow_vqs_enabled */
> > +            vdpa_state->vhost_vdpa.shadow_vqs_enabled = enable;
> > +        }
>
>
> I wonder what happens if the number of queue pairs is changed during
> live migration? Should we assign all qps in this case?
>

Migration is blocked if the device has CVQ feature in this series.

Thanks!

[1] https://patchwork.kernel.org/project/qemu-devel/patch/20230215173850.298832-9-eperezma@redhat.com/

> Thanks
>
>
> > +    }
> > +
> > +    r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);
> > +    if (unlikely(r < 0)) {
> > +        error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
> > +    }
> > +}
> > +
> > +static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
> > +{
> > +    MigrationState *migration = data;
> > +    VhostVDPAState *s = container_of(notifier, VhostVDPAState,
> > +                                     migration_state);
> > +
> > +    if (migration_in_setup(migration)) {
> > +        vhost_vdpa_net_log_global_enable(s, true);
> > +    } else if (migration_has_failed(migration)) {
> > +        vhost_vdpa_net_log_global_enable(s, false);
> > +    }
> > +}
> > +
> >   static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> >   {
> >       struct vhost_vdpa *v = &s->vhost_vdpa;
> >
> > +    add_migration_state_change_notifier(&s->migration_state);
> >       if (v->shadow_vqs_enabled) {
> >           v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> >                                              v->iova_range.last);
> > @@ -278,6 +349,10 @@ static void vhost_vdpa_net_client_stop(NetClientState *nc)
> >
> >       assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >
> > +    if (s->vhost_vdpa.index == 0) {
> > +        remove_migration_state_change_notifier(&s->migration_state);
> > +    }
> > +
> >       dev = s->vhost_vdpa.dev;
> >       if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> >           g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > @@ -741,6 +816,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >       s->vhost_vdpa.device_fd = vdpa_device_fd;
> >       s->vhost_vdpa.index = queue_pair_index;
> >       s->always_svq = svq;
> > +    s->migration_state.notify = vdpa_net_migration_state_notifier;
> >       s->vhost_vdpa.shadow_vqs_enabled = svq;
> >       s->vhost_vdpa.iova_range = iova_range;
> >       s->vhost_vdpa.shadow_data = svq;
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 09/13] vdpa net: block migration if the device has CVQ
  2023-02-22  4:00     ` Jason Wang
  (?)
@ 2023-02-22  7:28     ` Eugenio Perez Martin
  2023-02-23  2:41         ` Jason Wang
  -1 siblings, 1 reply; 68+ messages in thread
From: Eugenio Perez Martin @ 2023-02-22  7:28 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu

On Wed, Feb 22, 2023 at 5:01 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/2/8 17:42, Eugenio Pérez 写道:
> > Devices with CVQ needs to migrate state beyond vq state.  Leaving this
> > to future series.
>
>
> I may miss something but what is missed to support CVQ/MQ?
>

To restore all the device state set by CVQ in the migration source
(MAC, MQ, ...) before data vqs start. We don't have a reliable way to
not start data vqs until the device [1].

Thanks!

[1] https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg02652.html

> Thanks
>
>
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   net/vhost-vdpa.c | 6 ++++++
> >   1 file changed, 6 insertions(+)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index bca13f97fd..309861e56c 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -955,11 +955,17 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >       }
> >
> >       if (has_cvq) {
> > +        VhostVDPAState *s;
> > +
> >           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >                                    vdpa_device_fd, i, 1, false,
> >                                    opts->x_svq, iova_range);
> >           if (!nc)
> >               goto err;
> > +
> > +        s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +        error_setg(&s->vhost_vdpa.dev->migration_blocker,
> > +                   "net vdpa cannot migrate with MQ feature");
> >       }
> >
> >       return 0;
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 11/13] vdpa: block migration if dev does not have _F_SUSPEND
  2023-02-22  4:05     ` Jason Wang
  (?)
@ 2023-02-22 14:25     ` Eugenio Perez Martin
  2023-02-23  2:38         ` Jason Wang
  -1 siblings, 1 reply; 68+ messages in thread
From: Eugenio Perez Martin @ 2023-02-22 14:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu

On Wed, Feb 22, 2023 at 5:05 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/2/8 17:42, Eugenio Pérez 写道:
> > Next patches enable devices to be migrated even if vdpa netdev has not
> > been started with x-svq. However, not all devices are migratable, so we
> > need to block migration if we detect that.
> >
> > Block vhost-vdpa device migration if it does not offer _F_SUSPEND and it
> > has not been started with x-svq.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-vdpa.c | 21 +++++++++++++++++++++
> >   1 file changed, 21 insertions(+)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 84a6b9690b..9d30cf9b3c 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -442,6 +442,27 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> >           return 0;
> >       }
> >
> > +    /*
> > +     * If dev->shadow_vqs_enabled at initialization that means the device has
> > +     * been started with x-svq=on, so don't block migration
> > +     */
> > +    if (dev->migration_blocker == NULL && !v->shadow_vqs_enabled) {
> > +        uint64_t backend_features;
> > +
> > +        /* We don't have dev->backend_features yet */
> > +        ret = vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES,
> > +                              &backend_features);
> > +        if (unlikely(ret)) {
> > +            error_setg_errno(errp, -ret, "Could not get backend features");
> > +            return ret;
> > +        }
> > +
> > +        if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
> > +            error_setg(&dev->migration_blocker,
> > +                "vhost-vdpa backend lacks VHOST_BACKEND_F_SUSPEND feature.");
> > +        }
>
>
> I wonder why not let the device to decide? For networking device, we can
> live without suspend probably.
>

Right, but how can we know if this is a net device in init? I don't
think a switch (vhost_vdpa_get_device_id(dev)) is elegant.

If the parent device does not need to be suspended i'd go with
exposing a suspend ioctl but do nothing in the parent device. After
that, it could even choose to return an error for GET_VRING_BASE.

If we want to implement it as a fallback in qemu, I'd go for
implementing it on top of this series. There are a few operations we
could move to a device-kind specific ops.

Would it make sense to you?

Thanks!


> Thanks
>
>
> > +    }
> > +
> >       /*
> >        * Similar to VFIO, we end up pinning all guest memory and have to
> >        * disable discarding of RAM.
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 11/13] vdpa: block migration if dev does not have _F_SUSPEND
  2023-02-22 14:25     ` Eugenio Perez Martin
@ 2023-02-23  2:38         ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-23  2:38 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu


在 2023/2/22 22:25, Eugenio Perez Martin 写道:
> On Wed, Feb 22, 2023 at 5:05 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2023/2/8 17:42, Eugenio Pérez 写道:
>>> Next patches enable devices to be migrated even if vdpa netdev has not
>>> been started with x-svq. However, not all devices are migratable, so we
>>> need to block migration if we detect that.
>>>
>>> Block vhost-vdpa device migration if it does not offer _F_SUSPEND and it
>>> has not been started with x-svq.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-vdpa.c | 21 +++++++++++++++++++++
>>>    1 file changed, 21 insertions(+)
>>>
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index 84a6b9690b..9d30cf9b3c 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -442,6 +442,27 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>>            return 0;
>>>        }
>>>
>>> +    /*
>>> +     * If dev->shadow_vqs_enabled at initialization that means the device has
>>> +     * been started with x-svq=on, so don't block migration
>>> +     */
>>> +    if (dev->migration_blocker == NULL && !v->shadow_vqs_enabled) {
>>> +        uint64_t backend_features;
>>> +
>>> +        /* We don't have dev->backend_features yet */
>>> +        ret = vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES,
>>> +                              &backend_features);
>>> +        if (unlikely(ret)) {
>>> +            error_setg_errno(errp, -ret, "Could not get backend features");
>>> +            return ret;
>>> +        }
>>> +
>>> +        if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
>>> +            error_setg(&dev->migration_blocker,
>>> +                "vhost-vdpa backend lacks VHOST_BACKEND_F_SUSPEND feature.");
>>> +        }
>>
>> I wonder why not let the device to decide? For networking device, we can
>> live without suspend probably.
>>
> Right, but how can we know if this is a net device in init? I don't
> think a switch (vhost_vdpa_get_device_id(dev)) is elegant.


I meant the caller of vhost_vdpa_init() which is net_init_vhost_vdpa().

Thanks


>
> If the parent device does not need to be suspended i'd go with
> exposing a suspend ioctl but do nothing in the parent device. After
> that, it could even choose to return an error for GET_VRING_BASE.
>
> If we want to implement it as a fallback in qemu, I'd go for
> implementing it on top of this series. There are a few operations we
> could move to a device-kind specific ops.
>
> Would it make sense to you?
>
> Thanks!
>
>
>> Thanks
>>
>>
>>> +    }
>>> +
>>>        /*
>>>         * Similar to VFIO, we end up pinning all guest memory and have to
>>>         * disable discarding of RAM.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 11/13] vdpa: block migration if dev does not have _F_SUSPEND
@ 2023-02-23  2:38         ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-23  2:38 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, virtualization, Harpreet Singh Anand,
	Lei Yang, Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong


在 2023/2/22 22:25, Eugenio Perez Martin 写道:
> On Wed, Feb 22, 2023 at 5:05 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2023/2/8 17:42, Eugenio Pérez 写道:
>>> Next patches enable devices to be migrated even if vdpa netdev has not
>>> been started with x-svq. However, not all devices are migratable, so we
>>> need to block migration if we detect that.
>>>
>>> Block vhost-vdpa device migration if it does not offer _F_SUSPEND and it
>>> has not been started with x-svq.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-vdpa.c | 21 +++++++++++++++++++++
>>>    1 file changed, 21 insertions(+)
>>>
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index 84a6b9690b..9d30cf9b3c 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -442,6 +442,27 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>>            return 0;
>>>        }
>>>
>>> +    /*
>>> +     * If dev->shadow_vqs_enabled at initialization that means the device has
>>> +     * been started with x-svq=on, so don't block migration
>>> +     */
>>> +    if (dev->migration_blocker == NULL && !v->shadow_vqs_enabled) {
>>> +        uint64_t backend_features;
>>> +
>>> +        /* We don't have dev->backend_features yet */
>>> +        ret = vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES,
>>> +                              &backend_features);
>>> +        if (unlikely(ret)) {
>>> +            error_setg_errno(errp, -ret, "Could not get backend features");
>>> +            return ret;
>>> +        }
>>> +
>>> +        if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
>>> +            error_setg(&dev->migration_blocker,
>>> +                "vhost-vdpa backend lacks VHOST_BACKEND_F_SUSPEND feature.");
>>> +        }
>>
>> I wonder why not let the device to decide? For networking device, we can
>> live without suspend probably.
>>
> Right, but how can we know if this is a net device in init? I don't
> think a switch (vhost_vdpa_get_device_id(dev)) is elegant.


I meant the caller of vhost_vdpa_init() which is net_init_vhost_vdpa().

Thanks


>
> If the parent device does not need to be suspended i'd go with
> exposing a suspend ioctl but do nothing in the parent device. After
> that, it could even choose to return an error for GET_VRING_BASE.
>
> If we want to implement it as a fallback in qemu, I'd go for
> implementing it on top of this series. There are a few operations we
> could move to a device-kind specific ops.
>
> Would it make sense to you?
>
> Thanks!
>
>
>> Thanks
>>
>>
>>> +    }
>>> +
>>>        /*
>>>         * Similar to VFIO, we end up pinning all guest memory and have to
>>>         * disable discarding of RAM.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 09/13] vdpa net: block migration if the device has CVQ
  2023-02-22  7:28     ` Eugenio Perez Martin
@ 2023-02-23  2:41         ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-23  2:41 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu


在 2023/2/22 15:28, Eugenio Perez Martin 写道:
> On Wed, Feb 22, 2023 at 5:01 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2023/2/8 17:42, Eugenio Pérez 写道:
>>> Devices with CVQ needs to migrate state beyond vq state.  Leaving this
>>> to future series.
>>
>> I may miss something but what is missed to support CVQ/MQ?
>>
> To restore all the device state set by CVQ in the migration source
> (MAC, MQ, ...) before data vqs start. We don't have a reliable way to
> not start data vqs until the device [1].
>
> Thanks!
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg02652.html


Right. It might be mention this defect in either the change log or 
somewhere in the code as a comment.

(Btw, I think we should fix those vDPA drivers).

Thanks


>
>> Thanks
>>
>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    net/vhost-vdpa.c | 6 ++++++
>>>    1 file changed, 6 insertions(+)
>>>
>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>>> index bca13f97fd..309861e56c 100644
>>> --- a/net/vhost-vdpa.c
>>> +++ b/net/vhost-vdpa.c
>>> @@ -955,11 +955,17 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>        }
>>>
>>>        if (has_cvq) {
>>> +        VhostVDPAState *s;
>>> +
>>>            nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>                                     vdpa_device_fd, i, 1, false,
>>>                                     opts->x_svq, iova_range);
>>>            if (!nc)
>>>                goto err;
>>> +
>>> +        s = DO_UPCAST(VhostVDPAState, nc, nc);
>>> +        error_setg(&s->vhost_vdpa.dev->migration_blocker,
>>> +                   "net vdpa cannot migrate with MQ feature");
>>>        }
>>>
>>>        return 0;



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 09/13] vdpa net: block migration if the device has CVQ
@ 2023-02-23  2:41         ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-23  2:41 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, virtualization, Harpreet Singh Anand,
	Lei Yang, Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong


在 2023/2/22 15:28, Eugenio Perez Martin 写道:
> On Wed, Feb 22, 2023 at 5:01 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2023/2/8 17:42, Eugenio Pérez 写道:
>>> Devices with CVQ needs to migrate state beyond vq state.  Leaving this
>>> to future series.
>>
>> I may miss something but what is missed to support CVQ/MQ?
>>
> To restore all the device state set by CVQ in the migration source
> (MAC, MQ, ...) before data vqs start. We don't have a reliable way to
> not start data vqs until the device [1].
>
> Thanks!
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg02652.html


Right. It might be mention this defect in either the change log or 
somewhere in the code as a comment.

(Btw, I think we should fix those vDPA drivers).

Thanks


>
>> Thanks
>>
>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    net/vhost-vdpa.c | 6 ++++++
>>>    1 file changed, 6 insertions(+)
>>>
>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>>> index bca13f97fd..309861e56c 100644
>>> --- a/net/vhost-vdpa.c
>>> +++ b/net/vhost-vdpa.c
>>> @@ -955,11 +955,17 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>>>        }
>>>
>>>        if (has_cvq) {
>>> +        VhostVDPAState *s;
>>> +
>>>            nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>>>                                     vdpa_device_fd, i, 1, false,
>>>                                     opts->x_svq, iova_range);
>>>            if (!nc)
>>>                goto err;
>>> +
>>> +        s = DO_UPCAST(VhostVDPAState, nc, nc);
>>> +        error_setg(&s->vhost_vdpa.dev->migration_blocker,
>>> +                   "net vdpa cannot migrate with MQ feature");
>>>        }
>>>
>>>        return 0;

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 11/13] vdpa: block migration if dev does not have _F_SUSPEND
  2023-02-23  2:38         ` Jason Wang
  (?)
@ 2023-02-23 11:06         ` Eugenio Perez Martin
  2023-02-24  3:16             ` Jason Wang
  -1 siblings, 1 reply; 68+ messages in thread
From: Eugenio Perez Martin @ 2023-02-23 11:06 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu

On Thu, Feb 23, 2023 at 3:38 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2023/2/22 22:25, Eugenio Perez Martin 写道:
> > On Wed, Feb 22, 2023 at 5:05 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> 在 2023/2/8 17:42, Eugenio Pérez 写道:
> >>> Next patches enable devices to be migrated even if vdpa netdev has not
> >>> been started with x-svq. However, not all devices are migratable, so we
> >>> need to block migration if we detect that.
> >>>
> >>> Block vhost-vdpa device migration if it does not offer _F_SUSPEND and it
> >>> has not been started with x-svq.
> >>>
> >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>> ---
> >>>    hw/virtio/vhost-vdpa.c | 21 +++++++++++++++++++++
> >>>    1 file changed, 21 insertions(+)
> >>>
> >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> >>> index 84a6b9690b..9d30cf9b3c 100644
> >>> --- a/hw/virtio/vhost-vdpa.c
> >>> +++ b/hw/virtio/vhost-vdpa.c
> >>> @@ -442,6 +442,27 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> >>>            return 0;
> >>>        }
> >>>
> >>> +    /*
> >>> +     * If dev->shadow_vqs_enabled at initialization that means the device has
> >>> +     * been started with x-svq=on, so don't block migration
> >>> +     */
> >>> +    if (dev->migration_blocker == NULL && !v->shadow_vqs_enabled) {
> >>> +        uint64_t backend_features;
> >>> +
> >>> +        /* We don't have dev->backend_features yet */
> >>> +        ret = vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES,
> >>> +                              &backend_features);
> >>> +        if (unlikely(ret)) {
> >>> +            error_setg_errno(errp, -ret, "Could not get backend features");
> >>> +            return ret;
> >>> +        }
> >>> +
> >>> +        if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
> >>> +            error_setg(&dev->migration_blocker,
> >>> +                "vhost-vdpa backend lacks VHOST_BACKEND_F_SUSPEND feature.");
> >>> +        }
> >>
> >> I wonder why not let the device to decide? For networking device, we can
> >> live without suspend probably.
> >>
> > Right, but how can we know if this is a net device in init? I don't
> > think a switch (vhost_vdpa_get_device_id(dev)) is elegant.
>
>
> I meant the caller of vhost_vdpa_init() which is net_init_vhost_vdpa().
>

That's doable but I'm not sure if it is convenient.

Since we're always offering _F_LOG I thought of the lack of _F_SUSPEND
as the default migration blocker for other kinds of devices like blk.
If we move this code to net_init_vhost_vdpa, all other devices are in
charge of block migration by themselves.

I guess the right action is to use a variable similar to
vhost_vdpa->f_log_all. It defaults to false, and the device can choose
if it should export it or not. This way, the device does not migrate
by default, and the equivalent of net_init_vhost_vdpa could choose
whether to offer _F_LOG with SVQ or not.

OTOH I guess other kinds of devices already must place blockers beyond
_F_LOG, so maybe it makes sense to always offer _F_LOG even if
_F_SUSPEND is not offered? Stefano G., would that break vhost-vdpa-blk
support?

Thanks!

> Thanks
>
>
> >
> > If the parent device does not need to be suspended i'd go with
> > exposing a suspend ioctl but do nothing in the parent device. After
> > that, it could even choose to return an error for GET_VRING_BASE.
> >
> > If we want to implement it as a fallback in qemu, I'd go for
> > implementing it on top of this series. There are a few operations we
> > could move to a device-kind specific ops.
> >
> > Would it make sense to you?
> >
> > Thanks!
> >
> >
> >> Thanks
> >>
> >>
> >>> +    }
> >>> +
> >>>        /*
> >>>         * Similar to VFIO, we end up pinning all guest memory and have to
> >>>         * disable discarding of RAM.
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 11/13] vdpa: block migration if dev does not have _F_SUSPEND
  2023-02-23 11:06         ` Eugenio Perez Martin
@ 2023-02-24  3:16             ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-24  3:16 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, virtualization, Harpreet Singh Anand,
	Lei Yang, Stefan Hajnoczi, Eli Cohen, longpeng2, Shannon Nelson,
	Liuxiangdong

On Thu, Feb 23, 2023 at 7:07 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Thu, Feb 23, 2023 at 3:38 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2023/2/22 22:25, Eugenio Perez Martin 写道:
> > > On Wed, Feb 22, 2023 at 5:05 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> 在 2023/2/8 17:42, Eugenio Pérez 写道:
> > >>> Next patches enable devices to be migrated even if vdpa netdev has not
> > >>> been started with x-svq. However, not all devices are migratable, so we
> > >>> need to block migration if we detect that.
> > >>>
> > >>> Block vhost-vdpa device migration if it does not offer _F_SUSPEND and it
> > >>> has not been started with x-svq.
> > >>>
> > >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>> ---
> > >>>    hw/virtio/vhost-vdpa.c | 21 +++++++++++++++++++++
> > >>>    1 file changed, 21 insertions(+)
> > >>>
> > >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > >>> index 84a6b9690b..9d30cf9b3c 100644
> > >>> --- a/hw/virtio/vhost-vdpa.c
> > >>> +++ b/hw/virtio/vhost-vdpa.c
> > >>> @@ -442,6 +442,27 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> > >>>            return 0;
> > >>>        }
> > >>>
> > >>> +    /*
> > >>> +     * If dev->shadow_vqs_enabled at initialization that means the device has
> > >>> +     * been started with x-svq=on, so don't block migration
> > >>> +     */
> > >>> +    if (dev->migration_blocker == NULL && !v->shadow_vqs_enabled) {
> > >>> +        uint64_t backend_features;
> > >>> +
> > >>> +        /* We don't have dev->backend_features yet */
> > >>> +        ret = vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES,
> > >>> +                              &backend_features);
> > >>> +        if (unlikely(ret)) {
> > >>> +            error_setg_errno(errp, -ret, "Could not get backend features");
> > >>> +            return ret;
> > >>> +        }
> > >>> +
> > >>> +        if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
> > >>> +            error_setg(&dev->migration_blocker,
> > >>> +                "vhost-vdpa backend lacks VHOST_BACKEND_F_SUSPEND feature.");
> > >>> +        }
> > >>
> > >> I wonder why not let the device to decide? For networking device, we can
> > >> live without suspend probably.
> > >>
> > > Right, but how can we know if this is a net device in init? I don't
> > > think a switch (vhost_vdpa_get_device_id(dev)) is elegant.
> >
> >
> > I meant the caller of vhost_vdpa_init() which is net_init_vhost_vdpa().
> >
>
> That's doable but I'm not sure if it is convenient.

So it's a question whether or not we try to let migration work without
suspending. If we don't, there's no need to bother. Looking at the
current vhost-net implementation, it tries to make migration work upon
the error of get_vring_base() so maybe it's worth a try if it doesn't
bother too much. But I'm fine to go either way.

>
> Since we're always offering _F_LOG I thought of the lack of _F_SUSPEND
> as the default migration blocker for other kinds of devices like blk.

Or we can have this by default and allow a specific type of device to clear?

> If we move this code to net_init_vhost_vdpa, all other devices are in
> charge of block migration by themselves.
>
> I guess the right action is to use a variable similar to
> vhost_vdpa->f_log_all. It defaults to false, and the device can choose
> if it should export it or not. This way, the device does not migrate
> by default, and the equivalent of net_init_vhost_vdpa could choose
> whether to offer _F_LOG with SVQ or not.

Looks similar to what I think above.

>
> OTOH I guess other kinds of devices already must place blockers beyond
> _F_LOG, so maybe it makes sense to always offer _F_LOG even if
> _F_SUSPEND is not offered?

I don't see any dependency between the two features. Technically,
there could be devices that have neither _F_LOG nor _F_SUSPEND.

Thanks

> Stefano G., would that break vhost-vdpa-blk
> support?
>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > > If the parent device does not need to be suspended i'd go with
> > > exposing a suspend ioctl but do nothing in the parent device. After
> > > that, it could even choose to return an error for GET_VRING_BASE.
> > >
> > > If we want to implement it as a fallback in qemu, I'd go for
> > > implementing it on top of this series. There are a few operations we
> > > could move to a device-kind specific ops.
> > >
> > > Would it make sense to you?
> > >
> > > Thanks!
> > >
> > >
> > >> Thanks
> > >>
> > >>
> > >>> +    }
> > >>> +
> > >>>        /*
> > >>>         * Similar to VFIO, we end up pinning all guest memory and have to
> > >>>         * disable discarding of RAM.
> >
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 11/13] vdpa: block migration if dev does not have _F_SUSPEND
@ 2023-02-24  3:16             ` Jason Wang
  0 siblings, 0 replies; 68+ messages in thread
From: Jason Wang @ 2023-02-24  3:16 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, Harpreet Singh Anand, Gonglei (Arei),
	Michael S. Tsirkin, Cindy Lu, alvaro.karsz, Zhu Lingshan,
	Lei Yang, Liuxiangdong, Shannon Nelson, Parav Pandit,
	Gautam Dawar, Eli Cohen, Stefan Hajnoczi, Laurent Vivier,
	longpeng2, virtualization, Stefano Garzarella, si-wei.liu

On Thu, Feb 23, 2023 at 7:07 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Thu, Feb 23, 2023 at 3:38 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2023/2/22 22:25, Eugenio Perez Martin 写道:
> > > On Wed, Feb 22, 2023 at 5:05 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> 在 2023/2/8 17:42, Eugenio Pérez 写道:
> > >>> Next patches enable devices to be migrated even if vdpa netdev has not
> > >>> been started with x-svq. However, not all devices are migratable, so we
> > >>> need to block migration if we detect that.
> > >>>
> > >>> Block vhost-vdpa device migration if it does not offer _F_SUSPEND and it
> > >>> has not been started with x-svq.
> > >>>
> > >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>> ---
> > >>>    hw/virtio/vhost-vdpa.c | 21 +++++++++++++++++++++
> > >>>    1 file changed, 21 insertions(+)
> > >>>
> > >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > >>> index 84a6b9690b..9d30cf9b3c 100644
> > >>> --- a/hw/virtio/vhost-vdpa.c
> > >>> +++ b/hw/virtio/vhost-vdpa.c
> > >>> @@ -442,6 +442,27 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> > >>>            return 0;
> > >>>        }
> > >>>
> > >>> +    /*
> > >>> +     * If dev->shadow_vqs_enabled at initialization that means the device has
> > >>> +     * been started with x-svq=on, so don't block migration
> > >>> +     */
> > >>> +    if (dev->migration_blocker == NULL && !v->shadow_vqs_enabled) {
> > >>> +        uint64_t backend_features;
> > >>> +
> > >>> +        /* We don't have dev->backend_features yet */
> > >>> +        ret = vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES,
> > >>> +                              &backend_features);
> > >>> +        if (unlikely(ret)) {
> > >>> +            error_setg_errno(errp, -ret, "Could not get backend features");
> > >>> +            return ret;
> > >>> +        }
> > >>> +
> > >>> +        if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
> > >>> +            error_setg(&dev->migration_blocker,
> > >>> +                "vhost-vdpa backend lacks VHOST_BACKEND_F_SUSPEND feature.");
> > >>> +        }
> > >>
> > >> I wonder why not let the device to decide? For networking device, we can
> > >> live without suspend probably.
> > >>
> > > Right, but how can we know if this is a net device in init? I don't
> > > think a switch (vhost_vdpa_get_device_id(dev)) is elegant.
> >
> >
> > I meant the caller of vhost_vdpa_init() which is net_init_vhost_vdpa().
> >
>
> That's doable but I'm not sure if it is convenient.

So it's a question whether or not we try to let migration work without
suspending. If we don't, there's no need to bother. Looking at the
current vhost-net implementation, it tries to make migration work upon
the error of get_vring_base() so maybe it's worth a try if it doesn't
bother too much. But I'm fine to go either way.

>
> Since we're always offering _F_LOG I thought of the lack of _F_SUSPEND
> as the default migration blocker for other kinds of devices like blk.

Or we can have this by default and allow a specific type of device to clear?

> If we move this code to net_init_vhost_vdpa, all other devices are in
> charge of block migration by themselves.
>
> I guess the right action is to use a variable similar to
> vhost_vdpa->f_log_all. It defaults to false, and the device can choose
> if it should export it or not. This way, the device does not migrate
> by default, and the equivalent of net_init_vhost_vdpa could choose
> whether to offer _F_LOG with SVQ or not.

Looks similar to what I think above.

>
> OTOH I guess other kinds of devices already must place blockers beyond
> _F_LOG, so maybe it makes sense to always offer _F_LOG even if
> _F_SUSPEND is not offered?

I don't see any dependency between the two features. Technically,
there could be devices that have neither _F_LOG nor _F_SUSPEND.

Thanks

> Stefano G., would that break vhost-vdpa-blk
> support?
>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > > If the parent device does not need to be suspended i'd go with
> > > exposing a suspend ioctl but do nothing in the parent device. After
> > > that, it could even choose to return an error for GET_VRING_BASE.
> > >
> > > If we want to implement it as a fallback in qemu, I'd go for
> > > implementing it on top of this series. There are a few operations we
> > > could move to a device-kind specific ops.
> > >
> > > Would it make sense to you?
> > >
> > > Thanks!
> > >
> > >
> > >> Thanks
> > >>
> > >>
> > >>> +    }
> > >>> +
> > >>>        /*
> > >>>         * Similar to VFIO, we end up pinning all guest memory and have to
> > >>>         * disable discarding of RAM.
> >
>



^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2023-02-24  3:17 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-08  9:42 [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Eugenio Pérez
2023-02-08  9:42 ` [PATCH v2 01/13] vdpa net: move iova tree creation from init to start Eugenio Pérez
2023-02-13  6:50   ` Si-Wei Liu
2023-02-13  6:50     ` Si-Wei Liu
2023-02-13 11:14     ` Eugenio Perez Martin
2023-02-14  1:45       ` Si-Wei Liu
2023-02-14  1:45         ` Si-Wei Liu
2023-02-14 19:07         ` Eugenio Perez Martin
2023-02-16  2:14           ` Si-Wei Liu
2023-02-16  2:14             ` Si-Wei Liu
2023-02-16  7:35             ` Eugenio Perez Martin
2023-02-17  7:38               ` Si-Wei Liu
2023-02-17  7:38                 ` Si-Wei Liu
2023-02-17 13:55                 ` Eugenio Perez Martin
2023-02-08  9:42 ` [PATCH v2 02/13] vdpa: Negotiate _F_SUSPEND feature Eugenio Pérez
2023-02-08  9:42 ` [PATCH v2 03/13] vdpa: add vhost_vdpa_suspend Eugenio Pérez
2023-02-21  5:27   ` Jason Wang
2023-02-21  5:27     ` Jason Wang
2023-02-21  5:33     ` Jason Wang
2023-02-21  5:33       ` Jason Wang
2023-02-21  7:05       ` Eugenio Perez Martin
2023-02-08  9:42 ` [PATCH v2 04/13] vdpa: move vhost reset after get vring base Eugenio Pérez
2023-02-21  5:36   ` Jason Wang
2023-02-21  5:36     ` Jason Wang
2023-02-21  7:07     ` Eugenio Perez Martin
2023-02-22  3:43       ` Jason Wang
2023-02-22  3:43         ` Jason Wang
2023-02-08  9:42 ` [PATCH v2 05/13] vdpa: rewind at get_base, not set_base Eugenio Pérez
2023-02-21  5:40   ` Jason Wang
2023-02-21  5:40     ` Jason Wang
2023-02-08  9:42 ` [PATCH v2 06/13] vdpa net: allow VHOST_F_LOG_ALL Eugenio Pérez
2023-02-08  9:42 ` [PATCH v2 07/13] vdpa: add vdpa net migration state notifier Eugenio Pérez
2023-02-13  6:50   ` Si-Wei Liu
2023-02-13  6:50     ` Si-Wei Liu
2023-02-13 15:51     ` Eugenio Perez Martin
2023-02-22  3:55   ` Jason Wang
2023-02-22  3:55     ` Jason Wang
2023-02-22  7:23     ` Eugenio Perez Martin
2023-02-08  9:42 ` [PATCH v2 08/13] vdpa: disable RAM block discard only for the first device Eugenio Pérez
2023-02-08  9:42 ` [PATCH v2 09/13] vdpa net: block migration if the device has CVQ Eugenio Pérez
2023-02-13  6:50   ` Si-Wei Liu
2023-02-13  6:50     ` Si-Wei Liu
2023-02-14 18:06     ` Eugenio Perez Martin
2023-02-22  4:00   ` Jason Wang
2023-02-22  4:00     ` Jason Wang
2023-02-22  7:28     ` Eugenio Perez Martin
2023-02-23  2:41       ` Jason Wang
2023-02-23  2:41         ` Jason Wang
2023-02-08  9:42 ` [PATCH v2 10/13] vdpa: block migration if device has unsupported features Eugenio Pérez
2023-02-08  9:42 ` [PATCH v2 11/13] vdpa: block migration if dev does not have _F_SUSPEND Eugenio Pérez
2023-02-22  4:05   ` Jason Wang
2023-02-22  4:05     ` Jason Wang
2023-02-22 14:25     ` Eugenio Perez Martin
2023-02-23  2:38       ` Jason Wang
2023-02-23  2:38         ` Jason Wang
2023-02-23 11:06         ` Eugenio Perez Martin
2023-02-24  3:16           ` Jason Wang
2023-02-24  3:16             ` Jason Wang
2023-02-08  9:42 ` [PATCH v2 12/13] vdpa: block migration if SVQ does not admit a feature Eugenio Pérez
2023-02-08  9:42 ` [PATCH v2 13/13] vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices Eugenio Pérez
2023-02-22  4:07   ` Jason Wang
2023-02-22  4:07     ` Jason Wang
2023-02-08 10:29 ` [PATCH v2 00/13] Dynamycally switch to vhost shadow virtqueues at vdpa net migration Alvaro Karsz
2023-02-08 10:29   ` Alvaro Karsz
2023-02-09 14:38   ` Lei Yang
2023-02-10 12:57 ` Gautam Dawar
2023-02-15 18:40   ` Eugenio Perez Martin
2023-02-16 13:50     ` Lei Yang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.