* [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB
@ 2023-12-07 17:39 Si-Wei Liu
  2023-12-07 17:39 ` [PATCH 01/40] linux-headers: add vhost_types.h and vhost.h Si-Wei Liu
                   ` (41 more replies)
  0 siblings, 42 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

This patch series contains several enhancements to SVQ live migration downtime
for vDPA-net hardware devices, specifically mlx5_vdpa. It is currently based
on Eugenio's RFC v2 .load_setup series [1] to utilize the shared facilities
and reduce friction from merging or duplicating code as much as possible.

The patches are stacked in a particular order, as each optimization near the
top depends on those below it. Here's a breakdown of what each part does:

Patch #  |          Feature / optimization
---------V-------------------------------------------------------------------
35 - 40  | trace events
34       | migrate_cancel bug fix
21 - 33  | (Un)map batching at stop-n-copy to further optimize LM down time
11 - 20  | persistent IOTLB [3] to improve LM down time
02 - 10  | SVQ descriptor ASID [2] to optimize SVQ switching
01       | dependent linux headers
         V 

Let's first define 2 sources of downtime that this work is concerned with:

* SVQ switching downtime (Downtime #1): downtime at the start of migration.
  Time spent on teardown and setup for SVQ mode switching; this downtime is
  taken as the maximum across the individual vdpa-net devices. No memory
  transfer is involved during SVQ switching.

* LM downtime (Downtime #2): aggregated downtime across all vdpa-net devices
  for resource teardown and setup in the last stop-n-copy phase on the source
  host.

With each part of the optimizations applied bottom-up, the resulting downtime
(in seconds) can be observed in this table:


                    |    Downtime #1    |    Downtime #2
--------------------+-------------------+-------------------
Baseline QEMU       |     20s ~ 30s     |        20s
                    |                   |
Iterative map       |        5s         |        20s
at destination [1]  |                   |
                    |                   |
SVQ descriptor      |        2s         |         5s
ASID [2]            |                   |
                    |                   |
persistent IOTLB    |        2s         |         2s
[3]                 |                   |
                    |                   |
(Un)map batching    |       1.7s        |       1.5s
at stop-n-copy      |                   |
before switchover   |                   |

(VM config: 128GB mem, 2 mlx5_vdpa devices, each w/ 4 data vqs)

Please find the details regarding each enhancement in the commit logs.

Thanks,
-Siwei


[1] [RFC PATCH v2 00/10] Map memory at destination .load_setup in vDPA-net migration
https://lists.nongnu.org/archive/html/qemu-devel/2023-11/msg05711.html
[2] VHOST_BACKEND_F_DESC_ASID
https://lore.kernel.org/virtualization/20231018171456.1624030-2-dtatulea@nvidia.com/
[3] VHOST_BACKEND_F_IOTLB_PERSIST
https://lore.kernel.org/virtualization/1698304480-18463-1-git-send-email-si-wei.liu@oracle.com/
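
As a quick illustration of the prerequisites, below is a minimal userspace
sketch (not part of this series) that checks whether a vhost-vdpa backend
offers the two features this work builds on. The device path is only an
example, and the feature bit values are the ones added in patch 01, guarded
in case the installed headers don't carry them yet:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vhost.h>

    /* Bit values from patch 01, in case the installed headers lack them */
    #ifndef VHOST_BACKEND_F_DESC_ASID
    #define VHOST_BACKEND_F_DESC_ASID      0x7
    #endif
    #ifndef VHOST_BACKEND_F_IOTLB_PERSIST
    #define VHOST_BACKEND_F_IOTLB_PERSIST  0x8
    #endif

    int main(void)
    {
        uint64_t features = 0;
        int fd = open("/dev/vhost-vdpa-0", O_RDWR);   /* example path */

        if (fd < 0 ||
            ioctl(fd, VHOST_GET_BACKEND_FEATURES, &features) < 0) {
            perror("vhost-vdpa");
            return 1;
        }
        printf("descriptor ASID:  %s\n",
               features & (1ULL << VHOST_BACKEND_F_DESC_ASID) ? "yes" : "no");
        printf("persistent IOTLB: %s\n",
               features & (1ULL << VHOST_BACKEND_F_IOTLB_PERSIST) ? "yes" : "no");
        return 0;
    }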

---

Si-Wei Liu (40):
  linux-headers: add vhost_types.h and vhost.h
  vdpa: add vhost_vdpa_get_vring_desc_group
  vdpa: probe descriptor group index for data vqs
  vdpa: piggyback desc_group index when probing isolated cvq
  vdpa: populate desc_group from net_vhost_vdpa_init
  vhost: make svq work with gpa without iova translation
  vdpa: move around vhost_vdpa_set_address_space_id
  vdpa: add back vhost_vdpa_net_first_nc_vdpa
  vdpa: no repeat setting shadow_data
  vdpa: assign svq descriptors a separate ASID when possible
  vdpa: factor out vhost_vdpa_last_dev
  vdpa: check map_thread_enabled before join maps thread
  vdpa: ref counting VhostVDPAShared
  vdpa: convert iova_tree to ref count based
  vdpa: add svq_switching and flush_map to header
  vdpa: indicate SVQ switching via flag
  vdpa: judge if map can be kept across reset
  vdpa: unregister listener on last dev cleanup
  vdpa: should avoid map flushing with persistent iotlb
  vdpa: avoid mapping flush across reset
  vdpa: vhost_vdpa_dma_batch_end_once rename
  vdpa: factor out vhost_vdpa_map_batch_begin
  vdpa: vhost_vdpa_dma_batch_begin_once rename
  vdpa: factor out vhost_vdpa_dma_batch_end
  vdpa: add asid to dma_batch_once API
  vdpa: return int for dma_batch_once API
  vdpa: add asid to all dma_batch call sites
  vdpa: support iotlb_batch_asid
  vdpa: expose API vhost_vdpa_dma_batch_once
  vdpa: batch map/unmap op per svq pair basis
  vdpa: batch map and unmap around cvq svq start/stop
  vdpa: factor out vhost_vdpa_net_get_nc_vdpa
  vdpa: batch multiple dma_unmap to a single call for vm stop
  vdpa: fix network breakage after cancelling migration
  vdpa: add vhost_vdpa_set_address_space_id trace
  vdpa: add vhost_vdpa_get_vring_base trace for svq mode
  vdpa: add vhost_vdpa_set_dev_vring_base trace for svq mode
  vdpa: add trace events for eval_flush
  vdpa: add trace events for vhost_vdpa_net_load_cmd
  vdpa: add trace event for vhost_vdpa_net_load_mq

 hw/virtio/trace-events                       |   9 +-
 hw/virtio/vhost-shadow-virtqueue.c           |  35 ++-
 hw/virtio/vhost-vdpa.c                       | 156 +++++++---
 include/hw/virtio/vhost-vdpa.h               |  16 +
 include/standard-headers/linux/vhost_types.h |  13 +
 linux-headers/linux/vhost.h                  |   9 +
 net/trace-events                             |   8 +
 net/vhost-vdpa.c                             | 434 ++++++++++++++++++++++-----
 8 files changed, 558 insertions(+), 122 deletions(-)

-- 
1.8.3.1




* [PATCH 01/40] linux-headers: add vhost_types.h and vhost.h
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-11  7:47   ` Eugenio Perez Martin
  2024-01-11  3:32   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 02/40] vdpa: add vhost_vdpa_get_vring_desc_group Si-Wei Liu
                   ` (40 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 include/standard-headers/linux/vhost_types.h | 13 +++++++++++++
 linux-headers/linux/vhost.h                  |  9 +++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/standard-headers/linux/vhost_types.h b/include/standard-headers/linux/vhost_types.h
index 5ad07e1..c39199b 100644
--- a/include/standard-headers/linux/vhost_types.h
+++ b/include/standard-headers/linux/vhost_types.h
@@ -185,5 +185,18 @@ struct vhost_vdpa_iova_range {
  * DRIVER_OK
  */
 #define VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK  0x6
+/* Device can be resumed */
+#define VHOST_BACKEND_F_RESUME  0x5
+/* Device supports the driver enabling virtqueues both before and after
+ * DRIVER_OK
+ */
+#define VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK  0x6
+/* Device may expose the virtqueue's descriptor area, driver area and
+ * device area to a different group for ASID binding than where its
+ * buffers may reside. Requires VHOST_BACKEND_F_IOTLB_ASID.
+ */
+#define VHOST_BACKEND_F_DESC_ASID    0x7
+/* IOTLB don't flush memory mapping across device reset */
+#define VHOST_BACKEND_F_IOTLB_PERSIST  0x8
 
 #endif
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index f5c48b6..c61c687 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -219,4 +219,13 @@
  */
 #define VHOST_VDPA_RESUME		_IO(VHOST_VIRTIO, 0x7E)
 
+/* Get the dedicated group for the descriptor table of a virtqueue:
+ * read index, write group in num.
+ * The virtqueue index is stored in the index field of vhost_vring_state.
+ * The group id for the descriptor table of this specific virtqueue
+ * is returned via num field of vhost_vring_state.
+ */
+#define VHOST_VDPA_GET_VRING_DESC_GROUP	_IOWR(VHOST_VIRTIO, 0x7F,	\
+					      struct vhost_vring_state)
+
 #endif
-- 
1.8.3.1
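
For clarity, here is a raw usage sketch of the ioctl added above. Patch 02
adds the QEMU-side wrapper; this snippet only illustrates the read-index /
write-num protocol described in the header comment, and assumes fd is an
open vhost-vdpa device and that the headers already carry the new define:

    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/vhost.h>

    /* Return the descriptor-table group of virtqueue vq_index, or -1. */
    static long get_desc_group(int fd, unsigned vq_index)
    {
        struct vhost_vring_state state = {
            .index = vq_index,  /* in:  virtqueue index                   */
            .num = 0,           /* out: group id of its descriptor table  */
        };

        if (ioctl(fd, VHOST_VDPA_GET_VRING_DESC_GROUP, &state) < 0) {
            perror("VHOST_VDPA_GET_VRING_DESC_GROUP");
            return -1;
        }
        return state.num;
    }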




* [PATCH 02/40] vdpa: add vhost_vdpa_get_vring_desc_group
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
  2023-12-07 17:39 ` [PATCH 01/40] linux-headers: add vhost_types.h and vhost.h Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2024-01-11  3:51   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 03/40] vdpa: probe descriptor group index for data vqs Si-Wei Liu
                   ` (39 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Internal API to get the descriptor group index for a specific virtqueue
through the VHOST_VDPA_GET_VRING_DESC_GROUP ioctl.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 net/vhost-vdpa.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 90f4128..887c329 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -471,6 +471,25 @@ static int64_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index,
     return state.num;
 }
 
+static int64_t vhost_vdpa_get_vring_desc_group(int device_fd,
+                                               unsigned vq_index,
+                                               Error **errp)
+{
+    struct vhost_vring_state state = {
+        .index = vq_index,
+    };
+    int r = ioctl(device_fd, VHOST_VDPA_GET_VRING_DESC_GROUP, &state);
+
+    if (unlikely(r < 0)) {
+        r = -errno;
+        error_setg_errno(errp, errno, "Cannot get VQ %u descriptor group",
+                         vq_index);
+        return r;
+    }
+
+    return state.num;
+}
+
 static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
                                            unsigned vq_group,
                                            unsigned asid_num)
-- 
1.8.3.1




* [PATCH 03/40] vdpa: probe descriptor group index for data vqs
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
  2023-12-07 17:39 ` [PATCH 01/40] linux-headers: add vhost_types.h and vhost.h Si-Wei Liu
  2023-12-07 17:39 ` [PATCH 02/40] vdpa: add vhost_vdpa_get_vring_desc_group Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-11 18:49   ` Eugenio Perez Martin
  2024-01-11  4:02   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 04/40] vdpa: piggyback desc_group index when probing isolated cvq Si-Wei Liu
                   ` (38 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Getting it ahead of time at initialization instead of at start time allows
decisions to be made independent of device status, while reducing the
possibility of failure when starting the device or during migration.

Add the function vhost_vdpa_probe_desc_group() to that end. It will be
used to probe the descriptor group for data vqs.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 net/vhost-vdpa.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 887c329..0cf3147 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -1688,6 +1688,95 @@ out:
     return r;
 }
 
+static int vhost_vdpa_probe_desc_group(int device_fd, uint64_t features,
+                                       int vq_index, int64_t *desc_grpidx,
+                                       Error **errp)
+{
+    uint64_t backend_features;
+    int64_t vq_group, desc_group;
+    uint8_t saved_status = 0;
+    uint8_t status = 0;
+    int r;
+
+    ERRP_GUARD();
+
+    r = ioctl(device_fd, VHOST_GET_BACKEND_FEATURES, &backend_features);
+    if (unlikely(r < 0)) {
+        error_setg_errno(errp, errno, "Cannot get vdpa backend_features");
+        return r;
+    }
+
+    if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID))) {
+        return 0;
+    }
+
+    if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID))) {
+        return 0;
+    }
+
+    r = ioctl(device_fd, VHOST_VDPA_GET_STATUS, &saved_status);
+    if (unlikely(r)) {
+        error_setg_errno(errp, -r, "Cannot get device status");
+        goto out;
+    }
+
+    r = ioctl(device_fd, VHOST_VDPA_SET_STATUS, &status);
+    if (unlikely(r)) {
+        error_setg_errno(errp, -r, "Cannot reset device");
+        goto out;
+    }
+
+    r = ioctl(device_fd, VHOST_SET_FEATURES, &features);
+    if (unlikely(r)) {
+        error_setg_errno(errp, errno, "Cannot set features");
+    }
+
+    status = VIRTIO_CONFIG_S_ACKNOWLEDGE |
+             VIRTIO_CONFIG_S_DRIVER |
+             VIRTIO_CONFIG_S_FEATURES_OK;
+
+    r = ioctl(device_fd, VHOST_VDPA_SET_STATUS, &status);
+    if (unlikely(r)) {
+        error_setg_errno(errp, -r, "Cannot set device status");
+        goto out;
+    }
+
+    vq_group = vhost_vdpa_get_vring_group(device_fd, vq_index, errp);
+    if (unlikely(vq_group < 0)) {
+        if (vq_group != -ENOTSUP) {
+            r = vq_group;
+            goto out;
+        }
+
+        /*
+         * The kernel report VHOST_BACKEND_F_IOTLB_ASID if the vdpa frontend
+         * support ASID even if the parent driver does not.
+         */
+        error_free(*errp);
+        *errp = NULL;
+        r = 0;
+        goto out;
+    }
+
+    desc_group = vhost_vdpa_get_vring_desc_group(device_fd, vq_index,
+                                                 errp);
+    if (unlikely(desc_group < 0)) {
+        r = desc_group;
+        goto out;
+    } else if (desc_group != vq_group) {
+        *desc_grpidx = desc_group;
+    }
+    r = 1;
+
+out:
+    status = 0;
+    ioctl(device_fd, VHOST_VDPA_SET_STATUS, &status);
+    if (saved_status) {
+        ioctl(device_fd, VHOST_VDPA_SET_STATUS, &saved_status);
+    }
+    return r;
+}
+
 static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
                                        const char *device,
                                        const char *name,
-- 
1.8.3.1




* [PATCH 04/40] vdpa: piggyback desc_group index when probing isolated cvq
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (2 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 03/40] vdpa: probe descriptor group index for data vqs Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2024-01-11  7:06   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 05/40] vdpa: populate desc_group from net_vhost_vdpa_init Si-Wei Liu
                   ` (37 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Same as the previous commit, but do it for cvq instead of data vqs.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 net/vhost-vdpa.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 0cf3147..cb5705d 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -1601,16 +1601,19 @@ static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
 };
 
 /**
- * Probe if CVQ is isolated
+ * Probe if CVQ is isolated, and piggyback its descriptor group
+ * index if supported
  *
  * @device_fd         The vdpa device fd
  * @features          Features offered by the device.
  * @cvq_index         The control vq pair index
+ * @desc_grpidx       The CVQ's descriptor group index to return
  *
- * Returns <0 in case of failure, 0 if false and 1 if true.
+ * Returns <0 in case of failure, 0 if false and 1 if true (isolated).
  */
 static int vhost_vdpa_probe_cvq_isolation(int device_fd, uint64_t features,
-                                          int cvq_index, Error **errp)
+                                          int cvq_index, int64_t *desc_grpidx,
+                                          Error **errp)
 {
     uint64_t backend_features;
     int64_t cvq_group;
@@ -1667,6 +1670,13 @@ static int vhost_vdpa_probe_cvq_isolation(int device_fd, uint64_t features,
         goto out;
     }
 
+    if (backend_features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID)) {
+        int64_t desc_group = vhost_vdpa_get_vring_desc_group(device_fd,
+                                                             cvq_index, errp);
+        if (likely(desc_group >= 0) && desc_group != cvq_group)
+            *desc_grpidx = desc_group;
+    }
+
     for (int i = 0; i < cvq_index; ++i) {
         int64_t group = vhost_vdpa_get_vring_group(device_fd, i, errp);
         if (unlikely(group < 0)) {
@@ -1685,6 +1695,8 @@ static int vhost_vdpa_probe_cvq_isolation(int device_fd, uint64_t features,
 out:
     status = 0;
     ioctl(device_fd, VHOST_VDPA_SET_STATUS, &status);
+    status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER;
+    ioctl(device_fd, VHOST_VDPA_SET_STATUS, &status);
     return r;
 }
 
@@ -1791,6 +1803,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
                                        Error **errp)
 {
     NetClientState *nc = NULL;
+    int64_t desc_group = -1;
     VhostVDPAState *s;
     int ret = 0;
     assert(name);
@@ -1802,7 +1815,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     } else {
         cvq_isolated = vhost_vdpa_probe_cvq_isolation(vdpa_device_fd, features,
                                                       queue_pair_index * 2,
-                                                      errp);
+                                                      &desc_group, errp);
         if (unlikely(cvq_isolated < 0)) {
             return NULL;
         }
-- 
1.8.3.1




* [PATCH 05/40] vdpa: populate desc_group from net_vhost_vdpa_init
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (3 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 04/40] vdpa: piggyback desc_group index when probing isolated cvq Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-11 10:46   ` Eugenio Perez Martin
  2024-01-11  7:09   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 06/40] vhost: make svq work with gpa without iova translation Si-Wei Liu
                   ` (36 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Add the desc_group field to struct vhost_vdpa, and populate it
when the corresponding vq is initialized in net_vhost_vdpa_init.
If the vq does not have the descriptor group capability, or it
doesn't have a dedicated ASID group to host descriptors
separately from the data buffers, desc_group will be set to a
negative value (-1).

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 include/hw/virtio/vhost-vdpa.h |  1 +
 net/vhost-vdpa.c               | 15 +++++++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 6533ad2..63493ff 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -87,6 +87,7 @@ typedef struct vhost_vdpa {
     Error *migration_blocker;
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
     IOMMUNotifier n;
+    int64_t desc_group;
 } VhostVDPA;
 
 int vhost_vdpa_get_iova_range(int fd, struct vhost_vdpa_iova_range *iova_range);
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index cb5705d..1a738b2 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -1855,11 +1855,22 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
     if (ret) {
-        qemu_del_net_client(nc);
-        return NULL;
+        goto err;
     }
 
+    if (is_datapath) {
+        ret = vhost_vdpa_probe_desc_group(vdpa_device_fd, features,
+                                          0, &desc_group, errp);
+        if (unlikely(ret < 0)) {
+            goto err;
+        }
+    }
+    s->vhost_vdpa.desc_group = desc_group;
     return nc;
+
+err:
+    qemu_del_net_client(nc);
+    return NULL;
 }
 
 static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
-- 
1.8.3.1




* [PATCH 06/40] vhost: make svq work with gpa without iova translation
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (4 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 05/40] vdpa: populate desc_group from net_vhost_vdpa_init Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-11 11:17   ` Eugenio Perez Martin
  2024-01-11  7:31   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 07/40] vdpa: move around vhost_vdpa_set_address_space_id Si-Wei Liu
                   ` (35 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Make vhost_svq_vring_write_descs able to work with GPA directly,
without going through the iova tree for translation. This will be
needed in the next few patches, where the SVQ has a dedicated
address space to host its virtqueues. With a dedicated or
isolated address space for the SVQ descriptors, instead of having
to translate QEMU's VA to IOVA via the iova tree, the IOVA is
exactly the same as the guest GPA, so no translation is needed
any more.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index fc5f408..97ccd45 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -136,8 +136,8 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
  * Return true if success, false otherwise and print error.
  */
 static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
-                                        const struct iovec *iovec, size_t num,
-                                        bool more_descs, bool write)
+                                        const struct iovec *iovec, hwaddr *addr,
+                                        size_t num, bool more_descs, bool write)
 {
     uint16_t i = svq->free_head, last = svq->free_head;
     unsigned n;
@@ -149,8 +149,15 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
         return true;
     }
 
-    ok = vhost_svq_translate_addr(svq, sg, iovec, num);
-    if (unlikely(!ok)) {
+    if (svq->iova_tree) {
+        ok = vhost_svq_translate_addr(svq, sg, iovec, num);
+        if (unlikely(!ok)) {
+            return false;
+        }
+    } else if (!addr) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "No translation found for vaddr 0x%p\n",
+                      iovec[0].iov_base);
         return false;
     }
 
@@ -161,7 +168,7 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
         } else {
             descs[i].flags = flags;
         }
-        descs[i].addr = cpu_to_le64(sg[n]);
+        descs[i].addr = cpu_to_le64(svq->iova_tree ? sg[n] : addr[n]);
         descs[i].len = cpu_to_le32(iovec[n].iov_len);
 
         last = i;
@@ -173,9 +180,10 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
 }
 
 static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
-                                const struct iovec *out_sg, size_t out_num,
-                                const struct iovec *in_sg, size_t in_num,
-                                unsigned *head)
+                                const struct iovec *out_sg, hwaddr *out_addr,
+                                size_t out_num,
+                                const struct iovec *in_sg, hwaddr *in_addr,
+                                size_t in_num, unsigned *head)
 {
     unsigned avail_idx;
     vring_avail_t *avail = svq->vring.avail;
@@ -191,13 +199,14 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
         return false;
     }
 
-    ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_num, in_num > 0,
-                                     false);
+    ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_addr, out_num,
+                                     in_num > 0, false);
     if (unlikely(!ok)) {
         return false;
     }
 
-    ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_num, false, true);
+    ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_addr, in_num,
+                                     false, true);
     if (unlikely(!ok)) {
         return false;
     }
@@ -258,7 +267,9 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
         return -ENOSPC;
     }
 
-    ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, &qemu_head);
+    ok = vhost_svq_add_split(svq, out_sg, elem ? elem->out_addr : NULL,
+                             out_num, in_sg, elem ? elem->in_addr : NULL,
+                             in_num, &qemu_head);
     if (unlikely(!ok)) {
         return -EINVAL;
     }
-- 
1.8.3.1




* [PATCH 07/40] vdpa: move around vhost_vdpa_set_address_space_id
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (5 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 06/40] vhost: make svq work with gpa without iova translation Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-11 11:18   ` Eugenio Perez Martin
  2024-01-11  7:33   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 08/40] vdpa: add back vhost_vdpa_net_first_nc_vdpa Si-Wei Liu
                   ` (34 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Move it a few lines up so that functions defined before it can
call it.  No functional change involved.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 net/vhost-vdpa.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 1a738b2..dbfa192 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -335,6 +335,24 @@ static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
     }
 }
 
+static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
+                                           unsigned vq_group,
+                                           unsigned asid_num)
+{
+    struct vhost_vring_state asid = {
+        .index = vq_group,
+        .num = asid_num,
+    };
+    int r;
+
+    r = ioctl(v->shared->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
+    if (unlikely(r < 0)) {
+        error_report("Can't set vq group %u asid %u, errno=%d (%s)",
+                     asid.index, asid.num, errno, g_strerror(errno));
+    }
+    return r;
+}
+
 static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
 {
     struct vhost_vdpa *v = &s->vhost_vdpa;
@@ -490,24 +508,6 @@ static int64_t vhost_vdpa_get_vring_desc_group(int device_fd,
     return state.num;
 }
 
-static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
-                                           unsigned vq_group,
-                                           unsigned asid_num)
-{
-    struct vhost_vring_state asid = {
-        .index = vq_group,
-        .num = asid_num,
-    };
-    int r;
-
-    r = ioctl(v->shared->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
-    if (unlikely(r < 0)) {
-        error_report("Can't set vq group %u asid %u, errno=%d (%s)",
-                     asid.index, asid.num, errno, g_strerror(errno));
-    }
-    return r;
-}
-
 static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
 {
     VhostIOVATree *tree = v->shared->iova_tree;
-- 
1.8.3.1




* [PATCH 08/40] vdpa: add back vhost_vdpa_net_first_nc_vdpa
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (6 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 07/40] vdpa: move around vhost_vdpa_set_address_space_id Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-11 11:19   ` Eugenio Perez Martin
  2024-01-11  7:37   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 09/40] vdpa: no repeat setting shadow_data Si-Wei Liu
                   ` (33 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

A previous commit removed it. Add it back, because this function
will be needed by the next patches.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 net/vhost-vdpa.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index dbfa192..c9bfc6f 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -287,6 +287,16 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
     return size;
 }
 
+
+/** From any vdpa net client, get the netclient of the first queue pair */
+static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
+{
+    NICState *nic = qemu_get_nic(s->nc.peer);
+    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
+
+    return DO_UPCAST(VhostVDPAState, nc, nc0);
+}
+
 static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
 {
     struct vhost_vdpa *v = &s->vhost_vdpa;
@@ -566,7 +576,7 @@ dma_map_err:
 
 static int vhost_vdpa_net_cvq_start(NetClientState *nc)
 {
-    VhostVDPAState *s;
+    VhostVDPAState *s, *s0;
     struct vhost_vdpa *v;
     int64_t cvq_group;
     int r;
@@ -577,7 +587,8 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
     s = DO_UPCAST(VhostVDPAState, nc, nc);
     v = &s->vhost_vdpa;
 
-    v->shadow_vqs_enabled = v->shared->shadow_data;
+    s0 = vhost_vdpa_net_first_nc_vdpa(s);
+    v->shadow_vqs_enabled = s0->vhost_vdpa.shadow_vqs_enabled;
     s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
 
     if (v->shared->shadow_data) {
-- 
1.8.3.1




* [PATCH 09/40] vdpa: no repeat setting shadow_data
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (7 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 08/40] vdpa: add back vhost_vdpa_net_first_nc_vdpa Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-11 11:21   ` Eugenio Perez Martin
  2024-01-11  7:34   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 10/40] vdpa: assign svq descriptors a separate ASID when possible Si-Wei Liu
                   ` (32 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Since shadow_data is now shared in the parent data struct, it
only needs to be set once, by the first vq. This change makes
shadow_data independent of the SVQ enabled state, so that it can
optionally be turned off when the SVQ descriptor and device
driver areas are all isolated in a separate address space.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 net/vhost-vdpa.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index c9bfc6f..2555897 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -387,13 +387,12 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
     if (s->always_svq ||
         migration_is_setup_or_active(migrate_get_current()->state)) {
         v->shadow_vqs_enabled = true;
-        v->shared->shadow_data = true;
     } else {
         v->shadow_vqs_enabled = false;
-        v->shared->shadow_data = false;
     }
 
     if (v->index == 0) {
+        v->shared->shadow_data = v->shadow_vqs_enabled;
         vhost_vdpa_net_data_start_first(s);
         return 0;
     }
-- 
1.8.3.1




* [PATCH 10/40] vdpa: assign svq descriptors a separate ASID when possible
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (8 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 09/40] vdpa: no repeat setting shadow_data Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-11 13:35   ` Eugenio Perez Martin
  2024-01-11  8:02   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 11/40] vdpa: factor out vhost_vdpa_last_dev Si-Wei Liu
                   ` (31 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

When the backend supports the VHOST_BACKEND_F_DESC_ASID feature
and all the data vqs support one or more descriptor groups to
host SVQ vrings and descriptors, we assign them a different ASID
than the one where their buffers reside in the guest memory
address space. With this dedicated ASID for SVQs, the IOVA that
the vdpa device cares about effectively becomes the GPA, thus
there's no need to translate IOVA addresses. For this reason,
shadow_data can be turned off accordingly. This doesn't mean the
SVQ is not enabled, just that no translation through the iova
tree is needed.

We can reuse CVQ's address space ID to host SVQ descriptors,
because both CVQ and SVQ are emulated in the same QEMU process
and share the same VA address space.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c |  5 ++++-
 net/vhost-vdpa.c       | 57 ++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 24844b5..30dff95 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -627,6 +627,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
     uint64_t qemu_backend_features = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
                                      0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
                                      0x1ULL << VHOST_BACKEND_F_IOTLB_ASID |
+                                     0x1ULL << VHOST_BACKEND_F_DESC_ASID |
                                      0x1ULL << VHOST_BACKEND_F_SUSPEND;
     int ret;
 
@@ -1249,7 +1250,9 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
             goto err;
         }
 
-        vhost_svq_start(svq, dev->vdev, vq, v->shared->iova_tree);
+        vhost_svq_start(svq, dev->vdev, vq,
+                        v->desc_group >= 0 && v->address_space_id ?
+                        NULL : v->shared->iova_tree);
         ok = vhost_vdpa_svq_map_rings(dev, svq, &addr, &err);
         if (unlikely(!ok)) {
             goto err_map;
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 2555897..aebaa53 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -366,20 +366,50 @@ static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
 static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
 {
     struct vhost_vdpa *v = &s->vhost_vdpa;
+    int r;
 
     migration_add_notifier(&s->migration_state,
                            vdpa_net_migration_state_notifier);
 
+    if (!v->shadow_vqs_enabled) {
+        if (v->desc_group >= 0 &&
+            v->address_space_id != VHOST_VDPA_GUEST_PA_ASID) {
+            vhost_vdpa_set_address_space_id(v, v->desc_group,
+                                            VHOST_VDPA_GUEST_PA_ASID);
+            s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
+        }
+        return;
+    }
+
     /* iova_tree may be initialized by vhost_vdpa_net_load_setup */
-    if (v->shadow_vqs_enabled && !v->shared->iova_tree) {
+    if (!v->shared->iova_tree) {
         v->shared->iova_tree = vhost_iova_tree_new(v->shared->iova_range.first,
                                                    v->shared->iova_range.last);
     }
+
+    if (s->always_svq || v->desc_group < 0) {
+        return;
+    }
+
+    r = vhost_vdpa_set_address_space_id(v, v->desc_group,
+                                        VHOST_VDPA_NET_CVQ_ASID);
+    if (unlikely(r < 0)) {
+        /* The other data vqs should also fall back to using the same ASID */
+        s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
+        return;
+    }
+
+    /* No translation needed on data SVQ when descriptor group is used */
+    s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;
+    s->vhost_vdpa.shared->shadow_data = false;
+    return;
 }
 
 static int vhost_vdpa_net_data_start(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
+
     struct vhost_vdpa *v = &s->vhost_vdpa;
 
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
@@ -397,6 +427,18 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
         return 0;
     }
 
+    if (v->desc_group >= 0 && v->desc_group != s0->vhost_vdpa.desc_group) {
+        unsigned asid;
+        asid = v->shadow_vqs_enabled ?
+            s0->vhost_vdpa.address_space_id : VHOST_VDPA_GUEST_PA_ASID;
+        if (asid != s->vhost_vdpa.address_space_id) {
+            vhost_vdpa_set_address_space_id(v, v->desc_group, asid);
+        }
+        s->vhost_vdpa.address_space_id = asid;
+    } else {
+        s->vhost_vdpa.address_space_id = s0->vhost_vdpa.address_space_id;
+    }
+
     return 0;
 }
 
@@ -603,13 +645,19 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
         return 0;
     }
 
-    if (!s->cvq_isolated) {
+    if (!s->cvq_isolated && v->desc_group < 0) {
+        if (s0->vhost_vdpa.shadow_vqs_enabled &&
+            s0->vhost_vdpa.desc_group >= 0 &&
+            s0->vhost_vdpa.address_space_id) {
+            v->shadow_vqs_enabled = false;
+        }
         return 0;
     }
 
-    cvq_group = vhost_vdpa_get_vring_group(v->shared->device_fd,
+    cvq_group = s->cvq_isolated ?
+                vhost_vdpa_get_vring_group(v->shared->device_fd,
                                            v->dev->vq_index_end - 1,
-                                           &err);
+                                           &err) : v->desc_group;
     if (unlikely(cvq_group < 0)) {
         error_report_err(err);
         return cvq_group;
@@ -1840,6 +1888,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     s->always_svq = svq;
     s->migration_state.notify = NULL;
     s->vhost_vdpa.shadow_vqs_enabled = svq;
+    s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
     if (queue_pair_index == 0) {
         vhost_vdpa_net_valid_svq_features(features,
                                           &s->vhost_vdpa.migration_blocker);
-- 
1.8.3.1
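
To summarize the design choice, here is a simplified restatement of the ASID
policy with hypothetical names, purely for illustration; the authoritative
logic lives in vhost_vdpa_net_data_start_first() and
vhost_vdpa_net_data_start() in the diff above:

    #include <stdbool.h>

    #define GUEST_PA_ASID  0  /* data buffers: IOVA == guest GPA          */
    #define SVQ_DESC_ASID  1  /* SVQ vrings reuse the CVQ ASID (QEMU VA)  */

    /* Which ASID should a data vq's descriptor group be bound to? */
    static unsigned pick_desc_group_asid(bool svq_enabled, bool always_svq,
                                         long desc_group)
    {
        /*
         * Only when SVQ is in use (and not pinned on via x-svq=on) and
         * the device exposes a dedicated descriptor group can the vrings
         * move to a separate ASID; the data buffers then stay in the
         * guest-PA ASID, so no iova tree translation is needed for them.
         */
        if (svq_enabled && !always_svq && desc_group >= 0) {
            return SVQ_DESC_ASID;
        }
        return GUEST_PA_ASID;
    }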




* [PATCH 11/40] vdpa: factor out vhost_vdpa_last_dev
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (9 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 10/40] vdpa: assign svq descriptors a separate ASID when possible Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-11 13:36   ` Eugenio Perez Martin
  2024-01-11  8:03   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 12/40] vdpa: check map_thread_enabled before join maps thread Si-Wei Liu
                   ` (30 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Generalize the duplicated condition check for the last vq of a
vdpa device into a common function.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 30dff95..2b1cc14 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -593,6 +593,11 @@ static bool vhost_vdpa_first_dev(struct vhost_dev *dev)
     return v->index == 0;
 }
 
+static bool vhost_vdpa_last_dev(struct vhost_dev *dev)
+{
+    return dev->vq_index + dev->nvqs == dev->vq_index_end;
+}
+
 static int vhost_vdpa_get_dev_features(struct vhost_dev *dev,
                                        uint64_t *features)
 {
@@ -1432,7 +1437,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         goto out_stop;
     }
 
-    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
+    if (!vhost_vdpa_last_dev(dev)) {
         return 0;
     }
 
@@ -1467,7 +1472,7 @@ static void vhost_vdpa_reset_status(struct vhost_dev *dev)
 {
     struct vhost_vdpa *v = dev->opaque;
 
-    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
+    if (!vhost_vdpa_last_dev(dev)) {
         return;
     }
 
-- 
1.8.3.1




* [PATCH 12/40] vdpa: check map_thread_enabled before join maps thread
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (10 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 11/40] vdpa: factor out vhost_vdpa_last_dev Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-07 17:39 ` [PATCH 13/40] vdpa: ref counting VhostVDPAShared Si-Wei Liu
                   ` (29 subsequent siblings)
  41 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

The next patches will also register the memory listener on
demand, hence the need to differentiate the map_thread case
from the rest.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 2b1cc14..4f026db 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1450,7 +1450,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         if (!v->shared->listener_registered) {
             memory_listener_register(&v->shared->listener, dev->vdev->dma_as);
             v->shared->listener_registered = true;
-        } else {
+        } else if (v->shared->map_thread_enabled) {
             ok = vhost_vdpa_join_maps_thread(v->shared);
             if (unlikely(!ok)) {
                 goto out_stop;
-- 
1.8.3.1




* [PATCH 13/40] vdpa: ref counting VhostVDPAShared
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (11 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 12/40] vdpa: check map_thread_enabled before join maps thread Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2024-01-11  8:12   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 14/40] vdpa: convert iova_tree to ref count based Si-Wei Liu
                   ` (28 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Subsequent patches attempt to release VhostVDPAShared resources,
for example freeing the iova tree and unregistering the memory
listener, in vdpa_dev_cleanup(). Instead of checking against the
vq index, which is not always available to all of the callers,
count the usage by reference. Then it's easy to free the
resources upon the last deref.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 include/hw/virtio/vhost-vdpa.h |  2 ++
 net/vhost-vdpa.c               | 14 ++++++++++----
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 63493ff..7b8d3bf 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -70,6 +70,8 @@ typedef struct vhost_vdpa_shared {
 
     /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */
     bool shadow_data;
+
+    unsigned refcnt;
 } VhostVDPAShared;
 
 typedef struct vhost_vdpa {
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index aebaa53..a126e5c 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -236,11 +236,11 @@ static void vhost_vdpa_cleanup(NetClientState *nc)
         g_free(s->vhost_net);
         s->vhost_net = NULL;
     }
-    if (s->vhost_vdpa.index != 0) {
-        return;
+    if (--s->vhost_vdpa.shared->refcnt == 0) {
+        qemu_close(s->vhost_vdpa.shared->device_fd);
+        g_free(s->vhost_vdpa.shared);
     }
-    qemu_close(s->vhost_vdpa.shared->device_fd);
-    g_free(s->vhost_vdpa.shared);
+    s->vhost_vdpa.shared = NULL;
 }
 
 /** Dummy SetSteeringEBPF to support RSS for vhost-vdpa backend  */
@@ -1896,6 +1896,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
         s->vhost_vdpa.shared->device_fd = vdpa_device_fd;
         s->vhost_vdpa.shared->iova_range = iova_range;
         s->vhost_vdpa.shared->shadow_data = svq;
+        s->vhost_vdpa.shared->refcnt++;
     } else if (!is_datapath) {
         s->cvq_cmd_out_buffer = mmap(NULL, vhost_vdpa_net_cvq_cmd_page_len(),
                                      PROT_READ | PROT_WRITE,
@@ -1910,6 +1911,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     }
     if (queue_pair_index != 0) {
         s->vhost_vdpa.shared = shared;
+        s->vhost_vdpa.shared->refcnt++;
     }
 
     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
@@ -1928,6 +1930,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     return nc;
 
 err:
+    if (--s->vhost_vdpa.shared->refcnt == 0) {
+        g_free(s->vhost_vdpa.shared);
+    }
+    s->vhost_vdpa.shared = NULL;
     qemu_del_net_client(nc);
     return NULL;
 }
-- 
1.8.3.1




* [PATCH 14/40] vdpa: convert iova_tree to ref count based
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (12 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 13/40] vdpa: ref counting VhostVDPAShared Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-11 17:21   ` Eugenio Perez Martin
  2024-01-11  8:15   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 15/40] vdpa: add svq_switching and flush_map to header Si-Wei Liu
                   ` (27 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

So that it can be freed from vhost_vdpa_cleanup on the last
deref. The next few patches will try to make the iova tree life
cycle not depend on the memory listener, and there's a
possibility of keeping the iova tree around when the memory
mapping is not changed across a device reset.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 net/vhost-vdpa.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index a126e5c..7b8f047 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -238,6 +238,8 @@ static void vhost_vdpa_cleanup(NetClientState *nc)
     }
     if (--s->vhost_vdpa.shared->refcnt == 0) {
         qemu_close(s->vhost_vdpa.shared->device_fd);
+        g_clear_pointer(&s->vhost_vdpa.shared->iova_tree,
+                        vhost_iova_tree_delete);
         g_free(s->vhost_vdpa.shared);
     }
     s->vhost_vdpa.shared = NULL;
@@ -461,19 +463,12 @@ static int vhost_vdpa_net_data_load(NetClientState *nc)
 static void vhost_vdpa_net_client_stop(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
-    struct vhost_dev *dev;
 
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 
     if (s->vhost_vdpa.index == 0) {
         migration_remove_notifier(&s->migration_state);
     }
-
-    dev = s->vhost_vdpa.dev;
-    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
-        g_clear_pointer(&s->vhost_vdpa.shared->iova_tree,
-                        vhost_iova_tree_delete);
-    }
 }
 
 static int vhost_vdpa_net_load_setup(NetClientState *nc, NICState *nic)
-- 
1.8.3.1




* [PATCH 15/40] vdpa: add svq_switching and flush_map to header
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (13 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 14/40] vdpa: convert iova_tree to ref count based Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2024-01-11  8:16   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 16/40] vdpa: indicate SVQ switching via flag Si-Wei Liu
                   ` (26 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Will be used in the next patches.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 include/hw/virtio/vhost-vdpa.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 7b8d3bf..0fe0f60 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -72,6 +72,12 @@ typedef struct vhost_vdpa_shared {
     bool shadow_data;
 
     unsigned refcnt;
+
+    /* SVQ switching is in progress? 1: turn on SVQ, -1: turn off SVQ */
+    int svq_switching;
+
+    /* Flush mappings on reset due to shared address space */
+    bool flush_map;
 } VhostVDPAShared;
 
 typedef struct vhost_vdpa {
-- 
1.8.3.1




* [PATCH 16/40] vdpa: indicate SVQ switching via flag
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (14 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 15/40] vdpa: add svq_switching and flush_map to header Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2024-01-11  8:17   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 17/40] vdpa: judge if map can be kept across reset Si-Wei Liu
                   ` (25 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

svq_switching indicates that an SVQ mode change is ongoing.
Positive (1) means switching from normal passthrough mode to
SVQ mode, negative (-1) means switching from SVQ back to
passthrough, and zero (0) indicates that no SVQ mode switch is
taking place.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 net/vhost-vdpa.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 7b8f047..04718b2 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -320,6 +320,7 @@ static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
     data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
     cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
                                   n->max_ncs - n->max_queue_pairs : 0;
+    v->shared->svq_switching = enable ? 1 : -1;
     /*
      * TODO: vhost_net_stop does suspend, get_base and reset. We can be smarter
      * in the future and resume the device if read-only operations between
@@ -332,6 +333,7 @@ static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
     if (unlikely(r < 0)) {
         error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
     }
+    v->shared->svq_switching = 0;
 }
 
 static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
-- 
1.8.3.1




* [PATCH 17/40] vdpa: judge if map can be kept across reset
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (15 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 16/40] vdpa: indicate SVQ switching via flag Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-13  9:51   ` Eugenio Perez Martin
  2024-01-11  8:24   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 18/40] vdpa: unregister listener on last dev cleanup Si-Wei Liu
                   ` (24 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

The descriptor group for the SVQ ASID allows the guest memory
mapping to be retained across SVQ switching, the same way an
isolated CVQ can with an ASID different from the guest GPA space.
Introduce an evaluation function to judge whether to flush or
keep the iotlb maps, based on the virtqueue's descriptor group
and the CVQ isolation capability.

The evaluation function has to be hooked to NetClient's .poll op,
as .vhost_reset_status runs ahead of .stop, and .vhost_dev_start
doesn't have access to the vhost-vdpa net's information.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 net/vhost-vdpa.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 04718b2..e9b96ed 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -504,12 +504,36 @@ static int vhost_vdpa_net_load_cleanup(NetClientState *nc, NICState *nic)
                              n->parent_obj.status & VIRTIO_CONFIG_S_DRIVER_OK);
 }
 
+static void vhost_vdpa_net_data_eval_flush(NetClientState *nc, bool stop)
+{
+    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    struct vhost_vdpa *v = &s->vhost_vdpa;
+
+    if (!stop) {
+        return;
+    }
+
+    if (s->vhost_vdpa.index == 0) {
+        if (s->always_svq) {
+            v->shared->flush_map = true;
+        } else if (!v->shared->svq_switching || v->desc_group >= 0) {
+            v->shared->flush_map = false;
+        } else {
+            v->shared->flush_map = true;
+        }
+    } else if (!s->always_svq && v->shared->svq_switching &&
+               v->desc_group < 0) {
+        v->shared->flush_map = true;
+    }
+}
+
 static NetClientInfo net_vhost_vdpa_info = {
         .type = NET_CLIENT_DRIVER_VHOST_VDPA,
         .size = sizeof(VhostVDPAState),
         .receive = vhost_vdpa_receive,
         .start = vhost_vdpa_net_data_start,
         .load = vhost_vdpa_net_data_load,
+        .poll = vhost_vdpa_net_data_eval_flush,
         .stop = vhost_vdpa_net_client_stop,
         .cleanup = vhost_vdpa_cleanup,
         .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
@@ -1368,12 +1392,28 @@ static int vhost_vdpa_net_cvq_load(NetClientState *nc)
     return 0;
 }
 
+static void vhost_vdpa_net_cvq_eval_flush(NetClientState *nc, bool stop)
+{
+    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    struct vhost_vdpa *v = &s->vhost_vdpa;
+
+    if (!stop) {
+        return;
+    }
+
+    if (!v->shared->flush_map && !v->shared->svq_switching &&
+        !s->cvq_isolated && v->desc_group < 0) {
+        v->shared->flush_map = true;
+    }
+}
+
 static NetClientInfo net_vhost_vdpa_cvq_info = {
     .type = NET_CLIENT_DRIVER_VHOST_VDPA,
     .size = sizeof(VhostVDPAState),
     .receive = vhost_vdpa_receive,
     .start = vhost_vdpa_net_cvq_start,
     .load = vhost_vdpa_net_cvq_load,
+    .poll = vhost_vdpa_net_cvq_eval_flush,
     .stop = vhost_vdpa_net_cvq_stop,
     .cleanup = vhost_vdpa_cleanup,
     .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 18/40] vdpa: unregister listener on last dev cleanup
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (16 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 17/40] vdpa: judge if map can be kept across reset Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-11 17:37   ` Eugenio Perez Martin
  2024-01-11  8:26   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 19/40] vdpa: should avoid map flushing with persistent iotlb Si-Wei Liu
                   ` (23 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

So that freeing the iova tree struct can be safely deferred until the
last vq referencing it goes away.
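
A minimal sketch of the deferred-teardown pattern this moves towards,
assuming the refcounted VhostVDPAShared used elsewhere in this series
(the helper name is hypothetical):

static void vhost_vdpa_shared_put_sketch(VhostVDPAShared *shared)
{
    if (--shared->refcnt > 0) {
        return;                 /* other devices still reference it */
    }
    if (shared->listener_registered) {
        memory_listener_unregister(&shared->listener);
        shared->listener_registered = false;
    }
    /* the iova tree and the rest of the shared state can be freed here */
}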

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 4f026db..ea2dfc8 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -815,7 +815,10 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
     }
 
     vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
-    memory_listener_unregister(&v->shared->listener);
+    if (vhost_vdpa_last_dev(dev) && v->shared->listener_registered) {
+        memory_listener_unregister(&v->shared->listener);
+        v->shared->listener_registered = false;
+    }
     vhost_vdpa_svq_cleanup(dev);
 
     dev->opaque = NULL;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 19/40] vdpa: should avoid map flushing with persistent iotlb
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (17 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 18/40] vdpa: unregister listener on last dev cleanup Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2024-01-11  8:28   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 20/40] vdpa: avoid mapping flush across reset Si-Wei Liu
                   ` (22 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Today the memory listener is unregistered unconditionally in
vhost_vdpa_reset_status, due to which all the maps are flushed away
from the iotlb. However, a map flush is not always needed, and doing
it on the performance hot path can have a non-negligible latency
impact that affects VM reboot time or the brownout period during
live migration.

Leverage the IOTLB_PERSIST backend feature, which ensures iotlb maps
are durable and do not disappear even across reset. When it is
supported, we may conditionally keep the maps for cases where the
guest memory mapping doesn't change. Prepare a function so that the
next patch will be able to use it to keep the maps.
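
Put differently, the maps survive a reset only under the condition
sketched below (the helper is illustrative; the real code is
vhost_vdpa_maybe_flush_map() in the diff):

/* true iff the listener, and hence the iotlb maps, stay in place */
static bool keep_maps_across_reset(const VhostVDPAShared *s,
                                   uint64_t backend_cap)
{
    return s->listener_registered &&
           (backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST)) &&
           !s->flush_map;
}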

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/trace-events |  1 +
 hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 77905d1..9725d44 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -66,6 +66,7 @@ vhost_vdpa_set_owner(void *dev) "dev: %p"
 vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
 vhost_vdpa_get_iova_range(void *dev, uint64_t first, uint64_t last) "dev: %p first: 0x%"PRIx64" last: 0x%"PRIx64
 vhost_vdpa_set_config_call(void *dev, int fd)"dev: %p fd: %d"
+vhost_vdpa_maybe_flush_map(void *dev, bool reg, bool flush, bool persist) "dev: %p registered: %d flush_map: %d iotlb_persistent: %d"
 
 # virtio.c
 virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index ea2dfc8..31e0a55 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1471,6 +1471,26 @@ out_stop:
     return ok ? 0 : -1;
 }
 
+static void vhost_vdpa_maybe_flush_map(struct vhost_dev *dev)
+{
+    struct vhost_vdpa *v = dev->opaque;
+
+    trace_vhost_vdpa_maybe_flush_map(dev, v->shared->listener_registered,
+                                     v->shared->flush_map,
+                                     !!(dev->backend_cap &
+                                     BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST)));
+
+    if (!v->shared->listener_registered) {
+        return;
+    }
+
+    if (!(dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST)) ||
+        v->shared->flush_map) {
+        memory_listener_unregister(&v->shared->listener);
+        v->shared->listener_registered = false;
+    }
+}
+
 static void vhost_vdpa_reset_status(struct vhost_dev *dev)
 {
     struct vhost_vdpa *v = dev->opaque;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 20/40] vdpa: avoid mapping flush across reset
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (18 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 19/40] vdpa: should avoid map flushing with persistent iotlb Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2024-01-11  8:30   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 21/40] vdpa: vhost_vdpa_dma_batch_end_once rename Si-Wei Liu
                   ` (21 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Leverage the IOTLB_PERSIST and DESC_ASID features to achieve a
lighter weight reset path, without resorting to suspend and resume.
Not optimal, but it still offers significant time savings that help
reduce live migration downtime considerably.

It benefits two cases:
  - normal virtio reset in the VM, e.g. guest reboot, doesn't have
    to tear down all iotlb mappings and set them up again.
  - SVQ switching, in which a data vq's descriptor table and vrings
    are moved to a different ASID than where its buffers reside.
    Along with the use of persistent iotlb, it saves substantial
    time from pinning and mapping unnecessarily when moving
    descriptors onto or out of shadow mode.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 31e0a55..47c764b 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -633,6 +633,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
                                      0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
                                      0x1ULL << VHOST_BACKEND_F_IOTLB_ASID |
                                      0x1ULL << VHOST_BACKEND_F_DESC_ASID |
+                                     0x1ULL << VHOST_BACKEND_F_IOTLB_PERSIST |
                                      0x1ULL << VHOST_BACKEND_F_SUSPEND;
     int ret;
 
@@ -1493,8 +1494,6 @@ static void vhost_vdpa_maybe_flush_map(struct vhost_dev *dev)
 
 static void vhost_vdpa_reset_status(struct vhost_dev *dev)
 {
-    struct vhost_vdpa *v = dev->opaque;
-
     if (!vhost_vdpa_last_dev(dev)) {
         return;
     }
@@ -1502,9 +1501,7 @@ static void vhost_vdpa_reset_status(struct vhost_dev *dev)
     vhost_vdpa_reset_device(dev);
     vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
                                VIRTIO_CONFIG_S_DRIVER);
-    memory_listener_unregister(&v->shared->listener);
-    v->shared->listener_registered = false;
-
+    vhost_vdpa_maybe_flush_map(dev);
 }
 
 static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 21/40] vdpa: vhost_vdpa_dma_batch_end_once rename
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (19 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 20/40] vdpa: avoid mapping flush across reset Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2024-01-15  2:40   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 22/40] vdpa: factor out vhost_vdpa_map_batch_begin Si-Wei Liu
                   ` (20 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

No functional changes. Rename only.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 47c764b..013bfa2 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -191,7 +191,7 @@ static void vhost_vdpa_iotlb_batch_begin_once(VhostVDPAShared *s)
     s->iotlb_batch_begin_sent = true;
 }
 
-static void vhost_vdpa_dma_end_batch(VhostVDPAShared *s)
+static void vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s)
 {
     struct vhost_msg_v2 msg = {};
     int fd = s->device_fd;
@@ -229,7 +229,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
 {
     VhostVDPAShared *s = container_of(listener, VhostVDPAShared, listener);
 
-    vhost_vdpa_dma_end_batch(s);
+    vhost_vdpa_dma_batch_end_once(s);
 }
 
 static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
@@ -1367,7 +1367,7 @@ static void *vhost_vdpa_load_map(void *opaque)
             vhost_vdpa_iotlb_batch_begin_once(shared);
             break;
         case VHOST_IOTLB_BATCH_END:
-            vhost_vdpa_dma_end_batch(shared);
+            vhost_vdpa_dma_batch_end_once(shared);
             break;
         default:
             error_report("Invalid IOTLB msg type %d", msg->iotlb.type);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 22/40] vdpa: factor out vhost_vdpa_map_batch_begin
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (20 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 21/40] vdpa: vhost_vdpa_dma_batch_end_once rename Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2024-01-15  3:02   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 23/40] vdpa: vhost_vdpa_dma_batch_begin_once rename Si-Wei Liu
                   ` (19 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Refactoring only. No functional change.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/trace-events |  2 +-
 hw/virtio/vhost-vdpa.c | 25 ++++++++++++++++---------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 9725d44..b0239b8 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -32,7 +32,7 @@ vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
 # vhost-vdpa.c
 vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa_shared:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
 vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa_shared:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
-vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
+vhost_vdpa_map_batch_begin(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
 vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
 vhost_vdpa_listener_region_add_unaligned(void *v, const char *name, uint64_t offset_as, uint64_t offset_page) "vdpa_shared: %p region %s offset_within_address_space %"PRIu64" offset_within_region %"PRIu64
 vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 013bfa2..7a1b7f4 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -161,7 +161,7 @@ int vhost_vdpa_dma_unmap(VhostVDPAShared *s, uint32_t asid, hwaddr iova,
     return ret;
 }
 
-static void vhost_vdpa_iotlb_batch_begin_once(VhostVDPAShared *s)
+static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s)
 {
     int fd = s->device_fd;
     struct vhost_msg_v2 msg = {
@@ -169,26 +169,33 @@ static void vhost_vdpa_iotlb_batch_begin_once(VhostVDPAShared *s)
         .iotlb.type = VHOST_IOTLB_BATCH_BEGIN,
     };
 
-    if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH)) ||
-        s->iotlb_batch_begin_sent) {
-        return;
-    }
-
     if (s->map_thread_enabled && !qemu_thread_is_self(&s->map_thread)) {
         struct vhost_msg_v2 *new_msg = g_new(struct vhost_msg_v2, 1);
 
         *new_msg = msg;
         g_async_queue_push(s->map_queue, new_msg);
 
-        return;
+        return false;
     }
 
-    trace_vhost_vdpa_listener_begin_batch(s, fd, msg.type, msg.iotlb.type);
+    trace_vhost_vdpa_map_batch_begin(s, fd, msg.type, msg.iotlb.type);
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)",
                      fd, errno, strerror(errno));
     }
-    s->iotlb_batch_begin_sent = true;
+    return true;
+}
+
+static void vhost_vdpa_iotlb_batch_begin_once(VhostVDPAShared *s)
+{
+    if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH)) ||
+        s->iotlb_batch_begin_sent) {
+        return;
+    }
+
+    if (vhost_vdpa_map_batch_begin(s)) {
+        s->iotlb_batch_begin_sent = true;
+    }
 }
 
 static void vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s)
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 23/40] vdpa: vhost_vdpa_dma_batch_begin_once rename
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (21 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 22/40] vdpa: factor out vhost_vdpa_map_batch_begin Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2024-01-15  3:03   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 24/40] vdpa: factor out vhost_vdpa_dma_batch_end Si-Wei Liu
                   ` (18 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

No functional changes. Rename only.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 7a1b7f4..a6c6fe5 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -186,7 +186,7 @@ static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s)
     return true;
 }
 
-static void vhost_vdpa_iotlb_batch_begin_once(VhostVDPAShared *s)
+static void vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s)
 {
     if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH)) ||
         s->iotlb_batch_begin_sent) {
@@ -411,7 +411,7 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
         iova = mem_region.iova;
     }
 
-    vhost_vdpa_iotlb_batch_begin_once(s);
+    vhost_vdpa_dma_batch_begin_once(s);
     ret = vhost_vdpa_dma_map(s, VHOST_VDPA_GUEST_PA_ASID, iova,
                              int128_get64(llsize), vaddr, section->readonly);
     if (ret) {
@@ -493,7 +493,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
         iova = result->iova;
         vhost_iova_tree_remove(s->iova_tree, *result);
     }
-    vhost_vdpa_iotlb_batch_begin_once(s);
+    vhost_vdpa_dma_batch_begin_once(s);
     /*
      * The unmap ioctl doesn't accept a full 64-bit. need to check it
      */
@@ -1371,7 +1371,7 @@ static void *vhost_vdpa_load_map(void *opaque)
                                      msg->iotlb.size);
             break;
         case VHOST_IOTLB_BATCH_BEGIN:
-            vhost_vdpa_iotlb_batch_begin_once(shared);
+            vhost_vdpa_dma_batch_begin_once(shared);
             break;
         case VHOST_IOTLB_BATCH_END:
             vhost_vdpa_dma_batch_end_once(shared);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 24/40] vdpa: factor out vhost_vdpa_dma_batch_end
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (22 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 23/40] vdpa: vhost_vdpa_dma_batch_begin_once rename Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2024-01-15  3:05   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 25/40] vdpa: add asid to dma_batch_once API Si-Wei Liu
                   ` (17 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Refactoring only. No functional change.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/trace-events |  2 +-
 hw/virtio/vhost-vdpa.c | 30 ++++++++++++++++++------------
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index b0239b8..3411a07 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -33,7 +33,7 @@ vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
 vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa_shared:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
 vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa_shared:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
 vhost_vdpa_map_batch_begin(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
-vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
+vhost_vdpa_dma_batch_end(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
 vhost_vdpa_listener_region_add_unaligned(void *v, const char *name, uint64_t offset_as, uint64_t offset_page) "vdpa_shared: %p region %s offset_within_address_space %"PRIu64" offset_within_region %"PRIu64
 vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
 vhost_vdpa_listener_region_del_unaligned(void *v, const char *name, uint64_t offset_as, uint64_t offset_page) "vdpa_shared: %p region %s offset_within_address_space %"PRIu64" offset_within_region %"PRIu64
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index a6c6fe5..999a97a 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -198,19 +198,11 @@ static void vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s)
     }
 }
 
-static void vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s)
+static bool vhost_vdpa_dma_batch_end(VhostVDPAShared *s)
 {
     struct vhost_msg_v2 msg = {};
     int fd = s->device_fd;
 
-    if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
-        return;
-    }
-
-    if (!s->iotlb_batch_begin_sent) {
-        return;
-    }
-
     msg.type = VHOST_IOTLB_MSG_V2;
     msg.iotlb.type = VHOST_IOTLB_BATCH_END;
 
@@ -220,16 +212,30 @@ static void vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s)
         *new_msg = msg;
         g_async_queue_push(s->map_queue, new_msg);
 
-        return;
+        return false;
     }
 
-    trace_vhost_vdpa_listener_commit(s, fd, msg.type, msg.iotlb.type);
+    trace_vhost_vdpa_dma_batch_end(s, fd, msg.type, msg.iotlb.type);
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)",
                      fd, errno, strerror(errno));
     }
+    return true;
+}
+
+static void vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s)
+{
+    if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
+        return;
+    }
+
+    if (!s->iotlb_batch_begin_sent) {
+        return;
+    }
 
-    s->iotlb_batch_begin_sent = false;
+    if (vhost_vdpa_dma_batch_end(s)) {
+        s->iotlb_batch_begin_sent = false;
+    }
 }
 
 static void vhost_vdpa_listener_commit(MemoryListener *listener)
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 25/40] vdpa: add asid to dma_batch_once API
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (23 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 24/40] vdpa: factor out vhost_vdpa_dma_batch_end Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-13 15:42   ` Eugenio Perez Martin
  2024-01-15  3:07   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 26/40] vdpa: return int for " Si-Wei Liu
                   ` (16 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

So that the DMA batching API can operate on an ASID other than 0.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/trace-events |  4 ++--
 hw/virtio/vhost-vdpa.c | 14 ++++++++------
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 3411a07..196f32f 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -32,8 +32,8 @@ vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
 # vhost-vdpa.c
 vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa_shared:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
 vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa_shared:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
-vhost_vdpa_map_batch_begin(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
-vhost_vdpa_dma_batch_end(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
+vhost_vdpa_map_batch_begin(void *v, int fd, uint32_t msg_type, uint8_t type, uint32_t asid)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8" asid: %"PRIu32
+vhost_vdpa_dma_batch_end(void *v, int fd, uint32_t msg_type, uint8_t type, uint32_t asid)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8" asid: %"PRIu32
 vhost_vdpa_listener_region_add_unaligned(void *v, const char *name, uint64_t offset_as, uint64_t offset_page) "vdpa_shared: %p region %s offset_within_address_space %"PRIu64" offset_within_region %"PRIu64
 vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
 vhost_vdpa_listener_region_del_unaligned(void *v, const char *name, uint64_t offset_as, uint64_t offset_page) "vdpa_shared: %p region %s offset_within_address_space %"PRIu64" offset_within_region %"PRIu64
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 999a97a..2db2832 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -161,11 +161,12 @@ int vhost_vdpa_dma_unmap(VhostVDPAShared *s, uint32_t asid, hwaddr iova,
     return ret;
 }
 
-static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s)
+static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s, uint32_t asid)
 {
     int fd = s->device_fd;
     struct vhost_msg_v2 msg = {
         .type = VHOST_IOTLB_MSG_V2,
+        .asid = asid,
         .iotlb.type = VHOST_IOTLB_BATCH_BEGIN,
     };
 
@@ -178,7 +179,7 @@ static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s)
         return false;
     }
 
-    trace_vhost_vdpa_map_batch_begin(s, fd, msg.type, msg.iotlb.type);
+    trace_vhost_vdpa_map_batch_begin(s, fd, msg.type, msg.iotlb.type, msg.asid);
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)",
                      fd, errno, strerror(errno));
@@ -193,17 +194,18 @@ static void vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s)
         return;
     }
 
-    if (vhost_vdpa_map_batch_begin(s)) {
+    if (vhost_vdpa_map_batch_begin(s, 0)) {
         s->iotlb_batch_begin_sent = true;
     }
 }
 
-static bool vhost_vdpa_dma_batch_end(VhostVDPAShared *s)
+static bool vhost_vdpa_dma_batch_end(VhostVDPAShared *s, uint32_t asid)
 {
     struct vhost_msg_v2 msg = {};
     int fd = s->device_fd;
 
     msg.type = VHOST_IOTLB_MSG_V2;
+    msg.asid = asid;
     msg.iotlb.type = VHOST_IOTLB_BATCH_END;
 
     if (s->map_thread_enabled && !qemu_thread_is_self(&s->map_thread)) {
@@ -215,7 +217,7 @@ static bool vhost_vdpa_dma_batch_end(VhostVDPAShared *s)
         return false;
     }
 
-    trace_vhost_vdpa_dma_batch_end(s, fd, msg.type, msg.iotlb.type);
+    trace_vhost_vdpa_dma_batch_end(s, fd, msg.type, msg.iotlb.type, msg.asid);
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)",
                      fd, errno, strerror(errno));
@@ -233,7 +235,7 @@ static void vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s)
         return;
     }
 
-    if (vhost_vdpa_dma_batch_end(s)) {
+    if (vhost_vdpa_dma_batch_end(s, 0)) {
         s->iotlb_batch_begin_sent = false;
     }
 }
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 26/40] vdpa: return int for dma_batch_once API
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (24 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 25/40] vdpa: add asid to dma_batch_once API Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-07 17:39 ` [PATCH 27/40] vdpa: add asid to all dma_batch call sites Si-Wei Liu
                   ` (15 subsequent siblings)
  41 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Return zero for success for now. Prepare for non-zero return
in the next few patches.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 2db2832..e0137f0 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -187,16 +187,18 @@ static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s, uint32_t asid)
     return true;
 }
 
-static void vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s)
+static int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s)
 {
     if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH)) ||
         s->iotlb_batch_begin_sent) {
-        return;
+        return 0;
     }
 
     if (vhost_vdpa_map_batch_begin(s, 0)) {
         s->iotlb_batch_begin_sent = true;
     }
+
+    return 0;
 }
 
 static bool vhost_vdpa_dma_batch_end(VhostVDPAShared *s, uint32_t asid)
@@ -225,19 +227,21 @@ static bool vhost_vdpa_dma_batch_end(VhostVDPAShared *s, uint32_t asid)
     return true;
 }
 
-static void vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s)
+static int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s)
 {
     if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
-        return;
+        return 0;
     }
 
     if (!s->iotlb_batch_begin_sent) {
-        return;
+        return 0;
     }
 
     if (vhost_vdpa_dma_batch_end(s, 0)) {
         s->iotlb_batch_begin_sent = false;
     }
+
+    return 0;
 }
 
 static void vhost_vdpa_listener_commit(MemoryListener *listener)
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 27/40] vdpa: add asid to all dma_batch call sites
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (25 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 26/40] vdpa: return int for " Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-07 17:39 ` [PATCH 28/40] vdpa: support iotlb_batch_asid Si-Wei Liu
                   ` (14 subsequent siblings)
  41 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

This will allow other callers to specify the asid when calling the
dma_batch API.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index e0137f0..d3f5721 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -187,14 +187,14 @@ static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s, uint32_t asid)
     return true;
 }
 
-static int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s)
+static int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s, uint32_t asid)
 {
     if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH)) ||
         s->iotlb_batch_begin_sent) {
         return 0;
     }
 
-    if (vhost_vdpa_map_batch_begin(s, 0)) {
+    if (vhost_vdpa_map_batch_begin(s, asid)) {
         s->iotlb_batch_begin_sent = true;
     }
 
@@ -227,7 +227,7 @@ static bool vhost_vdpa_dma_batch_end(VhostVDPAShared *s, uint32_t asid)
     return true;
 }
 
-static int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s)
+static int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s, uint32_t asid)
 {
     if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
         return 0;
@@ -237,7 +237,7 @@ static int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s)
         return 0;
     }
 
-    if (vhost_vdpa_dma_batch_end(s, 0)) {
+    if (vhost_vdpa_dma_batch_end(s, asid)) {
         s->iotlb_batch_begin_sent = false;
     }
 
@@ -248,7 +248,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
 {
     VhostVDPAShared *s = container_of(listener, VhostVDPAShared, listener);
 
-    vhost_vdpa_dma_batch_end_once(s);
+    vhost_vdpa_dma_batch_end_once(s, VHOST_VDPA_GUEST_PA_ASID);
 }
 
 static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
@@ -423,7 +423,7 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
         iova = mem_region.iova;
     }
 
-    vhost_vdpa_dma_batch_begin_once(s);
+    vhost_vdpa_dma_batch_begin_once(s, VHOST_VDPA_GUEST_PA_ASID);
     ret = vhost_vdpa_dma_map(s, VHOST_VDPA_GUEST_PA_ASID, iova,
                              int128_get64(llsize), vaddr, section->readonly);
     if (ret) {
@@ -505,7 +505,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
         iova = result->iova;
         vhost_iova_tree_remove(s->iova_tree, *result);
     }
-    vhost_vdpa_dma_batch_begin_once(s);
+    vhost_vdpa_dma_batch_begin_once(s, VHOST_VDPA_GUEST_PA_ASID);
     /*
      * The unmap ioctl doesn't accept a full 64-bit. need to check it
      */
@@ -1383,10 +1383,10 @@ static void *vhost_vdpa_load_map(void *opaque)
                                      msg->iotlb.size);
             break;
         case VHOST_IOTLB_BATCH_BEGIN:
-            vhost_vdpa_dma_batch_begin_once(shared);
+            vhost_vdpa_dma_batch_begin_once(shared, msg->asid);
             break;
         case VHOST_IOTLB_BATCH_END:
-            vhost_vdpa_dma_batch_end_once(shared);
+            vhost_vdpa_dma_batch_end_once(shared, msg->asid);
             break;
         default:
             error_report("Invalid IOTLB msg type %d", msg->iotlb.type);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 28/40] vdpa: support iotlb_batch_asid
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (26 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 27/40] vdpa: add asid to all dma_batch call sites Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-13 15:42   ` Eugenio Perez Martin
  2024-01-15  3:19   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 29/40] vdpa: expose API vhost_vdpa_dma_batch_once Si-Wei Liu
                   ` (13 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Then it's possible to specify the ASID when calling the DMA batching
API. If the ASID to work on doesn't align with the ASID of the
ongoing transaction, the API will fail the request and return a
negative value, and the transaction will remain intact as if the
failed request had never occurred.
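
A usage sketch under the new semantics; the wrapper function is
hypothetical, only the two batch helpers come from this series:

static int map_on_asid_sketch(VhostVDPAShared *shared, uint32_t asid)
{
    int r = vhost_vdpa_dma_batch_begin_once(shared, asid);

    if (unlikely(r < 0)) {
        return r;   /* a batch for a different ASID is still open */
    }
    /* ... vhost_vdpa_dma_map()/vhost_vdpa_dma_unmap() calls on asid ... */
    return vhost_vdpa_dma_batch_end_once(shared, asid);
}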

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c         | 25 +++++++++++++++++++------
 include/hw/virtio/vhost-vdpa.h |  1 +
 net/vhost-vdpa.c               |  1 +
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index d3f5721..b7896a8 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -189,15 +189,25 @@ static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s, uint32_t asid)
 
 static int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s, uint32_t asid)
 {
-    if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH)) ||
-        s->iotlb_batch_begin_sent) {
+    if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
         return 0;
     }
 
-    if (vhost_vdpa_map_batch_begin(s, asid)) {
-        s->iotlb_batch_begin_sent = true;
+    if (s->iotlb_batch_begin_sent && s->iotlb_batch_asid != asid) {
+        return -1;
+    }
+
+    if (s->iotlb_batch_begin_sent) {
+        return 0;
     }
 
+    if (!vhost_vdpa_map_batch_begin(s, asid)) {
+        return 0;
+    }
+
+    s->iotlb_batch_begin_sent = true;
+    s->iotlb_batch_asid = asid;
+
     return 0;
 }
 
@@ -237,10 +247,13 @@ static int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s, uint32_t asid)
         return 0;
     }
 
-    if (vhost_vdpa_dma_batch_end(s, asid)) {
-        s->iotlb_batch_begin_sent = false;
+    if (!vhost_vdpa_dma_batch_end(s, asid)) {
+        return 0;
     }
 
+    s->iotlb_batch_begin_sent = false;
+    s->iotlb_batch_asid = -1;
+
     return 0;
 }
 
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 0fe0f60..219316f 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -61,6 +61,7 @@ typedef struct vhost_vdpa_shared {
     bool map_thread_enabled;
 
     bool iotlb_batch_begin_sent;
+    uint32_t iotlb_batch_asid;
 
     /*
      * The memory listener has been registered, so DMA maps have been sent to
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index e9b96ed..bc72345 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -1933,6 +1933,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
         s->vhost_vdpa.shared->device_fd = vdpa_device_fd;
         s->vhost_vdpa.shared->iova_range = iova_range;
         s->vhost_vdpa.shared->shadow_data = svq;
+        s->vhost_vdpa.shared->iotlb_batch_asid = -1;
         s->vhost_vdpa.shared->refcnt++;
     } else if (!is_datapath) {
         s->cvq_cmd_out_buffer = mmap(NULL, vhost_vdpa_net_cvq_cmd_page_len(),
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 29/40] vdpa: expose API vhost_vdpa_dma_batch_once
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (27 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 28/40] vdpa: support iotlb_batch_asid Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-13 15:42   ` Eugenio Perez Martin
  2024-01-15  3:32   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 30/40] vdpa: batch map/unmap op per svq pair basis Si-Wei Liu
                   ` (12 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

So that the batching API can be called externally from files other
than this one.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c         | 21 +++++++++++++++------
 include/hw/virtio/vhost-vdpa.h |  3 +++
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index b7896a8..68dc01b 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -187,7 +187,7 @@ static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s, uint32_t asid)
     return true;
 }
 
-static int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s, uint32_t asid)
+int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s, uint32_t asid)
 {
     if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
         return 0;
@@ -237,7 +237,7 @@ static bool vhost_vdpa_dma_batch_end(VhostVDPAShared *s, uint32_t asid)
     return true;
 }
 
-static int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s, uint32_t asid)
+int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s, uint32_t asid)
 {
     if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
         return 0;
@@ -436,7 +436,12 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
         iova = mem_region.iova;
     }
 
-    vhost_vdpa_dma_batch_begin_once(s, VHOST_VDPA_GUEST_PA_ASID);
+    ret = vhost_vdpa_dma_batch_begin_once(s, VHOST_VDPA_GUEST_PA_ASID);
+    if (unlikely(ret)) {
+        error_report("Can't batch mapping on asid 0 (%p)", s);
+        goto fail_map;
+    }
+
     ret = vhost_vdpa_dma_map(s, VHOST_VDPA_GUEST_PA_ASID, iova,
                              int128_get64(llsize), vaddr, section->readonly);
     if (ret) {
@@ -518,7 +523,11 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
         iova = result->iova;
         vhost_iova_tree_remove(s->iova_tree, *result);
     }
-    vhost_vdpa_dma_batch_begin_once(s, VHOST_VDPA_GUEST_PA_ASID);
+    ret = vhost_vdpa_dma_batch_begin_once(s, VHOST_VDPA_GUEST_PA_ASID);
+    if (ret) {
+        error_report("Can't batch mapping on asid 0 (%p)", s);
+    }
+
     /*
      * The unmap ioctl doesn't accept a full 64-bit. need to check it
      */
@@ -1396,10 +1405,10 @@ static void *vhost_vdpa_load_map(void *opaque)
                                      msg->iotlb.size);
             break;
         case VHOST_IOTLB_BATCH_BEGIN:
-            vhost_vdpa_dma_batch_begin_once(shared, msg->asid);
+            r = vhost_vdpa_dma_batch_begin_once(shared, msg->asid);
             break;
         case VHOST_IOTLB_BATCH_END:
-            vhost_vdpa_dma_batch_end_once(shared, msg->asid);
+            r = vhost_vdpa_dma_batch_end_once(shared, msg->asid);
             break;
         default:
             error_report("Invalid IOTLB msg type %d", msg->iotlb.type);
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 219316f..aa13679 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -106,6 +106,9 @@ int vhost_vdpa_dma_map(VhostVDPAShared *s, uint32_t asid, hwaddr iova,
                        hwaddr size, void *vaddr, bool readonly);
 int vhost_vdpa_dma_unmap(VhostVDPAShared *s, uint32_t asid, hwaddr iova,
                          hwaddr size);
+int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s, uint32_t asid);
+int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s, uint32_t asid);
+
 int vhost_vdpa_load_setup(VhostVDPAShared *s, AddressSpace *dma_as);
 int vhost_vdpa_load_cleanup(VhostVDPAShared *s, bool vhost_will_start);
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 30/40] vdpa: batch map/unmap op per svq pair basis
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (28 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 29/40] vdpa: expose API vhost_vdpa_dma_batch_once Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2024-01-15  3:33   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 31/40] vdpa: batch map and unmap around cvq svq start/stop Si-Wei Liu
                   ` (11 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Coalesce multiple map or unmap operations into just one so that all
mapping setup or teardown can occur in a single DMA batch.
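
The shape of the change, reduced to a sketch (the loop body stands in
for the existing per-vq map or unmap calls):

static void svq_batch_bracketing_sketch(struct vhost_vdpa *v)
{
    vhost_vdpa_dma_batch_begin_once(v->shared, v->address_space_id);
    for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
        /* existing per-vq ring map/unmap calls go here */
    }
    vhost_vdpa_dma_batch_end_once(v->shared, v->address_space_id);
}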

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 68dc01b..d98704a 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1288,6 +1288,7 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
         return true;
     }
 
+    vhost_vdpa_dma_batch_begin_once(v->shared, v->address_space_id);
     for (i = 0; i < v->shadow_vqs->len; ++i) {
         VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
         VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
@@ -1315,6 +1316,7 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
             goto err_set_addr;
         }
     }
+    vhost_vdpa_dma_batch_end_once(v->shared, v->address_space_id);
 
     return true;
 
@@ -1323,6 +1325,7 @@ err_set_addr:
 
 err_map:
     vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, i));
+    vhost_vdpa_dma_batch_end_once(v->shared, v->address_space_id);
 
 err:
     error_reportf_err(err, "Cannot setup SVQ %u: ", i);
@@ -1343,6 +1346,7 @@ static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
         return;
     }
 
+    vhost_vdpa_dma_batch_begin_once(v->shared, v->address_space_id);
     for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
         VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
 
@@ -1352,6 +1356,7 @@ static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
         event_notifier_cleanup(&svq->hdev_kick);
         event_notifier_cleanup(&svq->hdev_call);
     }
+    vhost_vdpa_dma_batch_end_once(v->shared, v->address_space_id);
 }
 
 static void vhost_vdpa_suspend(struct vhost_dev *dev)
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 31/40] vdpa: batch map and unmap around cvq svq start/stop
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (29 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 30/40] vdpa: batch map/unmap op per svq pair basis Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2024-01-15  3:34   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 32/40] vdpa: factor out vhost_vdpa_net_get_nc_vdpa Si-Wei Liu
                   ` (10 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Coalesce map or unmap operations into exactly one DMA batch to
reduce the potential impact on performance.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 net/vhost-vdpa.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index bc72345..1c1d61f 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -715,10 +715,11 @@ out:
                                                    v->shared->iova_range.last);
     }
 
+    vhost_vdpa_dma_batch_begin_once(v->shared, v->address_space_id);
     r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
                                vhost_vdpa_net_cvq_cmd_page_len(), false);
     if (unlikely(r < 0)) {
-        return r;
+        goto err;
     }
 
     r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->status,
@@ -727,18 +728,23 @@ out:
         vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
     }
 
+err:
+    vhost_vdpa_dma_batch_end_once(v->shared, v->address_space_id);
     return r;
 }
 
 static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    struct vhost_vdpa *v = &s->vhost_vdpa;
 
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 
     if (s->vhost_vdpa.shadow_vqs_enabled) {
+        vhost_vdpa_dma_batch_begin_once(v->shared, v->address_space_id);
         vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
         vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
+        vhost_vdpa_dma_batch_end_once(v->shared, v->address_space_id);
     }
 
     vhost_vdpa_net_client_stop(nc);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 32/40] vdpa: factor out vhost_vdpa_net_get_nc_vdpa
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (30 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 31/40] vdpa: batch map and unmap around cvq svq start/stop Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2024-01-15  3:35   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 33/40] vdpa: batch multiple dma_unmap to a single call for vm stop Si-Wei Liu
                   ` (9 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Introduce a new API. No functional change to the existing API.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 net/vhost-vdpa.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 1c1d61f..683619f 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -290,13 +290,18 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
 }
 
 
-/** From any vdpa net client, get the netclient of the first queue pair */
-static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
+/** From any vdpa net client, get the netclient of the i-th queue pair */
+static VhostVDPAState *vhost_vdpa_net_get_nc_vdpa(VhostVDPAState *s, int i)
 {
     NICState *nic = qemu_get_nic(s->nc.peer);
-    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
+    NetClientState *nc_i = qemu_get_peer(nic->ncs, i);
+
+    return DO_UPCAST(VhostVDPAState, nc, nc_i);
+}
 
-    return DO_UPCAST(VhostVDPAState, nc, nc0);
+static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
+{
+    return vhost_vdpa_net_get_nc_vdpa(s, 0);
 }
 
 static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 33/40] vdpa: batch multiple dma_unmap to a single call for vm stop
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (31 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 32/40] vdpa: factor out vhost_vdpa_net_get_nc_vdpa Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-13 16:46   ` Eugenio Perez Martin
  2024-01-15  3:47   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 34/40] vdpa: fix network breakage after cancelling migration Si-Wei Liu
                   ` (8 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

This should help live migration downtime on the source host. Below is
the coalesced dma_unmap time series on a 2 queue pair config (no
dedicated descriptor group ASID for SVQ); a condensed sketch of the
per-ASID batching pattern follows the trace.

109531@1693367276.853503:vhost_vdpa_reset_device dev: 0x55c933926890
109531@1693367276.853513:vhost_vdpa_add_status dev: 0x55c933926890 status: 0x3
109531@1693367276.853520:vhost_vdpa_flush_map dev: 0x55c933926890 doit: 1 svq_flush: 0 persist: 1
109531@1693367276.853524:vhost_vdpa_set_config_call dev: 0x55c933926890 fd: -1
109531@1693367276.853579:vhost_vdpa_iotlb_begin_batch vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 type: 5
109531@1693367276.853586:vhost_vdpa_dma_unmap vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 asid: 0 iova: 0x1000 size: 0x2000 type: 3
109531@1693367276.853600:vhost_vdpa_dma_unmap vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 asid: 0 iova: 0x3000 size: 0x1000 type: 3
109531@1693367276.853618:vhost_vdpa_dma_unmap vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 asid: 0 iova: 0x4000 size: 0x2000 type: 3
109531@1693367276.853625:vhost_vdpa_dma_unmap vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 asid: 0 iova: 0x6000 size: 0x1000 type: 3
109531@1693367276.853630:vhost_vdpa_dma_unmap vdpa:0x7fa2aa84c190 fd: 16 msg_type: 2 asid: 0 iova: 0x7000 size: 0x2000 type: 3
109531@1693367276.853636:vhost_vdpa_dma_unmap vdpa:0x7fa2aa84c190 fd: 16 msg_type: 2 asid: 0 iova: 0x9000 size: 0x1000 type: 3
109531@1693367276.853642:vhost_vdpa_dma_unmap vdpa:0x7fa2aa84c190 fd: 16 msg_type: 2 asid: 0 iova: 0xa000 size: 0x2000 type: 3
109531@1693367276.853648:vhost_vdpa_dma_unmap vdpa:0x7fa2aa84c190 fd: 16 msg_type: 2 asid: 0 iova: 0xc000 size: 0x1000 type: 3
109531@1693367276.853654:vhost_vdpa_dma_unmap vdpa:0x7fa2aa6b6190 fd: 16 msg_type: 2 asid: 0 iova: 0xf000 size: 0x1000 type: 3
109531@1693367276.853660:vhost_vdpa_dma_unmap vdpa:0x7fa2aa6b6190 fd: 16 msg_type: 2 asid: 0 iova: 0x10000 size: 0x1000 type: 3
109531@1693367276.853666:vhost_vdpa_dma_unmap vdpa:0x7fa2aa6b6190 fd: 16 msg_type: 2 asid: 0 iova: 0xd000 size: 0x1000 type: 3
109531@1693367276.853670:vhost_vdpa_dma_unmap vdpa:0x7fa2aa6b6190 fd: 16 msg_type: 2 asid: 0 iova: 0xe000 size: 0x1000 type: 3
109531@1693367276.853675:vhost_vdpa_iotlb_end_batch vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 type: 6
109531@1693367277.014697:vhost_vdpa_get_vq_index dev: 0x55c933925de0 idx: 0 vq idx: 0
109531@1693367277.014747:vhost_vdpa_get_vq_index dev: 0x55c933925de0 idx: 1 vq idx: 1
109531@1693367277.014753:vhost_vdpa_get_vq_index dev: 0x55c9339262e0 idx: 2 vq idx: 2
109531@1693367277.014756:vhost_vdpa_get_vq_index dev: 0x55c9339262e0 idx: 3 vq idx: 3
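
A condensed sketch of the per-ASID grouping done below in
vhost_vdpa_net_client_stop(); the array-based wrapper is illustrative
only:

static void unmap_all_svq_sketch(VhostVDPAState *qp[], int nvqp)
{
    struct vhost_vdpa *last = NULL;

    for (int i = 0; i < nvqp; i++) {
        struct vhost_vdpa *v = &qp[i]->vhost_vdpa;

        if (!v->shadow_vqs_enabled) {
            continue;
        }
        /* open a new batch only when the ASID changes between pairs */
        if (!last) {
            vhost_vdpa_dma_batch_begin_once(v->shared, v->address_space_id);
        } else if (last->address_space_id != v->address_space_id) {
            vhost_vdpa_dma_batch_end_once(last->shared,
                                          last->address_space_id);
            vhost_vdpa_dma_batch_begin_once(v->shared, v->address_space_id);
        }
        last = v;
        /* ... vhost_vdpa_svq_unmap_rings() for each shadow vq of v ... */
    }
    if (last) {
        vhost_vdpa_dma_batch_end_once(last->shared, last->address_space_id);
    }
}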

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c         |   7 +--
 include/hw/virtio/vhost-vdpa.h |   3 ++
 net/vhost-vdpa.c               | 112 +++++++++++++++++++++++++++--------------
 3 files changed, 80 insertions(+), 42 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index d98704a..4010fd9 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1162,8 +1162,8 @@ static void vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr addr)
     vhost_iova_tree_remove(v->shared->iova_tree, *result);
 }
 
-static void vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
-                                       const VhostShadowVirtqueue *svq)
+void vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
+                                const VhostShadowVirtqueue *svq)
 {
     struct vhost_vdpa *v = dev->opaque;
     struct vhost_vring_addr svq_addr;
@@ -1346,17 +1346,14 @@ static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
         return;
     }
 
-    vhost_vdpa_dma_batch_begin_once(v->shared, v->address_space_id);
     for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
         VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
 
         vhost_svq_stop(svq);
-        vhost_vdpa_svq_unmap_rings(dev, svq);
 
         event_notifier_cleanup(&svq->hdev_kick);
         event_notifier_cleanup(&svq->hdev_call);
     }
-    vhost_vdpa_dma_batch_end_once(v->shared, v->address_space_id);
 }
 
 static void vhost_vdpa_suspend(struct vhost_dev *dev)
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index aa13679..f426e2c 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -112,6 +112,9 @@ int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s, uint32_t asid);
 int vhost_vdpa_load_setup(VhostVDPAShared *s, AddressSpace *dma_as);
 int vhost_vdpa_load_cleanup(VhostVDPAShared *s, bool vhost_will_start);
 
+void vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
+                                const VhostShadowVirtqueue *svq);
+
 typedef struct vdpa_iommu {
     VhostVDPAShared *dev_shared;
     IOMMUMemoryRegion *iommu_mr;
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 683619f..41714d1 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -29,6 +29,7 @@
 #include "migration/migration.h"
 #include "migration/misc.h"
 #include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-vdpa.h"
 
 /* Todo:need to add the multiqueue support here */
 typedef struct VhostVDPAState {
@@ -467,15 +468,89 @@ static int vhost_vdpa_net_data_load(NetClientState *nc)
     return 0;
 }
 
+static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
+{
+    VhostIOVATree *tree = v->shared->iova_tree;
+    DMAMap needle = {
+        /*
+         * No need to specify size or to look for more translations since
+         * this contiguous chunk was allocated by us.
+         */
+        .translated_addr = (hwaddr)(uintptr_t)addr,
+    };
+    const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle);
+    int r;
+
+    if (unlikely(!map)) {
+        error_report("Cannot locate expected map");
+        return;
+    }
+
+    r = vhost_vdpa_dma_unmap(v->shared, v->address_space_id, map->iova,
+                             map->size + 1);
+    if (unlikely(r != 0)) {
+        error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
+    }
+
+    vhost_iova_tree_remove(tree, *map);
+}
+
 static void vhost_vdpa_net_client_stop(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    struct vhost_vdpa *v = &s->vhost_vdpa;
+    struct vhost_vdpa *last_vi = NULL;
+    bool has_cvq = v->dev->vq_index_end % 2;
+    int nvqp;
 
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 
     if (s->vhost_vdpa.index == 0) {
         migration_remove_notifier(&s->migration_state);
     }
+
+    if (v->dev->vq_index + v->dev->nvqs != v->dev->vq_index_end) {
+        return;
+    }
+
+    nvqp = (v->dev->vq_index_end + 1) / 2;
+    for (int i = 0; i < nvqp; ++i) {
+        VhostVDPAState *s_i = vhost_vdpa_net_get_nc_vdpa(s, i);
+        struct vhost_vdpa *v_i = &s_i->vhost_vdpa;
+
+        if (!v_i->shadow_vqs_enabled) {
+            continue;
+        }
+        if (!last_vi) {
+            vhost_vdpa_dma_batch_begin_once(v_i->shared,
+                                            v_i->address_space_id);
+            last_vi = v_i;
+        } else if (last_vi->address_space_id != v_i->address_space_id) {
+            vhost_vdpa_dma_batch_end_once(last_vi->shared,
+                                          last_vi->address_space_id);
+            vhost_vdpa_dma_batch_begin_once(v_i->shared,
+                                            v_i->address_space_id);
+            last_vi = v_i;
+        }
+
+        for (unsigned j = 0; j < v_i->shadow_vqs->len; ++j) {
+            VhostShadowVirtqueue *svq = g_ptr_array_index(v_i->shadow_vqs, j);
+
+            vhost_vdpa_svq_unmap_rings(v_i->dev, svq);
+        }
+    }
+    if (has_cvq) {
+        if (last_vi) {
+            assert(last_vi->address_space_id == v->address_space_id);
+        }
+        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
+        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
+    }
+    if (last_vi) {
+        vhost_vdpa_dma_batch_end_once(last_vi->shared,
+                                      last_vi->address_space_id);
+        last_vi = NULL;
+    }
 }
 
 static int vhost_vdpa_net_load_setup(NetClientState *nc, NICState *nic)
@@ -585,33 +660,6 @@ static int64_t vhost_vdpa_get_vring_desc_group(int device_fd,
     return state.num;
 }
 
-static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
-{
-    VhostIOVATree *tree = v->shared->iova_tree;
-    DMAMap needle = {
-        /*
-         * No need to specify size or to look for more translations since
-         * this contiguous chunk was allocated by us.
-         */
-        .translated_addr = (hwaddr)(uintptr_t)addr,
-    };
-    const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle);
-    int r;
-
-    if (unlikely(!map)) {
-        error_report("Cannot locate expected map");
-        return;
-    }
-
-    r = vhost_vdpa_dma_unmap(v->shared, v->address_space_id, map->iova,
-                             map->size + 1);
-    if (unlikely(r != 0)) {
-        error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
-    }
-
-    vhost_iova_tree_remove(tree, *map);
-}
-
 /** Map CVQ buffer. */
 static int vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v, void *buf, size_t size,
                                   bool write)
@@ -740,18 +788,8 @@ err:
 
 static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
 {
-    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
-    struct vhost_vdpa *v = &s->vhost_vdpa;
-
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 
-    if (s->vhost_vdpa.shadow_vqs_enabled) {
-        vhost_vdpa_dma_batch_begin_once(v->shared, v->address_space_id);
-        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
-        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
-        vhost_vdpa_dma_batch_end_once(v->shared, v->address_space_id);
-    }
-
     vhost_vdpa_net_client_stop(nc);
 }
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 34/40] vdpa: fix network breakage after cancelling migration
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (32 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 33/40] vdpa: batch multiple dma_unmap to a single call for vm stop Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2024-01-15  3:48   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 35/40] vdpa: add vhost_vdpa_set_address_space_id trace Si-Wei Liu
                   ` (7 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

Fix an issue where cancellation of ongoing migration ends up
with no network connectivity.

When canceling migration, SVQ will be switched back to
passthrough mode, but the right call fd is not programmed to
the device and the svq's own call fd is still used. During
this transition period, shadow_vqs_enabled has not been set
back to false yet, so the installation of the correct call fd
is inadvertently bypassed.

Fixes: a8ac88585da1 ("vhost: Add Shadow VirtQueue call forwarding capabilities")
Cc: Eugenio Pérez <eperezma@redhat.com>

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 4010fd9..8ba390d 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1647,7 +1647,12 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
 
     /* Remember last call fd because we can switch to SVQ anytime. */
     vhost_svq_set_svq_call_fd(svq, file->fd);
-    if (v->shadow_vqs_enabled) {
+    /*
+     * In the event of SVQ switching to off, shadow_vqs_enabled has
+     * not been set to false yet, but the underlying call fd will
+     * switch back to the guest notifier for passthrough VQs.
+     */
+    if (v->shadow_vqs_enabled && v->shared->svq_switching >= 0) {
         return 0;
     }
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 35/40] vdpa: add vhost_vdpa_set_address_space_id trace
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (33 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 34/40] vdpa: fix network breakage after cancelling migration Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-11 18:13   ` Eugenio Perez Martin
  2024-01-15  3:50   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 36/40] vdpa: add vhost_vdpa_get_vring_base trace for svq mode Si-Wei Liu
                   ` (6 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

For better debuggability and observability.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 net/trace-events | 3 +++
 net/vhost-vdpa.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/net/trace-events b/net/trace-events
index 823a071..aab666a 100644
--- a/net/trace-events
+++ b/net/trace-events
@@ -23,3 +23,6 @@ colo_compare_tcp_info(const char *pkt, uint32_t seq, uint32_t ack, int hdlen, in
 # filter-rewriter.c
 colo_filter_rewriter_pkt_info(const char *func, const char *src, const char *dst, uint32_t seq, uint32_t ack, uint32_t flag) "%s: src/dst: %s/%s p: seq/ack=%u/%u  flags=0x%x"
 colo_filter_rewriter_conn_offset(uint32_t offset) ": offset=%u"
+
+# vhost-vdpa.c
+vhost_vdpa_set_address_space_id(void *v, unsigned vq_group, unsigned asid_num) "vhost_vdpa: %p vq_group: %u asid: %u"
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 41714d1..84876b0 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -30,6 +30,7 @@
 #include "migration/misc.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-vdpa.h"
+#include "trace.h"
 
 /* Todo:need to add the multiqueue support here */
 typedef struct VhostVDPAState {
@@ -365,6 +366,8 @@ static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
     };
     int r;
 
+    trace_vhost_vdpa_set_address_space_id(v, vq_group, asid_num);
+
     r = ioctl(v->shared->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
     if (unlikely(r < 0)) {
         error_report("Can't set vq group %u asid %u, errno=%d (%s)",
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 36/40] vdpa: add vhost_vdpa_get_vring_base trace for svq mode
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (34 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 35/40] vdpa: add vhost_vdpa_set_address_space_id trace Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-11 18:14   ` Eugenio Perez Martin
  2024-01-15  3:52   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 37/40] vdpa: add vhost_vdpa_set_dev_vring_base " Si-Wei Liu
                   ` (5 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

For better debuggability and observability.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/trace-events | 2 +-
 hw/virtio/vhost-vdpa.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 196f32f..a8d3321 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -58,7 +58,7 @@ vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int r
 vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
 vhost_vdpa_set_vring_num(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
 vhost_vdpa_set_vring_base(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
-vhost_vdpa_get_vring_base(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
+vhost_vdpa_get_vring_base(void *dev, unsigned int index, unsigned int num, bool svq) "dev: %p index: %u num: %u svq: %d"
 vhost_vdpa_set_vring_kick(void *dev, unsigned int index, int fd) "dev: %p index: %u fd: %d"
 vhost_vdpa_set_vring_call(void *dev, unsigned int index, int fd) "dev: %p index: %u fd: %d"
 vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 0x%"PRIx64
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 8ba390d..d66936f 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1607,6 +1607,7 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
 
     if (v->shadow_vqs_enabled) {
         ring->num = virtio_queue_get_last_avail_idx(dev->vdev, ring->index);
+        trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num, true);
         return 0;
     }
 
@@ -1619,7 +1620,7 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
     }
 
     ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
-    trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
+    trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num, false);
     return ret;
 }
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 37/40] vdpa: add vhost_vdpa_set_dev_vring_base trace for svq mode
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (35 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 36/40] vdpa: add vhost_vdpa_get_vring_base trace for svq mode Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-11 18:14   ` Eugenio Perez Martin
  2024-01-15  3:53   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 38/40] vdpa: add trace events for eval_flush Si-Wei Liu
                   ` (4 subsequent siblings)
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

For better debuggability and observability.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/trace-events | 2 +-
 hw/virtio/vhost-vdpa.c | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index a8d3321..5085607 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -57,7 +57,7 @@ vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
 vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
 vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
 vhost_vdpa_set_vring_num(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
-vhost_vdpa_set_vring_base(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
+vhost_vdpa_set_dev_vring_base(void *dev, unsigned int index, unsigned int num, bool svq) "dev: %p index: %u num: %u svq: %d"
 vhost_vdpa_get_vring_base(void *dev, unsigned int index, unsigned int num, bool svq) "dev: %p index: %u num: %u svq: %d"
 vhost_vdpa_set_vring_kick(void *dev, unsigned int index, int fd) "dev: %p index: %u fd: %d"
 vhost_vdpa_set_vring_call(void *dev, unsigned int index, int fd) "dev: %p index: %u fd: %d"
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index d66936f..ff4f218 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1043,7 +1043,10 @@ static int vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
 static int vhost_vdpa_set_dev_vring_base(struct vhost_dev *dev,
                                          struct vhost_vring_state *ring)
 {
-    trace_vhost_vdpa_set_vring_base(dev, ring->index, ring->num);
+    struct vhost_vdpa *v = dev->opaque;
+
+    trace_vhost_vdpa_set_dev_vring_base(dev, ring->index, ring->num,
+                                        v->shadow_vqs_enabled);
     return vhost_vdpa_call(dev, VHOST_SET_VRING_BASE, ring);
 }
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 38/40] vdpa: add trace events for eval_flush
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (36 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 37/40] vdpa: add vhost_vdpa_set_dev_vring_base " Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2024-01-15  3:57   ` Jason Wang
  2023-12-07 17:39 ` [PATCH 39/40] vdpa: add trace events for vhost_vdpa_net_load_cmd Si-Wei Liu
                   ` (3 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

For better debuggability and observability.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 net/trace-events | 2 ++
 net/vhost-vdpa.c | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/net/trace-events b/net/trace-events
index aab666a..d650c71 100644
--- a/net/trace-events
+++ b/net/trace-events
@@ -26,3 +26,5 @@ colo_filter_rewriter_conn_offset(uint32_t offset) ": offset=%u"
 
 # vhost-vdpa.c
 vhost_vdpa_set_address_space_id(void *v, unsigned vq_group, unsigned asid_num) "vhost_vdpa: %p vq_group: %u asid: %u"
+vhost_vdpa_net_data_eval_flush(void *s, int qindex, int svq_switch, bool svq_flush) "vhost_vdpa: %p qp: %d svq_switch: %d flush_map: %d"
+vhost_vdpa_net_cvq_eval_flush(void *s, int qindex, int svq_switch, bool svq_flush) "vhost_vdpa: %p qp: %d svq_switch: %d flush_map: %d"
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 84876b0..a0bd8cd 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -608,6 +608,9 @@ static void vhost_vdpa_net_data_eval_flush(NetClientState *nc, bool stop)
                v->desc_group < 0) {
         v->shared->flush_map = true;
     }
+    trace_vhost_vdpa_net_data_eval_flush(v, s->vhost_vdpa.index,
+                                        v->shared->svq_switching,
+                                        v->shared->flush_map);
 }
 
 static NetClientInfo net_vhost_vdpa_info = {
@@ -1457,6 +1460,10 @@ static void vhost_vdpa_net_cvq_eval_flush(NetClientState *nc, bool stop)
         !s->cvq_isolated && v->desc_group < 0) {
         v->shared->flush_map = true;
     }
+
+    trace_vhost_vdpa_net_cvq_eval_flush(v, s->vhost_vdpa.index,
+                                       v->shared->svq_switching,
+                                       v->shared->flush_map);
 }
 
 static NetClientInfo net_vhost_vdpa_cvq_info = {
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 39/40] vdpa: add trace events for vhost_vdpa_net_load_cmd
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (37 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 38/40] vdpa: add trace events for eval_flush Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-11 18:14   ` Eugenio Perez Martin
  2023-12-07 17:39 ` [PATCH 40/40] vdpa: add trace event for vhost_vdpa_net_load_mq Si-Wei Liu
                   ` (2 subsequent siblings)
  41 siblings, 1 reply; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

For better debuggability and observability.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 net/trace-events | 2 ++
 net/vhost-vdpa.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/net/trace-events b/net/trace-events
index d650c71..be087e6 100644
--- a/net/trace-events
+++ b/net/trace-events
@@ -28,3 +28,5 @@ colo_filter_rewriter_conn_offset(uint32_t offset) ": offset=%u"
 vhost_vdpa_set_address_space_id(void *v, unsigned vq_group, unsigned asid_num) "vhost_vdpa: %p vq_group: %u asid: %u"
 vhost_vdpa_net_data_eval_flush(void *s, int qindex, int svq_switch, bool svq_flush) "vhost_vdpa: %p qp: %d svq_switch: %d flush_map: %d"
 vhost_vdpa_net_cvq_eval_flush(void *s, int qindex, int svq_switch, bool svq_flush) "vhost_vdpa: %p qp: %d svq_switch: %d flush_map: %d"
+vhost_vdpa_net_load_cmd(void *s, uint8_t class, uint8_t cmd, int data_num, int data_size) "vdpa state: %p class: %u cmd: %u sg_num: %d size: %d"
+vhost_vdpa_net_load_cmd_retval(void *s, uint8_t class, uint8_t cmd, int r) "vdpa state: %p class: %u cmd: %u retval: %d"
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index a0bd8cd..61da8b4 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -885,6 +885,7 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s,
 
     assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl));
     cmd_size = sizeof(ctrl) + data_size;
+    trace_vhost_vdpa_net_load_cmd(s, class, cmd, data_num, data_size);
     if (vhost_svq_available_slots(svq) < 2 ||
         iov_size(out_cursor, 1) < cmd_size) {
         /*
@@ -916,6 +917,7 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s,
 
     r = vhost_vdpa_net_cvq_add(s, &out, 1, &in, 1);
     if (unlikely(r < 0)) {
+        trace_vhost_vdpa_net_load_cmd_retval(s, class, cmd, r);
         return r;
     }
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 40/40] vdpa: add trace event for vhost_vdpa_net_load_mq
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (38 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 39/40] vdpa: add trace events for vhost_vdpa_net_load_cmd Si-Wei Liu
@ 2023-12-07 17:39 ` Si-Wei Liu
  2023-12-11 18:15   ` Eugenio Perez Martin
  2024-01-15  3:58   ` Jason Wang
  2023-12-11 18:39 ` [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Eugenio Perez Martin
  2024-01-11  8:21 ` Jason Wang
  41 siblings, 2 replies; 102+ messages in thread
From: Si-Wei Liu @ 2023-12-07 17:39 UTC (permalink / raw)
  To: eperezma, jasowang, mst, dtatulea, leiyang, yin31149,
	boris.ostrovsky, jonah.palmer
  Cc: qemu-devel

For better debuggability and observability.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 net/trace-events | 1 +
 net/vhost-vdpa.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/net/trace-events b/net/trace-events
index be087e6..c128cc4 100644
--- a/net/trace-events
+++ b/net/trace-events
@@ -30,3 +30,4 @@ vhost_vdpa_net_data_eval_flush(void *s, int qindex, int svq_switch, bool svq_flu
 vhost_vdpa_net_cvq_eval_flush(void *s, int qindex, int svq_switch, bool svq_flush) "vhost_vdpa: %p qp: %d svq_switch: %d flush_map: %d"
 vhost_vdpa_net_load_cmd(void *s, uint8_t class, uint8_t cmd, int data_num, int data_size) "vdpa state: %p class: %u cmd: %u sg_num: %d size: %d"
 vhost_vdpa_net_load_cmd_retval(void *s, uint8_t class, uint8_t cmd, int r) "vdpa state: %p class: %u cmd: %u retval: %d"
+vhost_vdpa_net_load_mq(void *s, int ncurqps) "vdpa state: %p current_qpairs: %d"
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 61da8b4..17b8d01 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -1109,6 +1109,8 @@ static int vhost_vdpa_net_load_mq(VhostVDPAState *s,
         return 0;
     }
 
+    trace_vhost_vdpa_net_load_mq(s, n->curr_queue_pairs);
+
     mq.virtqueue_pairs = cpu_to_le16(n->curr_queue_pairs);
     const struct iovec data = {
         .iov_base = &mq,
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH 01/40] linux-headers: add vhost_types.h and vhost.h
  2023-12-07 17:39 ` [PATCH 01/40] linux-headers: add vhost_types.h and vhost.h Si-Wei Liu
@ 2023-12-11  7:47   ` Eugenio Perez Martin
  2024-01-11  3:32   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11  7:47 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:50 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

This should be updated from scripts/update-linux-headers.sh.

> ---
>  include/standard-headers/linux/vhost_types.h | 13 +++++++++++++
>  linux-headers/linux/vhost.h                  |  9 +++++++++
>  2 files changed, 22 insertions(+)
>
> diff --git a/include/standard-headers/linux/vhost_types.h b/include/standard-headers/linux/vhost_types.h
> index 5ad07e1..c39199b 100644
> --- a/include/standard-headers/linux/vhost_types.h
> +++ b/include/standard-headers/linux/vhost_types.h
> @@ -185,5 +185,18 @@ struct vhost_vdpa_iova_range {
>   * DRIVER_OK
>   */
>  #define VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK  0x6
> +/* Device can be resumed */
> +#define VHOST_BACKEND_F_RESUME  0x5
> +/* Device supports the driver enabling virtqueues both before and after
> + * DRIVER_OK
> + */
> +#define VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK  0x6
> +/* Device may expose the virtqueue's descriptor area, driver area and
> + * device area to a different group for ASID binding than where its
> + * buffers may reside. Requires VHOST_BACKEND_F_IOTLB_ASID.
> + */
> +#define VHOST_BACKEND_F_DESC_ASID    0x7
> +/* IOTLB don't flush memory mapping across device reset */
> +#define VHOST_BACKEND_F_IOTLB_PERSIST  0x8
>
>  #endif
> diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> index f5c48b6..c61c687 100644
> --- a/linux-headers/linux/vhost.h
> +++ b/linux-headers/linux/vhost.h
> @@ -219,4 +219,13 @@
>   */
>  #define VHOST_VDPA_RESUME              _IO(VHOST_VIRTIO, 0x7E)
>
> +/* Get the dedicated group for the descriptor table of a virtqueue:
> + * read index, write group in num.
> + * The virtqueue index is stored in the index field of vhost_vring_state.
> + * The group id for the descriptor table of this specific virtqueue
> + * is returned via num field of vhost_vring_state.
> + */
> +#define VHOST_VDPA_GET_VRING_DESC_GROUP        _IOWR(VHOST_VIRTIO, 0x7F,       \
> +                                             struct vhost_vring_state)
> +
>  #endif
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 05/40] vdpa: populate desc_group from net_vhost_vdpa_init
  2023-12-07 17:39 ` [PATCH 05/40] vdpa: populate desc_group from net_vhost_vdpa_init Si-Wei Liu
@ 2023-12-11 10:46   ` Eugenio Perez Martin
  2023-12-11 11:01     ` Eugenio Perez Martin
  2024-01-11  7:09   ` Jason Wang
  1 sibling, 1 reply; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 10:46 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:50 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Add the desc_group field to struct vhost_vdpa, and get it
> populated when the corresponding vq is initialized at
> net_vhost_vdpa_init. If the vq does not have descriptor
> group capability, or it doesn't have a dedicated ASID
> group to host descriptors other than the data buffers,
> desc_group will be set to a negative value -1.
>

We should use a defined constant. As always, I don't have a good name
though :). DESC_GROUP_SAME_AS_BUFFERS_GROUP?
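
Just to make the suggestion concrete, something like the below (value
taken from the -1 this patch uses; name and placement are of course up
for discussion):

/* The vq has no dedicated descriptor group; it shares the buffers' group */
#define DESC_GROUP_SAME_AS_BUFFERS_GROUP  (-1)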

> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  include/hw/virtio/vhost-vdpa.h |  1 +
>  net/vhost-vdpa.c               | 15 +++++++++++++--
>  2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 6533ad2..63493ff 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -87,6 +87,7 @@ typedef struct vhost_vdpa {
>      Error *migration_blocker;
>      VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
>      IOMMUNotifier n;
> +    int64_t desc_group;
>  } VhostVDPA;
>
>  int vhost_vdpa_get_iova_range(int fd, struct vhost_vdpa_iova_range *iova_range);
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index cb5705d..1a738b2 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -1855,11 +1855,22 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>
>      ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
>      if (ret) {
> -        qemu_del_net_client(nc);
> -        return NULL;
> +        goto err;
>      }
>
> +    if (is_datapath) {
> +        ret = vhost_vdpa_probe_desc_group(vdpa_device_fd, features,
> +                                          0, &desc_group, errp);
> +        if (unlikely(ret < 0)) {
> +            goto err;
> +        }
> +    }
> +    s->vhost_vdpa.desc_group = desc_group;

Why not do the probe at the same time as the CVQ isolation probe? It
would save all the effort of restoring the previous device status, not
to mention avoid initializing and resetting the device so many times
for the probing. The error unwinding is not needed here that way.

I think the most controversial part is how to know the right vring
group. When I sent the CVQ probe, I delegated that to device startup
and we decided it would be weird to have CVQ isolated only in the MQ
case but not in the SQ case. I think we could do the same here for
the sake of making the series simpler: just check the actual
isolation of the vring descriptor group, and then move to saving the
actual vring group at initialization if it saves significant time.
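
A rough sketch of the probing part, reusing the ioctl added in patch
01 (the helper name and exact call site are hypothetical):

/*
 * Hypothetical: query the descriptor group of a given vq while the
 * device is already alive for the CVQ isolation probe, so no extra
 * reset cycle is needed.
 */
static int64_t vhost_vdpa_probe_desc_group_at_cvq_probe(int device_fd,
                                                        unsigned vq_index)
{
    struct vhost_vring_state state = { .index = vq_index };

    if (ioctl(device_fd, VHOST_VDPA_GET_VRING_DESC_GROUP, &state) < 0) {
        return -errno;
    }

    return state.num;
}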

Does it make sense to you?

Thanks!

>      return nc;
> +
> +err:
> +    qemu_del_net_client(nc);
> +    return NULL;
>  }
>
>  static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 05/40] vdpa: populate desc_group from net_vhost_vdpa_init
  2023-12-11 10:46   ` Eugenio Perez Martin
@ 2023-12-11 11:01     ` Eugenio Perez Martin
  0 siblings, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 11:01 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Mon, Dec 11, 2023 at 11:46 AM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Thu, Dec 7, 2023 at 7:50 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >
> > Add the desc_group field to struct vhost_vdpa, and get it
> > populated when the corresponding vq is initialized at
> > net_vhost_vdpa_init. If the vq does not have descriptor
> > group capability, or it doesn't have a dedicated ASID
> > group to host descriptors other than the data buffers,
> > desc_group will be set to a negative value -1.
> >
>
> We should use a defined constant. As always, I don't have a good name
> though :). DESC_GROUP_SAME_AS_BUFFERS_GROUP?
>
> > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > ---
> >  include/hw/virtio/vhost-vdpa.h |  1 +
> >  net/vhost-vdpa.c               | 15 +++++++++++++--
> >  2 files changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > index 6533ad2..63493ff 100644
> > --- a/include/hw/virtio/vhost-vdpa.h
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -87,6 +87,7 @@ typedef struct vhost_vdpa {
> >      Error *migration_blocker;
> >      VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> >      IOMMUNotifier n;
> > +    int64_t desc_group;
> >  } VhostVDPA;
> >
> >  int vhost_vdpa_get_iova_range(int fd, struct vhost_vdpa_iova_range *iova_range);
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index cb5705d..1a738b2 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -1855,11 +1855,22 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >
> >      ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> >      if (ret) {
> > -        qemu_del_net_client(nc);
> > -        return NULL;
> > +        goto err;
> >      }
> >
> > +    if (is_datapath) {
> > +        ret = vhost_vdpa_probe_desc_group(vdpa_device_fd, features,
> > +                                          0, &desc_group, errp);

Also, it is always checking for the vring group of the first virtqueue
of the vdpa parent device, isn't it? The 3rd parameter should be
queue_pair_index*2.

Even with queue_pair_index*2, we're also assuming the tx queue will have
the same vring group as rx. While I think this is a valid assumption,
maybe it is better to probe it at initialization and act as if the
device does not have VHOST_BACKEND_F_DESC_ASID if we find otherwise?
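
Something like this, for instance (hypothetical sketch; it assumes the
vhost_vdpa_get_vring_desc_group() helper added earlier in this series
takes the same arguments as vhost_vdpa_get_vring_group()):

/* Treat DESC_ASID as absent if rx and tx descriptor groups differ */
static bool vhost_vdpa_qp_desc_groups_match(int device_fd, int qp_index,
                                            Error **errp)
{
    int64_t rx, tx;

    rx = vhost_vdpa_get_vring_desc_group(device_fd, qp_index * 2, errp);
    if (rx < 0) {
        return false;
    }

    tx = vhost_vdpa_get_vring_desc_group(device_fd, qp_index * 2 + 1, errp);
    return tx == rx;
}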

Thanks!


> > +        if (unlikely(ret < 0)) {
> > +            goto err;
> > +        }
> > +    }
> > +    s->vhost_vdpa.desc_group = desc_group;
>
> Why not do the probe at the same time as the CVQ isolation probe? It
> would save all the effort to restore the previous device status, not
> to mention not needed to initialize and reset the device so many times
> for the probing. The error unwinding is not needed here that way.
>
> I think the most controversial part is how to know the right vring
> group. When I sent the CVQ probe, I delegated that to the device
> startup and we decide it would be weird to have CVQ isolated only in
> the MQ case but not in the SQ case. I think we could do the same here
> for the sake of making the series simpler: just checking the actual
> isolation of vring descriptor group, and then move to save the actual
> vring group at initialization if it saves significant time.
>
> Does it make sense to you?
>
> Thanks!
>
> >      return nc;
> > +
> > +err:
> > +    qemu_del_net_client(nc);
> > +    return NULL;
> >  }
> >
> >  static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
> > --
> > 1.8.3.1
> >



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 06/40] vhost: make svq work with gpa without iova translation
  2023-12-07 17:39 ` [PATCH 06/40] vhost: make svq work with gpa without iova translation Si-Wei Liu
@ 2023-12-11 11:17   ` Eugenio Perez Martin
  2024-01-11  7:31   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 11:17 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:50 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Make vhost_svq_vring_write_descs able to work with GPA directly
> without going through iova tree for translation. This will be
> needed in the next few patches where the SVQ has dedicated
> address space to host its virtqueues. Instead of having to
> translate qemu's VA to IOVA via the iova tree, with dedicated
> or isolated address space for SVQ descriptors, the IOVA is
> exactly the same as the guest GPA space, where translation would
> not be needed any more.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-shadow-virtqueue.c | 35 +++++++++++++++++++++++------------
>  1 file changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index fc5f408..97ccd45 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -136,8 +136,8 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
>   * Return true if success, false otherwise and print error.
>   */
>  static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> -                                        const struct iovec *iovec, size_t num,
> -                                        bool more_descs, bool write)
> +                                        const struct iovec *iovec, hwaddr *addr,
> +                                        size_t num, bool more_descs, bool write)
>  {
>      uint16_t i = svq->free_head, last = svq->free_head;
>      unsigned n;
> @@ -149,8 +149,15 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>          return true;
>      }
>
> -    ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> -    if (unlikely(!ok)) {
> +    if (svq->iova_tree) {
> +        ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> +        if (unlikely(!ok)) {
> +            return false;
> +        }
> +    } else if (!addr) {
> +        qemu_log_mask(LOG_GUEST_ERROR,
> +                      "No translation found for vaddr 0x%p\n",
> +                      iovec[0].iov_base);
>          return false;
>      }
>
> @@ -161,7 +168,7 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>          } else {
>              descs[i].flags = flags;
>          }
> -        descs[i].addr = cpu_to_le64(sg[n]);
> +        descs[i].addr = cpu_to_le64(svq->iova_tree ? sg[n] : addr[n]);
>          descs[i].len = cpu_to_le32(iovec[n].iov_len);
>
>          last = i;
> @@ -173,9 +180,10 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>  }
>
>  static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
> -                                const struct iovec *out_sg, size_t out_num,
> -                                const struct iovec *in_sg, size_t in_num,
> -                                unsigned *head)
> +                                const struct iovec *out_sg, hwaddr *out_addr,
> +                                size_t out_num,
> +                                const struct iovec *in_sg, hwaddr *in_addr,
> +                                size_t in_num, unsigned *head)
>  {
>      unsigned avail_idx;
>      vring_avail_t *avail = svq->vring.avail;
> @@ -191,13 +199,14 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
>          return false;
>      }
>
> -    ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_num, in_num > 0,
> -                                     false);
> +    ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_addr, out_num,
> +                                     in_num > 0, false);
>      if (unlikely(!ok)) {
>          return false;
>      }
>
> -    ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_num, false, true);
> +    ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_addr, in_num,
> +                                     false, true);
>      if (unlikely(!ok)) {
>          return false;
>      }
> @@ -258,7 +267,9 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
>          return -ENOSPC;
>      }
>
> -    ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, &qemu_head);
> +    ok = vhost_svq_add_split(svq, out_sg, elem ? elem->out_addr : NULL,
> +                             out_num, in_sg, elem ? elem->in_addr : NULL,
> +                             in_num, &qemu_head);

This function is using in_sg and out_sg intentionally as CVQ buffers
do not use VirtQueueElement addressing. You can check calls at
net/vhost-vdpa.c for more info. The right place for this change is
actually vhost_svq_add_element, and I suggest checking for
svq->iova_tree there, as the rest of the patch does.

Apart from that,

Reviewed-by: Eugenio Pérez <eperezma@redhat.com>

Thanks!

>      if (unlikely(!ok)) {
>          return -EINVAL;
>      }
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 07/40] vdpa: move around vhost_vdpa_set_address_space_id
  2023-12-07 17:39 ` [PATCH 07/40] vdpa: move around vhost_vdpa_set_address_space_id Si-Wei Liu
@ 2023-12-11 11:18   ` Eugenio Perez Martin
  2024-01-11  7:33   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 11:18 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:50 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Move it a few lines ahead to make it easier to call from functions
> before it.  No functional change involved.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Reviewed-by: Eugenio Pérez <eperezma@redhat.com>

> ---
>  net/vhost-vdpa.c | 36 ++++++++++++++++++------------------
>  1 file changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 1a738b2..dbfa192 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -335,6 +335,24 @@ static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
>      }
>  }
>
> +static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
> +                                           unsigned vq_group,
> +                                           unsigned asid_num)
> +{
> +    struct vhost_vring_state asid = {
> +        .index = vq_group,
> +        .num = asid_num,
> +    };
> +    int r;
> +
> +    r = ioctl(v->shared->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
> +    if (unlikely(r < 0)) {
> +        error_report("Can't set vq group %u asid %u, errno=%d (%s)",
> +                     asid.index, asid.num, errno, g_strerror(errno));
> +    }
> +    return r;
> +}
> +
>  static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>  {
>      struct vhost_vdpa *v = &s->vhost_vdpa;
> @@ -490,24 +508,6 @@ static int64_t vhost_vdpa_get_vring_desc_group(int device_fd,
>      return state.num;
>  }
>
> -static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
> -                                           unsigned vq_group,
> -                                           unsigned asid_num)
> -{
> -    struct vhost_vring_state asid = {
> -        .index = vq_group,
> -        .num = asid_num,
> -    };
> -    int r;
> -
> -    r = ioctl(v->shared->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
> -    if (unlikely(r < 0)) {
> -        error_report("Can't set vq group %u asid %u, errno=%d (%s)",
> -                     asid.index, asid.num, errno, g_strerror(errno));
> -    }
> -    return r;
> -}
> -
>  static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
>  {
>      VhostIOVATree *tree = v->shared->iova_tree;
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 08/40] vdpa: add back vhost_vdpa_net_first_nc_vdpa
  2023-12-07 17:39 ` [PATCH 08/40] vdpa: add back vhost_vdpa_net_first_nc_vdpa Si-Wei Liu
@ 2023-12-11 11:19   ` Eugenio Perez Martin
  2024-01-11  7:37   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 11:19 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:52 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Previous commits had it removed. Now adding it back because
> this function will be needed by next patches.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  net/vhost-vdpa.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index dbfa192..c9bfc6f 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -287,6 +287,16 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
>      return size;
>  }
>
> +

Extra newline.

> +/** From any vdpa net client, get the netclient of the first queue pair */
> +static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> +{
> +    NICState *nic = qemu_get_nic(s->nc.peer);
> +    NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
> +
> +    return DO_UPCAST(VhostVDPAState, nc, nc0);
> +}
> +
>  static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
>  {
>      struct vhost_vdpa *v = &s->vhost_vdpa;
> @@ -566,7 +576,7 @@ dma_map_err:
>
>  static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>  {
> -    VhostVDPAState *s;
> +    VhostVDPAState *s, *s0;
>      struct vhost_vdpa *v;
>      int64_t cvq_group;
>      int r;
> @@ -577,7 +587,8 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>      s = DO_UPCAST(VhostVDPAState, nc, nc);
>      v = &s->vhost_vdpa;
>
> -    v->shadow_vqs_enabled = v->shared->shadow_data;
> +    s0 = vhost_vdpa_net_first_nc_vdpa(s);
> +    v->shadow_vqs_enabled = s0->vhost_vdpa.shadow_vqs_enabled;
>      s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;

I'm wondering if shared->shadow_data is more correct now.

Either way:

Reviewed-by: Eugenio Pérez <eperezma@redhat.com>

Thanks!

>
>      if (v->shared->shadow_data) {
> --
> 1.8.3.1
>
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 09/40] vdpa: no repeat setting shadow_data
  2023-12-07 17:39 ` [PATCH 09/40] vdpa: no repeat setting shadow_data Si-Wei Liu
@ 2023-12-11 11:21   ` Eugenio Perez Martin
  2024-01-11  7:34   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 11:21 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:50 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Since shadow_data is now shared in the parent data struct, it
> just needs to be set only once by the first vq. This change
> will make shadow_data independent of svq enabled state, which
> can be optionally turned off when SVQ descritors and device

descri *p* tors typo.

> driver areas are all isolated to a separate address space.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Can you send this separately so we make this series smaller?

Apart from the typo,

Reviewed-by: Eugenio Pérez <eperezma@redhat.com>

Thanks!

> ---
>  net/vhost-vdpa.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index c9bfc6f..2555897 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -387,13 +387,12 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
>      if (s->always_svq ||
>          migration_is_setup_or_active(migrate_get_current()->state)) {
>          v->shadow_vqs_enabled = true;
> -        v->shared->shadow_data = true;
>      } else {
>          v->shadow_vqs_enabled = false;
> -        v->shared->shadow_data = false;
>      }
>
>      if (v->index == 0) {
> +        v->shared->shadow_data = v->shadow_vqs_enabled;
>          vhost_vdpa_net_data_start_first(s);
>          return 0;
>      }
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 10/40] vdpa: assign svq descriptors a separate ASID when possible
  2023-12-07 17:39 ` [PATCH 10/40] vdpa: assign svq descriptors a separate ASID when possible Si-Wei Liu
@ 2023-12-11 13:35   ` Eugenio Perez Martin
  2024-01-11  8:02   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 13:35 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:50 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> When the backend supports the VHOST_BACKEND_F_DESC_ASID feature
> and all the data vqs can support one or more descriptor groups
> to host SVQ vrings and descriptors, we assign them a different
> ASID than the one where their buffers reside in the guest memory
> address space. With this dedicated ASID for SVQs, the IOVA space
> that the vdpa device cares about effectively becomes the GPA
> space, thus there's no need to translate IOVA addresses. For this
> reason, shadow_data can be turned off accordingly. It doesn't mean
> the SVQ is not enabled, but just that the translation is not
> needed from the iova tree perspective.
>
> We can reuse CVQ's address space ID to host SVQ descriptors
> because both CVQ and SVQ are emulated in the same QEMU
> process, which will share the same VA address space.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-vdpa.c |  5 ++++-
>  net/vhost-vdpa.c       | 57 ++++++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 57 insertions(+), 5 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 24844b5..30dff95 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -627,6 +627,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>      uint64_t qemu_backend_features = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
>                                       0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
>                                       0x1ULL << VHOST_BACKEND_F_IOTLB_ASID |
> +                                     0x1ULL << VHOST_BACKEND_F_DESC_ASID |
>                                       0x1ULL << VHOST_BACKEND_F_SUSPEND;
>      int ret;
>
> @@ -1249,7 +1250,9 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
>              goto err;
>          }
>
> -        vhost_svq_start(svq, dev->vdev, vq, v->shared->iova_tree);
> +        vhost_svq_start(svq, dev->vdev, vq,
> +                        v->desc_group >= 0 && v->address_space_id ?

v->address_space_id != VHOST_VDPA_GUEST_PA_ASID?

> +                        NULL : v->shared->iova_tree);
>          ok = vhost_vdpa_svq_map_rings(dev, svq, &addr, &err);
>          if (unlikely(!ok)) {
>              goto err_map;
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 2555897..aebaa53 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -366,20 +366,50 @@ static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
>  static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>  {
>      struct vhost_vdpa *v = &s->vhost_vdpa;
> +    int r;
>
>      migration_add_notifier(&s->migration_state,
>                             vdpa_net_migration_state_notifier);
>
> +    if (!v->shadow_vqs_enabled) {

&& VHOST_BACKEND_F_DESC_ASID feature is acked?

> +        if (v->desc_group >= 0 &&
> +            v->address_space_id != VHOST_VDPA_GUEST_PA_ASID) {
> +            vhost_vdpa_set_address_space_id(v, v->desc_group,
> +                                            VHOST_VDPA_GUEST_PA_ASID);
> +            s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
> +        }
> +        return;
> +    }
> +
>      /* iova_tree may be initialized by vhost_vdpa_net_load_setup */
> -    if (v->shadow_vqs_enabled && !v->shared->iova_tree) {
> +    if (!v->shared->iova_tree) {
>          v->shared->iova_tree = vhost_iova_tree_new(v->shared->iova_range.first,
>                                                     v->shared->iova_range.last);
>      }

Maybe not so popular opinion, but I would add a previous patch modifying:
if (v->shadow_vqs_enabled && !v->shared->iova_tree) {
    iova_tree = new()
}
---

to:
if (!v->shadow_vqs_enabled) {
  return
}

if (!v->shared->iova_tree) {
    iova_tree = new()
}
---

> +
> +    if (s->always_svq || v->desc_group < 0) {
> +        return;
> +    }
> +
> +    r = vhost_vdpa_set_address_space_id(v, v->desc_group,
> +                                        VHOST_VDPA_NET_CVQ_ASID);
> +    if (unlikely(r < 0)) {
> +        /* The other data vqs should also fall back to using the same ASID */
> +        s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
> +        return;
> +    }
> +
> +    /* No translation needed on data SVQ when descriptor group is used */
> +    s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;

I'm not sure "address_space_id" is a good name for this member
anymore. If anything, I think we can add a comment explaining that it only
applies to descs vring if VHOST_BACKEND_F_DESC_ASID is acked and all
the needed conditions are met.

Also, maybe it is better to define a new constant
VHOST_VDPA_NET_VRING_DESCS_ASID, set it to VHOST_VDPA_NET_CVQ_ASID,
and explain why it is ok to reuse that ASID?
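
Something along these lines, maybe (name is just a suggestion):

/*
 * SVQ vrings are allocated by QEMU in the same process VA space as
 * the shadow CVQ buffers, so the CVQ ASID can be safely reused to
 * host the vring descriptor mappings.
 */
#define VHOST_VDPA_NET_VRING_DESCS_ASID  VHOST_VDPA_NET_CVQ_ASID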

> +    s->vhost_vdpa.shared->shadow_data = false;
> +    return;
>  }
>
>  static int vhost_vdpa_net_data_start(NetClientState *nc)
>  {
>      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
> +
>      struct vhost_vdpa *v = &s->vhost_vdpa;
>
>      assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> @@ -397,6 +427,18 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
>          return 0;
>      }
>
> +    if (v->desc_group >= 0 && v->desc_group != s0->vhost_vdpa.desc_group) {
> +        unsigned asid;
> +        asid = v->shadow_vqs_enabled ?
> +            s0->vhost_vdpa.address_space_id : VHOST_VDPA_GUEST_PA_ASID;
> +        if (asid != s->vhost_vdpa.address_space_id) {
> +            vhost_vdpa_set_address_space_id(v, v->desc_group, asid);
> +        }
> +        s->vhost_vdpa.address_space_id = asid;
> +    } else {
> +        s->vhost_vdpa.address_space_id = s0->vhost_vdpa.address_space_id;
> +    }
> +

Ok, here I see how all vqs are configured so I think some of my
previous comments are not 100% valid.

However I think we can improve this, as this omits the case where two
vrings different from s0 vring have the same vring descriptor group.
But I guess we can always optimize on top.

>      return 0;
>  }
>
> @@ -603,13 +645,19 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>          return 0;
>      }
>
> -    if (!s->cvq_isolated) {
> +    if (!s->cvq_isolated && v->desc_group < 0) {
> +        if (s0->vhost_vdpa.shadow_vqs_enabled &&
> +            s0->vhost_vdpa.desc_group >= 0 &&
> +            s0->vhost_vdpa.address_space_id) {
> +            v->shadow_vqs_enabled = false;
> +        }
>          return 0;
>      }
>
> -    cvq_group = vhost_vdpa_get_vring_group(v->shared->device_fd,
> +    cvq_group = s->cvq_isolated ?
> +                vhost_vdpa_get_vring_group(v->shared->device_fd,
>                                             v->dev->vq_index_end - 1,
> -                                           &err);
> +                                           &err) : v->desc_group;

I'm not sure if we can happily set v->desc_group if !s->cvq_isolated.
If the CVQ buffers share their group with the data queues, but the
CVQ vring is effectively isolated, we are setting all the dataplane
buffers to the ASID of the CVQ descriptors, aren't we?

>      if (unlikely(cvq_group < 0)) {
>          error_report_err(err);
>          return cvq_group;
> @@ -1840,6 +1888,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>      s->always_svq = svq;
>      s->migration_state.notify = NULL;
>      s->vhost_vdpa.shadow_vqs_enabled = svq;
> +    s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;

Isn't this overridden at each vhost_vdpa_net_*_start?


>      if (queue_pair_index == 0) {
>          vhost_vdpa_net_valid_svq_features(features,
>                                            &s->vhost_vdpa.migration_blocker);
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 11/40] vdpa: factor out vhost_vdpa_last_dev
  2023-12-07 17:39 ` [PATCH 11/40] vdpa: factor out vhost_vdpa_last_dev Si-Wei Liu
@ 2023-12-11 13:36   ` Eugenio Perez Martin
  2024-01-11  8:03   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 13:36 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:50 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Generalize duplicated condition check for the last vq of vdpa
> device to a common function.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

We can also send this separately,

Reviewed-by: Eugenio Pérez <eperezma@redhat.com>

> ---
>  hw/virtio/vhost-vdpa.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 30dff95..2b1cc14 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -593,6 +593,11 @@ static bool vhost_vdpa_first_dev(struct vhost_dev *dev)
>      return v->index == 0;
>  }
>
> +static bool vhost_vdpa_last_dev(struct vhost_dev *dev)
> +{
> +    return dev->vq_index + dev->nvqs == dev->vq_index_end;
> +}
> +
>  static int vhost_vdpa_get_dev_features(struct vhost_dev *dev,
>                                         uint64_t *features)
>  {
> @@ -1432,7 +1437,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>          goto out_stop;
>      }
>
> -    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> +    if (!vhost_vdpa_last_dev(dev)) {
>          return 0;
>      }
>
> @@ -1467,7 +1472,7 @@ static void vhost_vdpa_reset_status(struct vhost_dev *dev)
>  {
>      struct vhost_vdpa *v = dev->opaque;
>
> -    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> +    if (!vhost_vdpa_last_dev(dev)) {
>          return;
>      }
>
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 14/40] vdpa: convert iova_tree to ref count based
  2023-12-07 17:39 ` [PATCH 14/40] vdpa: convert iova_tree to ref count based Si-Wei Liu
@ 2023-12-11 17:21   ` Eugenio Perez Martin
  2024-01-11  8:15   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 17:21 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:50 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> So that it can be freed from vhost_vdpa_cleanup on
> the last deref. The next few patches will try to
> make the iova tree life cycle not depend on the memory
> listener, and there's a possibility to keep the iova tree
> around when memory mapping is not changed across
> device reset.
>

The title and commit description do not match the patch; I guess it
is because the reference count was at iova_tree some time in the past
but you decided to move it to VhostVDPAShared.

But this code should be merged with the previous patches, because we
have an asymmetry here and some bug will arise if the guest resets the
device: allocating at device start, but freeing at cleanup instead of
at stop.

> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  net/vhost-vdpa.c | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index a126e5c..7b8f047 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -238,6 +238,8 @@ static void vhost_vdpa_cleanup(NetClientState *nc)
>      }
>      if (--s->vhost_vdpa.shared->refcnt == 0) {
>          qemu_close(s->vhost_vdpa.shared->device_fd);
> +        g_clear_pointer(&s->vhost_vdpa.shared->iova_tree,
> +                        vhost_iova_tree_delete);
>          g_free(s->vhost_vdpa.shared);
>      }
>      s->vhost_vdpa.shared = NULL;
> @@ -461,19 +463,12 @@ static int vhost_vdpa_net_data_load(NetClientState *nc)
>  static void vhost_vdpa_net_client_stop(NetClientState *nc)
>  {
>      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> -    struct vhost_dev *dev;
>
>      assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>
>      if (s->vhost_vdpa.index == 0) {
>          migration_remove_notifier(&s->migration_state);
>      }
> -
> -    dev = s->vhost_vdpa.dev;
> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> -        g_clear_pointer(&s->vhost_vdpa.shared->iova_tree,
> -                        vhost_iova_tree_delete);
> -    }
>  }
>
>  static int vhost_vdpa_net_load_setup(NetClientState *nc, NICState *nic)
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 18/40] vdpa: unregister listener on last dev cleanup
  2023-12-07 17:39 ` [PATCH 18/40] vdpa: unregister listener on last dev cleanup Si-Wei Liu
@ 2023-12-11 17:37   ` Eugenio Perez Martin
  2024-01-11  8:26   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 17:37 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:50 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> So that the free of the iova tree struct can be safely deferred
> until the last vq referencing it goes away.
>

I think this patch message went out of sync too.

> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-vdpa.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 4f026db..ea2dfc8 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -815,7 +815,10 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>      }
>
>      vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> -    memory_listener_unregister(&v->shared->listener);
> +    if (vhost_vdpa_last_dev(dev) && v->shared->listener_registered) {
> +        memory_listener_unregister(&v->shared->listener);
> +        v->shared->listener_registered = false;
> +    }

I think this version is more correct, but it should not matter, as
device cleanup implies the device will not be used anymore, doesn't it?
Or am I missing something?

>      vhost_vdpa_svq_cleanup(dev);
>
>      dev->opaque = NULL;
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 35/40] vdpa: add vhost_vdpa_set_address_space_id trace
  2023-12-07 17:39 ` [PATCH 35/40] vdpa: add vhost_vdpa_set_address_space_id trace Si-Wei Liu
@ 2023-12-11 18:13   ` Eugenio Perez Martin
  2024-01-15  3:50   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 18:13 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:51 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> For better debuggability and observability.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Reviewed-by: Eugenio Pérez <eperezma@redhat.com>

> ---
>  net/trace-events | 3 +++
>  net/vhost-vdpa.c | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/net/trace-events b/net/trace-events
> index 823a071..aab666a 100644
> --- a/net/trace-events
> +++ b/net/trace-events
> @@ -23,3 +23,6 @@ colo_compare_tcp_info(const char *pkt, uint32_t seq, uint32_t ack, int hdlen, in
>  # filter-rewriter.c
>  colo_filter_rewriter_pkt_info(const char *func, const char *src, const char *dst, uint32_t seq, uint32_t ack, uint32_t flag) "%s: src/dst: %s/%s p: seq/ack=%u/%u  flags=0x%x"
>  colo_filter_rewriter_conn_offset(uint32_t offset) ": offset=%u"
> +
> +# vhost-vdpa.c
> +vhost_vdpa_set_address_space_id(void *v, unsigned vq_group, unsigned asid_num) "vhost_vdpa: %p vq_group: %u asid: %u"
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 41714d1..84876b0 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -30,6 +30,7 @@
>  #include "migration/misc.h"
>  #include "hw/virtio/vhost.h"
>  #include "hw/virtio/vhost-vdpa.h"
> +#include "trace.h"
>
>  /* Todo:need to add the multiqueue support here */
>  typedef struct VhostVDPAState {
> @@ -365,6 +366,8 @@ static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
>      };
>      int r;
>
> +    trace_vhost_vdpa_set_address_space_id(v, vq_group, asid_num);
> +
>      r = ioctl(v->shared->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
>      if (unlikely(r < 0)) {
>          error_report("Can't set vq group %u asid %u, errno=%d (%s)",
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 36/40] vdpa: add vhost_vdpa_get_vring_base trace for svq mode
  2023-12-07 17:39 ` [PATCH 36/40] vdpa: add vhost_vdpa_get_vring_base trace for svq mode Si-Wei Liu
@ 2023-12-11 18:14   ` Eugenio Perez Martin
  2024-01-15  3:52   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 18:14 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:51 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> For better debuggability and observability.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Reviewed-by: Eugenio Pérez <eperezma@redhat.com>

> ---
>  hw/virtio/trace-events | 2 +-
>  hw/virtio/vhost-vdpa.c | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 196f32f..a8d3321 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -58,7 +58,7 @@ vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int r
>  vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
>  vhost_vdpa_set_vring_num(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
>  vhost_vdpa_set_vring_base(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
> -vhost_vdpa_get_vring_base(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
> +vhost_vdpa_get_vring_base(void *dev, unsigned int index, unsigned int num, bool svq) "dev: %p index: %u num: %u svq: %d"
>  vhost_vdpa_set_vring_kick(void *dev, unsigned int index, int fd) "dev: %p index: %u fd: %d"
>  vhost_vdpa_set_vring_call(void *dev, unsigned int index, int fd) "dev: %p index: %u fd: %d"
>  vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 0x%"PRIx64
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 8ba390d..d66936f 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1607,6 +1607,7 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>
>      if (v->shadow_vqs_enabled) {
>          ring->num = virtio_queue_get_last_avail_idx(dev->vdev, ring->index);
> +        trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num, true);
>          return 0;
>      }
>
> @@ -1619,7 +1620,7 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>      }
>
>      ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
> -    trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
> +    trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num, false);
>      return ret;
>  }
>
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 37/40] vdpa: add vhost_vdpa_set_dev_vring_base trace for svq mode
  2023-12-07 17:39 ` [PATCH 37/40] vdpa: add vhost_vdpa_set_dev_vring_base " Si-Wei Liu
@ 2023-12-11 18:14   ` Eugenio Perez Martin
  2024-01-15  3:53   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 18:14 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:51 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> For better debuggability and observability.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Reviewed-by: Eugenio Pérez <eperezma@redhat.com>

> ---
>  hw/virtio/trace-events | 2 +-
>  hw/virtio/vhost-vdpa.c | 5 ++++-
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index a8d3321..5085607 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -57,7 +57,7 @@ vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
>  vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
>  vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
>  vhost_vdpa_set_vring_num(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
> -vhost_vdpa_set_vring_base(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
> +vhost_vdpa_set_dev_vring_base(void *dev, unsigned int index, unsigned int num, bool svq) "dev: %p index: %u num: %u svq: %d"
>  vhost_vdpa_get_vring_base(void *dev, unsigned int index, unsigned int num, bool svq) "dev: %p index: %u num: %u svq: %d"
>  vhost_vdpa_set_vring_kick(void *dev, unsigned int index, int fd) "dev: %p index: %u fd: %d"
>  vhost_vdpa_set_vring_call(void *dev, unsigned int index, int fd) "dev: %p index: %u fd: %d"
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index d66936f..ff4f218 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1043,7 +1043,10 @@ static int vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
>  static int vhost_vdpa_set_dev_vring_base(struct vhost_dev *dev,
>                                           struct vhost_vring_state *ring)
>  {
> -    trace_vhost_vdpa_set_vring_base(dev, ring->index, ring->num);
> +    struct vhost_vdpa *v = dev->opaque;
> +
> +    trace_vhost_vdpa_set_dev_vring_base(dev, ring->index, ring->num,
> +                                        v->shadow_vqs_enabled);
>      return vhost_vdpa_call(dev, VHOST_SET_VRING_BASE, ring);
>  }
>
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 39/40] vdpa: add trace events for vhost_vdpa_net_load_cmd
  2023-12-07 17:39 ` [PATCH 39/40] vdpa: add trace events for vhost_vdpa_net_load_cmd Si-Wei Liu
@ 2023-12-11 18:14   ` Eugenio Perez Martin
  0 siblings, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 18:14 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:51 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> For better debuggability and observability.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Reviewed-by: Eugenio Pérez <eperezma@redhat.com>

> ---
>  net/trace-events | 2 ++
>  net/vhost-vdpa.c | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/net/trace-events b/net/trace-events
> index d650c71..be087e6 100644
> --- a/net/trace-events
> +++ b/net/trace-events
> @@ -28,3 +28,5 @@ colo_filter_rewriter_conn_offset(uint32_t offset) ": offset=%u"
>  vhost_vdpa_set_address_space_id(void *v, unsigned vq_group, unsigned asid_num) "vhost_vdpa: %p vq_group: %u asid: %u"
>  vhost_vdpa_net_data_eval_flush(void *s, int qindex, int svq_switch, bool svq_flush) "vhost_vdpa: %p qp: %d svq_switch: %d flush_map: %d"
>  vhost_vdpa_net_cvq_eval_flush(void *s, int qindex, int svq_switch, bool svq_flush) "vhost_vdpa: %p qp: %d svq_switch: %d flush_map: %d"
> +vhost_vdpa_net_load_cmd(void *s, uint8_t class, uint8_t cmd, int data_num, int data_size) "vdpa state: %p class: %u cmd: %u sg_num: %d size: %d"
> +vhost_vdpa_net_load_cmd_retval(void *s, uint8_t class, uint8_t cmd, int r) "vdpa state: %p class: %u cmd: %u retval: %d"
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index a0bd8cd..61da8b4 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -885,6 +885,7 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s,
>
>      assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl));
>      cmd_size = sizeof(ctrl) + data_size;
> +    trace_vhost_vdpa_net_load_cmd(s, class, cmd, data_num, data_size);
>      if (vhost_svq_available_slots(svq) < 2 ||
>          iov_size(out_cursor, 1) < cmd_size) {
>          /*
> @@ -916,6 +917,7 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s,
>
>      r = vhost_vdpa_net_cvq_add(s, &out, 1, &in, 1);
>      if (unlikely(r < 0)) {
> +        trace_vhost_vdpa_net_load_cmd_retval(s, class, cmd, r);
>          return r;
>      }
>
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 40/40] vdpa: add trace event for vhost_vdpa_net_load_mq
  2023-12-07 17:39 ` [PATCH 40/40] vdpa: add trace event for vhost_vdpa_net_load_mq Si-Wei Liu
@ 2023-12-11 18:15   ` Eugenio Perez Martin
  2024-01-15  3:58   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 18:15 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:51 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> For better debuggability and observability.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Reviewed-by: Eugenio Pérez <eperezma@redhat.com>

> ---
>  net/trace-events | 1 +
>  net/vhost-vdpa.c | 2 ++
>  2 files changed, 3 insertions(+)
>
> diff --git a/net/trace-events b/net/trace-events
> index be087e6..c128cc4 100644
> --- a/net/trace-events
> +++ b/net/trace-events
> @@ -30,3 +30,4 @@ vhost_vdpa_net_data_eval_flush(void *s, int qindex, int svq_switch, bool svq_flu
>  vhost_vdpa_net_cvq_eval_flush(void *s, int qindex, int svq_switch, bool svq_flush) "vhost_vdpa: %p qp: %d svq_switch: %d flush_map: %d"
>  vhost_vdpa_net_load_cmd(void *s, uint8_t class, uint8_t cmd, int data_num, int data_size) "vdpa state: %p class: %u cmd: %u sg_num: %d size: %d"
>  vhost_vdpa_net_load_cmd_retval(void *s, uint8_t class, uint8_t cmd, int r) "vdpa state: %p class: %u cmd: %u retval: %d"
> +vhost_vdpa_net_load_mq(void *s, int ncurqps) "vdpa state: %p current_qpairs: %d"
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 61da8b4..17b8d01 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -1109,6 +1109,8 @@ static int vhost_vdpa_net_load_mq(VhostVDPAState *s,
>          return 0;
>      }
>
> +    trace_vhost_vdpa_net_load_mq(s, n->curr_queue_pairs);
> +
>      mq.virtqueue_pairs = cpu_to_le16(n->curr_queue_pairs);
>      const struct iovec data = {
>          .iov_base = &mq,
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (39 preceding siblings ...)
  2023-12-07 17:39 ` [PATCH 40/40] vdpa: add trace event for vhost_vdpa_net_load_mq Si-Wei Liu
@ 2023-12-11 18:39 ` Eugenio Perez Martin
  2024-01-11  8:21 ` Jason Wang
  41 siblings, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 18:39 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:50 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> This patch series contain several enhancements to SVQ live migration downtime
> for vDPA-net hardware device, specifically on mlx5_vdpa. Currently it is based
> off of Eugenio's RFC v2 .load_setup series [1] to utilize the shared facility
> and reduce frictions in merging or duplicating code if at all possible.
>
> It's stacked up in particular order as below, as the optimization for one on
> the top has to depend on others on the bottom. Here's a breakdown for what
> each part does respectively:
>
> Patch #  |          Feature / optimization
> ---------V-------------------------------------------------------------------
> 35 - 40  | trace events
> 34       | migrate_cancel bug fix
> 21 - 33  | (Un)map batching at stop-n-copy to further optimize LM down time
> 11 - 20  | persistent IOTLB [3] to improve LM down time
> 02 - 10  | SVQ descriptor ASID [2] to optimize SVQ switching
> 01       | dependent linux headers
>          V
>

Hi Si-Wei,

Thanks for the series, I think it contains great additions for the
live migration solution!

It is pretty large though. Do you think it would be feasible to split
out the fixes and the tracing patches into a separate series? That
would allow the reviews to focus on the downtime reduction. I think I
acked all of them.

Maybe we can even create a third series for the vring asid? I think we
should note an increase in performance using svq, so it is justified
on its own too.

> Let's first define 2 sources of downtime that this work is concerned with:
>
> * SVQ switching downtime (Downtime #1): downtime at the start of migration.
>   Time spent on teardown and setup for SVQ mode switching, and this downtime
>   is regarded as the maxium time for an individual vdpa-net device.
>   No memory transfer is involved during SVQ switching, hence no .
>
> * LM downtime (Downtime #2): aggregated downtime for all vdpa-net devices on
>   resource teardown and setup in the last stop-n-copy phase on source host.
>
> With each part of the optimizations applied bottom up, the effective outcome
> in terms of down time (in seconds) performance can be observed in this table:
>
>
>                     |    Downtime #1    |    Downtime #2
> --------------------+-------------------+-------------------
> Baseline QEMU       |     20s ~ 30s     |        20s
>                     |                   |
> Iterative map       |                   |
> at destination[1]   |        5s         |        20s
>                     |                   |
> SVQ descriptor      |                   |
>     ASID [2]        |        2s         |         5s
>                     |                   |
>                     |                   |
> persistent IOTLB    |        2s         |         2s
>       [3]           |                   |
>                     |                   |
> (Un)map batching    |                   |
> at stop-n-copy      |      1.7s         |       1.5s
> before switchover   |                   |
>
> (VM config: 128GB mem, 2 mlx5_vdpa devices, each w/ 4 data vqs)
>

Thanks for all the profiling, it looks promising!

> Please find the details regarding each enhancement on the commit log.
>
> Thanks,
> -Siwei
>
>
> [1] [RFC PATCH v2 00/10] Map memory at destination .load_setup in vDPA-net migration
> https://lists.nongnu.org/archive/html/qemu-devel/2023-11/msg05711.html
> [2] VHOST_BACKEND_F_DESC_ASID
> https://lore.kernel.org/virtualization/20231018171456.1624030-2-dtatulea@nvidia.com/
> [3] VHOST_BACKEND_F_IOTLB_PERSIST
> https://lore.kernel.org/virtualization/1698304480-18463-1-git-send-email-si-wei.liu@oracle.com/
>
> ---
>
> Si-Wei Liu (40):
>   linux-headers: add vhost_types.h and vhost.h
>   vdpa: add vhost_vdpa_get_vring_desc_group
>   vdpa: probe descriptor group index for data vqs
>   vdpa: piggyback desc_group index when probing isolated cvq
>   vdpa: populate desc_group from net_vhost_vdpa_init
>   vhost: make svq work with gpa without iova translation
>   vdpa: move around vhost_vdpa_set_address_space_id
>   vdpa: add back vhost_vdpa_net_first_nc_vdpa
>   vdpa: no repeat setting shadow_data
>   vdpa: assign svq descriptors a separate ASID when possible
>   vdpa: factor out vhost_vdpa_last_dev
>   vdpa: check map_thread_enabled before join maps thread
>   vdpa: ref counting VhostVDPAShared
>   vdpa: convert iova_tree to ref count based
>   vdpa: add svq_switching and flush_map to header
>   vdpa: indicate SVQ switching via flag
>   vdpa: judge if map can be kept across reset
>   vdpa: unregister listener on last dev cleanup
>   vdpa: should avoid map flushing with persistent iotlb
>   vdpa: avoid mapping flush across reset
>   vdpa: vhost_vdpa_dma_batch_end_once rename
>   vdpa: factor out vhost_vdpa_map_batch_begin
>   vdpa: vhost_vdpa_dma_batch_begin_once rename
>   vdpa: factor out vhost_vdpa_dma_batch_end
>   vdpa: add asid to dma_batch_once API
>   vdpa: return int for dma_batch_once API
>   vdpa: add asid to all dma_batch call sites
>   vdpa: support iotlb_batch_asid
>   vdpa: expose API vhost_vdpa_dma_batch_once
>   vdpa: batch map/unmap op per svq pair basis
>   vdpa: batch map and unmap around cvq svq start/stop
>   vdpa: factor out vhost_vdpa_net_get_nc_vdpa
>   vdpa: batch multiple dma_unmap to a single call for vm stop
>   vdpa: fix network breakage after cancelling migration
>   vdpa: add vhost_vdpa_set_address_space_id trace
>   vdpa: add vhost_vdpa_get_vring_base trace for svq mode
>   vdpa: add vhost_vdpa_set_dev_vring_base trace for svq mode
>   vdpa: add trace events for eval_flush
>   vdpa: add trace events for vhost_vdpa_net_load_cmd
>   vdpa: add trace event for vhost_vdpa_net_load_mq
>
>  hw/virtio/trace-events                       |   9 +-
>  hw/virtio/vhost-shadow-virtqueue.c           |  35 ++-
>  hw/virtio/vhost-vdpa.c                       | 156 +++++++---
>  include/hw/virtio/vhost-vdpa.h               |  16 +
>  include/standard-headers/linux/vhost_types.h |  13 +
>  linux-headers/linux/vhost.h                  |   9 +
>  net/trace-events                             |   8 +
>  net/vhost-vdpa.c                             | 434 ++++++++++++++++++++++-----
>  8 files changed, 558 insertions(+), 122 deletions(-)
>
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 03/40] vdpa: probe descriptor group index for data vqs
  2023-12-07 17:39 ` [PATCH 03/40] vdpa: probe descriptor group index for data vqs Si-Wei Liu
@ 2023-12-11 18:49   ` Eugenio Perez Martin
  2024-01-11  4:02   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-11 18:49 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:53 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Getting it at initialization time instead of at start time allows
> decision making independent of device status, while reducing the
> possibility of failure when starting the device or during migration.
>
> Adding function vhost_vdpa_probe_desc_group() for that end. This
> function will be used to probe the descriptor group for data vqs.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  net/vhost-vdpa.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 89 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 887c329..0cf3147 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -1688,6 +1688,95 @@ out:
>      return r;
>  }
>
> +static int vhost_vdpa_probe_desc_group(int device_fd, uint64_t features,
> +                                       int vq_index, int64_t *desc_grpidx,
> +                                       Error **errp)
> +{
> +    uint64_t backend_features;
> +    int64_t vq_group, desc_group;
> +    uint8_t saved_status = 0;
> +    uint8_t status = 0;
> +    int r;
> +
> +    ERRP_GUARD();
> +
> +    r = ioctl(device_fd, VHOST_GET_BACKEND_FEATURES, &backend_features);
> +    if (unlikely(r < 0)) {
> +        error_setg_errno(errp, errno, "Cannot get vdpa backend_features");
> +        return r;
> +    }
> +
> +    if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID))) {
> +        return 0;
> +    }
> +
> +    if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID))) {
> +        return 0;
> +    }
> +
> +    r = ioctl(device_fd, VHOST_VDPA_GET_STATUS, &saved_status);
> +    if (unlikely(r)) {
> +        error_setg_errno(errp, -r, "Cannot get device status");
> +        goto out;

Nit, we could return here directly, can't we?

> +    }
> +
> +    r = ioctl(device_fd, VHOST_VDPA_SET_STATUS, &status);
> +    if (unlikely(r)) {
> +        error_setg_errno(errp, -r, "Cannot reset device");
> +        goto out;
> +    }
> +
> +    r = ioctl(device_fd, VHOST_SET_FEATURES, &features);
> +    if (unlikely(r)) {
> +        error_setg_errno(errp, errno, "Cannot set features");

missing goto out?

> +    }
> +
> +    status = VIRTIO_CONFIG_S_ACKNOWLEDGE |
> +             VIRTIO_CONFIG_S_DRIVER |
> +             VIRTIO_CONFIG_S_FEATURES_OK;
> +
> +    r = ioctl(device_fd, VHOST_VDPA_SET_STATUS, &status);
> +    if (unlikely(r)) {
> +        error_setg_errno(errp, -r, "Cannot set device status");
> +        goto out;
> +    }
> +
> +    vq_group = vhost_vdpa_get_vring_group(device_fd, vq_index, errp);
> +    if (unlikely(vq_group < 0)) {
> +        if (vq_group != -ENOTSUP) {
> +            r = vq_group;
> +            goto out;
> +        }
> +
> +        /*
> +         * The kernel reports VHOST_BACKEND_F_IOTLB_ASID if the vdpa frontend
> +         * supports ASID even if the parent driver does not.
> +         */
> +        error_free(*errp);
> +        *errp = NULL;
> +        r = 0;
> +        goto out;
> +    }
> +
> +    desc_group = vhost_vdpa_get_vring_desc_group(device_fd, vq_index,
> +                                                 errp);
> +    if (unlikely(desc_group < 0)) {
> +        r = desc_group;
> +        goto out;
> +    } else if (desc_group != vq_group) {
> +        *desc_grpidx = desc_group;
> +    }
> +    r = 1;
> +
> +out:
> +    status = 0;
> +    ioctl(device_fd, VHOST_VDPA_SET_STATUS, &status);
> +    if (saved_status) {
> +        ioctl(device_fd, VHOST_VDPA_SET_STATUS, &saved_status);
> +    }
> +    return r;
> +}
> +

It is invalid to add static functions without a caller; I think the
compiler will complain about this.
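
If splitting differently isn't an option, one possible interim
workaround (just a sketch, not necessarily the right call here) is to
mark the helper as unused until its caller lands later in the series:

    /* Hypothetical stop-gap so -Werror=unused-function doesn't trip
     * before the caller is introduced; to be dropped once the function
     * is actually called. */
    static int G_GNUC_UNUSED
    vhost_vdpa_probe_desc_group(int device_fd, uint64_t features,
                                int vq_index, int64_t *desc_grpidx,
                                Error **errp)
    {
        /* body as in the hunk above */
    }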

>  static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>                                         const char *device,
>                                         const char *name,
> --
> 1.8.3.1
>
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 17/40] vdpa: judge if map can be kept across reset
  2023-12-07 17:39 ` [PATCH 17/40] vdpa: judge if map can be kept across reset Si-Wei Liu
@ 2023-12-13  9:51   ` Eugenio Perez Martin
  2024-01-11  8:24   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-13  9:51 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:50 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> The descriptor group for the SVQ ASID allows the guest memory mapping
> to be retained across SVQ switching, same as how an isolated CVQ can
> do with a different ASID than the guest GPA space. Introduce an
> evaluation function to judge whether to flush or keep the iotlb maps
> based on the virtqueue's descriptor group and cvq isolation capability.
>
> The evaluation function has to be hooked to NetClient's .poll op, as
> .vhost_reset_status runs ahead of .stop, and .vhost_dev_start
> doesn't have access to the vhost-vdpa net's information.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  net/vhost-vdpa.c | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 04718b2..e9b96ed 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -504,12 +504,36 @@ static int vhost_vdpa_net_load_cleanup(NetClientState *nc, NICState *nic)
>                               n->parent_obj.status & VIRTIO_CONFIG_S_DRIVER_OK);
>  }
>
> +static void vhost_vdpa_net_data_eval_flush(NetClientState *nc, bool stop)
> +{
> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +
> +    if (!stop) {
> +        return;
> +    }
> +
> +    if (s->vhost_vdpa.index == 0) {
> +        if (s->always_svq) {
> +            v->shared->flush_map = true;

Why do we need to reset the map in the case of always_svq?

> +        } else if (!v->shared->svq_switching || v->desc_group >= 0) {
> +            v->shared->flush_map = false;
> +        } else {
> +            v->shared->flush_map = true;
> +        }
> +    } else if (!s->always_svq && v->shared->svq_switching &&
> +               v->desc_group < 0) {
> +        v->shared->flush_map = true;
> +    }
> +}
> +

I'm wondering, since we already have the reference count for the memory
listener, why not add one refcnt if _start detects it can keep the
memory maps?

>  static NetClientInfo net_vhost_vdpa_info = {
>          .type = NET_CLIENT_DRIVER_VHOST_VDPA,
>          .size = sizeof(VhostVDPAState),
>          .receive = vhost_vdpa_receive,
>          .start = vhost_vdpa_net_data_start,
>          .load = vhost_vdpa_net_data_load,
> +        .poll = vhost_vdpa_net_data_eval_flush,
>          .stop = vhost_vdpa_net_client_stop,
>          .cleanup = vhost_vdpa_cleanup,
>          .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
> @@ -1368,12 +1392,28 @@ static int vhost_vdpa_net_cvq_load(NetClientState *nc)
>      return 0;
>  }
>
> +static void vhost_vdpa_net_cvq_eval_flush(NetClientState *nc, bool stop)
> +{
> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +
> +    if (!stop) {
> +        return;
> +    }
> +
> +    if (!v->shared->flush_map && !v->shared->svq_switching &&
> +        !s->cvq_isolated && v->desc_group < 0) {
> +        v->shared->flush_map = true;
> +    }
> +}
> +
>  static NetClientInfo net_vhost_vdpa_cvq_info = {
>      .type = NET_CLIENT_DRIVER_VHOST_VDPA,
>      .size = sizeof(VhostVDPAState),
>      .receive = vhost_vdpa_receive,
>      .start = vhost_vdpa_net_cvq_start,
>      .load = vhost_vdpa_net_cvq_load,
> +    .poll = vhost_vdpa_net_cvq_eval_flush,
>      .stop = vhost_vdpa_net_cvq_stop,
>      .cleanup = vhost_vdpa_cleanup,
>      .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 25/40] vdpa: add asid to dma_batch_once API
  2023-12-07 17:39 ` [PATCH 25/40] vdpa: add asid to dma_batch_once API Si-Wei Liu
@ 2023-12-13 15:42   ` Eugenio Perez Martin
  2024-01-15  3:07   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-13 15:42 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:51 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> So that the DMA batching API can operate on an ASID other than 0.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/trace-events |  4 ++--
>  hw/virtio/vhost-vdpa.c | 14 ++++++++------
>  2 files changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 3411a07..196f32f 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -32,8 +32,8 @@ vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
>  # vhost-vdpa.c
>  vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa_shared:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
>  vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa_shared:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> -vhost_vdpa_map_batch_begin(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> -vhost_vdpa_dma_batch_end(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> +vhost_vdpa_map_batch_begin(void *v, int fd, uint32_t msg_type, uint8_t type, uint32_t asid)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8" asid: %"PRIu32
> +vhost_vdpa_dma_batch_end(void *v, int fd, uint32_t msg_type, uint8_t type, uint32_t asid)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8" asid: %"PRIu32
>  vhost_vdpa_listener_region_add_unaligned(void *v, const char *name, uint64_t offset_as, uint64_t offset_page) "vdpa_shared: %p region %s offset_within_address_space %"PRIu64" offset_within_region %"PRIu64
>  vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
>  vhost_vdpa_listener_region_del_unaligned(void *v, const char *name, uint64_t offset_as, uint64_t offset_page) "vdpa_shared: %p region %s offset_within_address_space %"PRIu64" offset_within_region %"PRIu64
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 999a97a..2db2832 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -161,11 +161,12 @@ int vhost_vdpa_dma_unmap(VhostVDPAShared *s, uint32_t asid, hwaddr iova,
>      return ret;
>  }
>
> -static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s)
> +static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s, uint32_t asid)
>  {
>      int fd = s->device_fd;
>      struct vhost_msg_v2 msg = {
>          .type = VHOST_IOTLB_MSG_V2,
> +        .asid = asid,
>          .iotlb.type = VHOST_IOTLB_BATCH_BEGIN,
>      };
>
> @@ -178,7 +179,7 @@ static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s)
>          return false;
>      }
>
> -    trace_vhost_vdpa_map_batch_begin(s, fd, msg.type, msg.iotlb.type);
> +    trace_vhost_vdpa_map_batch_begin(s, fd, msg.type, msg.iotlb.type, msg.asid);
>      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
>          error_report("failed to write, fd=%d, errno=%d (%s)",
>                       fd, errno, strerror(errno));
> @@ -193,17 +194,18 @@ static void vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s)
>          return;
>      }
>
> -    if (vhost_vdpa_map_batch_begin(s)) {
> +    if (vhost_vdpa_map_batch_begin(s, 0)) {
>          s->iotlb_batch_begin_sent = true;
>      }
>  }
>
> -static bool vhost_vdpa_dma_batch_end(VhostVDPAShared *s)
> +static bool vhost_vdpa_dma_batch_end(VhostVDPAShared *s, uint32_t asid)

Maybe adding the asid parameter is not needed? We already have it in
s->asid by the end of the series, and the kernel will also complain if
a wrong asid is sent.

Actually, dma_map and dma_unmap have the asid parameter because they
map out of batch, but I think there are no IOTLB operations out of
batch by the end of the series, are there?

Thanks!
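
A rough sketch of that idea, assuming the batch ASID is already tracked
in VhostVDPAShared by the end of the series (map-thread handling elided
for brevity):

    static bool vhost_vdpa_dma_batch_end(VhostVDPAShared *s)
    {
        int fd = s->device_fd;
        struct vhost_msg_v2 msg = {
            .type = VHOST_IOTLB_MSG_V2,
            .asid = s->iotlb_batch_asid,   /* from shared state, no parameter */
            .iotlb.type = VHOST_IOTLB_BATCH_END,
        };

        /* map-thread handling elided for brevity */

        trace_vhost_vdpa_dma_batch_end(s, fd, msg.type, msg.iotlb.type,
                                       msg.asid);
        if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
            error_report("failed to write, fd=%d, errno=%d (%s)",
                         fd, errno, strerror(errno));
            return false;
        }

        return true;
    }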

>  {
>      struct vhost_msg_v2 msg = {};
>      int fd = s->device_fd;
>
>      msg.type = VHOST_IOTLB_MSG_V2;
> +    msg.asid = asid;
>      msg.iotlb.type = VHOST_IOTLB_BATCH_END;
>
>      if (s->map_thread_enabled && !qemu_thread_is_self(&s->map_thread)) {
> @@ -215,7 +217,7 @@ static bool vhost_vdpa_dma_batch_end(VhostVDPAShared *s)
>          return false;
>      }
>
> -    trace_vhost_vdpa_dma_batch_end(s, fd, msg.type, msg.iotlb.type);
> +    trace_vhost_vdpa_dma_batch_end(s, fd, msg.type, msg.iotlb.type, msg.asid);
>      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
>          error_report("failed to write, fd=%d, errno=%d (%s)",
>                       fd, errno, strerror(errno));
> @@ -233,7 +235,7 @@ static void vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s)
>          return;
>      }
>
> -    if (vhost_vdpa_dma_batch_end(s)) {
> +    if (vhost_vdpa_dma_batch_end(s, 0)) {
>          s->iotlb_batch_begin_sent = false;
>      }
>  }
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 28/40] vdpa: support iotlb_batch_asid
  2023-12-07 17:39 ` [PATCH 28/40] vdpa: support iotlb_batch_asid Si-Wei Liu
@ 2023-12-13 15:42   ` Eugenio Perez Martin
  2024-01-15  3:19   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-13 15:42 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:51 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Then it's possible to specify the ASID when calling the DMA
> batching API. If the ASID to work on doesn't align with
> the ASID of the ongoing transaction, the API will fail the
> request and return a negative value, and the transaction will
> remain intact as if the failed request had never occurred.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-vdpa.c         | 25 +++++++++++++++++++------
>  include/hw/virtio/vhost-vdpa.h |  1 +
>  net/vhost-vdpa.c               |  1 +
>  3 files changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index d3f5721..b7896a8 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -189,15 +189,25 @@ static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s, uint32_t asid)
>
>  static int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s, uint32_t asid)
>  {
> -    if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH)) ||
> -        s->iotlb_batch_begin_sent) {
> +    if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
>          return 0;
>      }
>
> -    if (vhost_vdpa_map_batch_begin(s, asid)) {
> -        s->iotlb_batch_begin_sent = true;
> +    if (s->iotlb_batch_begin_sent && s->iotlb_batch_asid != asid) {
> +        return -1;
> +    }
> +
> +    if (s->iotlb_batch_begin_sent) {
> +        return 0;
>      }
>
> +    if (!vhost_vdpa_map_batch_begin(s, asid)) {
> +        return 0;
> +    }
> +
> +    s->iotlb_batch_begin_sent = true;
> +    s->iotlb_batch_asid = asid;
> +
>      return 0;
>  }
>
> @@ -237,10 +247,13 @@ static int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s, uint32_t asid)
>          return 0;
>      }
>
> -    if (vhost_vdpa_dma_batch_end(s, asid)) {
> -        s->iotlb_batch_begin_sent = false;
> +    if (!vhost_vdpa_dma_batch_end(s, asid)) {
> +        return 0;
>      }
>
> +    s->iotlb_batch_begin_sent = false;
> +    s->iotlb_batch_asid = -1;

If we define -1 as "not in batch", iotlb_batch_begin_sent is
redundant. Can we "#define IOTLB_NOT_IN_BATCH -1" and remove
iotlb_batch_begin_sent?
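
A minimal sketch of that (macro name as proposed, the rest follows the
code quoted in this patch):

    #define IOTLB_NOT_IN_BATCH ((uint32_t)-1)

    static int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s,
                                               uint32_t asid)
    {
        if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
            return 0;
        }

        if (s->iotlb_batch_asid != IOTLB_NOT_IN_BATCH) {
            /* A batch is in flight: only the same ASID may continue it. */
            return s->iotlb_batch_asid == asid ? 0 : -1;
        }

        if (vhost_vdpa_map_batch_begin(s, asid)) {
            s->iotlb_batch_asid = asid;
        }

        return 0;
    }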

> +
>      return 0;
>  }
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 0fe0f60..219316f 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -61,6 +61,7 @@ typedef struct vhost_vdpa_shared {
>      bool map_thread_enabled;
>
>      bool iotlb_batch_begin_sent;
> +    uint32_t iotlb_batch_asid;
>
>      /*
>       * The memory listener has been registered, so DMA maps have been sent to
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index e9b96ed..bc72345 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -1933,6 +1933,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>          s->vhost_vdpa.shared->device_fd = vdpa_device_fd;
>          s->vhost_vdpa.shared->iova_range = iova_range;
>          s->vhost_vdpa.shared->shadow_data = svq;
> +        s->vhost_vdpa.shared->iotlb_batch_asid = -1;
>          s->vhost_vdpa.shared->refcnt++;
>      } else if (!is_datapath) {
>          s->cvq_cmd_out_buffer = mmap(NULL, vhost_vdpa_net_cvq_cmd_page_len(),
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 29/40] vdpa: expose API vhost_vdpa_dma_batch_once
  2023-12-07 17:39 ` [PATCH 29/40] vdpa: expose API vhost_vdpa_dma_batch_once Si-Wei Liu
@ 2023-12-13 15:42   ` Eugenio Perez Martin
  2024-01-15  3:32   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-13 15:42 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:51 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> So that the batching API can be called externally from
> files other than the local one.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-vdpa.c         | 21 +++++++++++++++------
>  include/hw/virtio/vhost-vdpa.h |  3 +++
>  2 files changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index b7896a8..68dc01b 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -187,7 +187,7 @@ static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s, uint32_t asid)
>      return true;
>  }
>
> -static int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s, uint32_t asid)
> +int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s, uint32_t asid)
>  {
>      if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
>          return 0;
> @@ -237,7 +237,7 @@ static bool vhost_vdpa_dma_batch_end(VhostVDPAShared *s, uint32_t asid)
>      return true;
>  }
>
> -static int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s, uint32_t asid)
> +int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s, uint32_t asid)
>  {
>      if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
>          return 0;
> @@ -436,7 +436,12 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
>          iova = mem_region.iova;
>      }
>
> -    vhost_vdpa_dma_batch_begin_once(s, VHOST_VDPA_GUEST_PA_ASID);
> +    ret = vhost_vdpa_dma_batch_begin_once(s, VHOST_VDPA_GUEST_PA_ASID);
> +    if (unlikely(ret)) {
> +        error_report("Can't batch mapping on asid 0 (%p)", s);

Can we move this error to vhost_vdpa_dma_batch_begin_once?

That way we avoid duplicating the error message later in the patch and
we can tell the expected ASID.
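
Something along these lines, for instance (a sketch only; the elided
parts are unchanged from the patch):

    int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s, uint32_t asid)
    {
        if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
            return 0;
        }

        if (s->iotlb_batch_begin_sent && s->iotlb_batch_asid != asid) {
            /* Report once here, including the ASID that is in flight. */
            error_report("Can't batch mapping on asid %u (%p): "
                         "a batch on asid %u is already in flight",
                         asid, s, s->iotlb_batch_asid);
            return -1;
        }

        /* ... rest unchanged ... */
        return 0;
    }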

> +        goto fail_map;
> +    }
> +
>      ret = vhost_vdpa_dma_map(s, VHOST_VDPA_GUEST_PA_ASID, iova,
>                               int128_get64(llsize), vaddr, section->readonly);
>      if (ret) {
> @@ -518,7 +523,11 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
>          iova = result->iova;
>          vhost_iova_tree_remove(s->iova_tree, *result);
>      }
> -    vhost_vdpa_dma_batch_begin_once(s, VHOST_VDPA_GUEST_PA_ASID);
> +    ret = vhost_vdpa_dma_batch_begin_once(s, VHOST_VDPA_GUEST_PA_ASID);
> +    if (ret) {
> +        error_report("Can't batch mapping on asid 0 (%p)", s);
> +    }
> +
>      /*
>       * The unmap ioctl doesn't accept a full 64-bit. need to check it
>       */
> @@ -1396,10 +1405,10 @@ static void *vhost_vdpa_load_map(void *opaque)
>                                       msg->iotlb.size);
>              break;
>          case VHOST_IOTLB_BATCH_BEGIN:
> -            vhost_vdpa_dma_batch_begin_once(shared, msg->asid);
> +            r = vhost_vdpa_dma_batch_begin_once(shared, msg->asid);
>              break;
>          case VHOST_IOTLB_BATCH_END:
> -            vhost_vdpa_dma_batch_end_once(shared, msg->asid);
> +            r = vhost_vdpa_dma_batch_end_once(shared, msg->asid);
>              break;
>          default:
>              error_report("Invalid IOTLB msg type %d", msg->iotlb.type);
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 219316f..aa13679 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -106,6 +106,9 @@ int vhost_vdpa_dma_map(VhostVDPAShared *s, uint32_t asid, hwaddr iova,
>                         hwaddr size, void *vaddr, bool readonly);
>  int vhost_vdpa_dma_unmap(VhostVDPAShared *s, uint32_t asid, hwaddr iova,
>                           hwaddr size);
> +int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s, uint32_t asid);
> +int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s, uint32_t asid);
> +
>  int vhost_vdpa_load_setup(VhostVDPAShared *s, AddressSpace *dma_as);
>  int vhost_vdpa_load_cleanup(VhostVDPAShared *s, bool vhost_will_start);
>
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 33/40] vdpa: batch multiple dma_unmap to a single call for vm stop
  2023-12-07 17:39 ` [PATCH 33/40] vdpa: batch multiple dma_unmap to a single call for vm stop Si-Wei Liu
@ 2023-12-13 16:46   ` Eugenio Perez Martin
  2024-01-15  3:47   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Eugenio Perez Martin @ 2023-12-13 16:46 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: jasowang, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Thu, Dec 7, 2023 at 7:51 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Should help reduce live migration downtime on the source host. Below
> is the coalesced dma_unmap time series on a 2 queue pair config (no
> dedicated descriptor group ASID for SVQ).
>
> 109531@1693367276.853503:vhost_vdpa_reset_device dev: 0x55c933926890
> 109531@1693367276.853513:vhost_vdpa_add_status dev: 0x55c933926890 status: 0x3
> 109531@1693367276.853520:vhost_vdpa_flush_map dev: 0x55c933926890 doit: 1 svq_flush: 0 persist: 1
> 109531@1693367276.853524:vhost_vdpa_set_config_call dev: 0x55c933926890 fd: -1
> 109531@1693367276.853579:vhost_vdpa_iotlb_begin_batch vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 type: 5
> 109531@1693367276.853586:vhost_vdpa_dma_unmap vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 asid: 0 iova: 0x1000 size: 0x2000 type: 3
> 109531@1693367276.853600:vhost_vdpa_dma_unmap vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 asid: 0 iova: 0x3000 size: 0x1000 type: 3
> 109531@1693367276.853618:vhost_vdpa_dma_unmap vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 asid: 0 iova: 0x4000 size: 0x2000 type: 3
> 109531@1693367276.853625:vhost_vdpa_dma_unmap vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 asid: 0 iova: 0x6000 size: 0x1000 type: 3
> 109531@1693367276.853630:vhost_vdpa_dma_unmap vdpa:0x7fa2aa84c190 fd: 16 msg_type: 2 asid: 0 iova: 0x7000 size: 0x2000 type: 3
> 109531@1693367276.853636:vhost_vdpa_dma_unmap vdpa:0x7fa2aa84c190 fd: 16 msg_type: 2 asid: 0 iova: 0x9000 size: 0x1000 type: 3
> 109531@1693367276.853642:vhost_vdpa_dma_unmap vdpa:0x7fa2aa84c190 fd: 16 msg_type: 2 asid: 0 iova: 0xa000 size: 0x2000 type: 3
> 109531@1693367276.853648:vhost_vdpa_dma_unmap vdpa:0x7fa2aa84c190 fd: 16 msg_type: 2 asid: 0 iova: 0xc000 size: 0x1000 type: 3
> 109531@1693367276.853654:vhost_vdpa_dma_unmap vdpa:0x7fa2aa6b6190 fd: 16 msg_type: 2 asid: 0 iova: 0xf000 size: 0x1000 type: 3
> 109531@1693367276.853660:vhost_vdpa_dma_unmap vdpa:0x7fa2aa6b6190 fd: 16 msg_type: 2 asid: 0 iova: 0x10000 size: 0x1000 type: 3
> 109531@1693367276.853666:vhost_vdpa_dma_unmap vdpa:0x7fa2aa6b6190 fd: 16 msg_type: 2 asid: 0 iova: 0xd000 size: 0x1000 type: 3
> 109531@1693367276.853670:vhost_vdpa_dma_unmap vdpa:0x7fa2aa6b6190 fd: 16 msg_type: 2 asid: 0 iova: 0xe000 size: 0x1000 type: 3
> 109531@1693367276.853675:vhost_vdpa_iotlb_end_batch vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 type: 6
> 109531@1693367277.014697:vhost_vdpa_get_vq_index dev: 0x55c933925de0 idx: 0 vq idx: 0
> 109531@1693367277.014747:vhost_vdpa_get_vq_index dev: 0x55c933925de0 idx: 1 vq idx: 1
> 109531@1693367277.014753:vhost_vdpa_get_vq_index dev: 0x55c9339262e0 idx: 2 vq idx: 2
> 109531@1693367277.014756:vhost_vdpa_get_vq_index dev: 0x55c9339262e0 idx: 3 vq idx: 3
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-vdpa.c         |   7 +--
>  include/hw/virtio/vhost-vdpa.h |   3 ++
>  net/vhost-vdpa.c               | 112 +++++++++++++++++++++++++++--------------
>  3 files changed, 80 insertions(+), 42 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index d98704a..4010fd9 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1162,8 +1162,8 @@ static void vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr addr)
>      vhost_iova_tree_remove(v->shared->iova_tree, *result);
>  }
>
> -static void vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
> -                                       const VhostShadowVirtqueue *svq)
> +void vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
> +                                const VhostShadowVirtqueue *svq)
>  {
>      struct vhost_vdpa *v = dev->opaque;
>      struct vhost_vring_addr svq_addr;
> @@ -1346,17 +1346,14 @@ static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
>          return;
>      }
>
> -    vhost_vdpa_dma_batch_begin_once(v->shared, v->address_space_id);
>      for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
>          VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
>
>          vhost_svq_stop(svq);
> -        vhost_vdpa_svq_unmap_rings(dev, svq);
>
>          event_notifier_cleanup(&svq->hdev_kick);
>          event_notifier_cleanup(&svq->hdev_call);
>      }
> -    vhost_vdpa_dma_batch_end_once(v->shared, v->address_space_id);
>  }
>
>  static void vhost_vdpa_suspend(struct vhost_dev *dev)
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index aa13679..f426e2c 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -112,6 +112,9 @@ int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s, uint32_t asid);
>  int vhost_vdpa_load_setup(VhostVDPAShared *s, AddressSpace *dma_as);
>  int vhost_vdpa_load_cleanup(VhostVDPAShared *s, bool vhost_will_start);
>
> +void vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
> +                                const VhostShadowVirtqueue *svq);
> +
>  typedef struct vdpa_iommu {
>      VhostVDPAShared *dev_shared;
>      IOMMUMemoryRegion *iommu_mr;
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 683619f..41714d1 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -29,6 +29,7 @@
>  #include "migration/migration.h"
>  #include "migration/misc.h"
>  #include "hw/virtio/vhost.h"
> +#include "hw/virtio/vhost-vdpa.h"
>
>  /* Todo:need to add the multiqueue support here */
>  typedef struct VhostVDPAState {
> @@ -467,15 +468,89 @@ static int vhost_vdpa_net_data_load(NetClientState *nc)
>      return 0;
>  }
>
> +static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
> +{
> +    VhostIOVATree *tree = v->shared->iova_tree;
> +    DMAMap needle = {
> +        /*
> +         * No need to specify size or to look for more translations since
> +         * this contiguous chunk was allocated by us.
> +         */
> +        .translated_addr = (hwaddr)(uintptr_t)addr,
> +    };
> +    const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle);
> +    int r;
> +
> +    if (unlikely(!map)) {
> +        error_report("Cannot locate expected map");
> +        return;
> +    }
> +
> +    r = vhost_vdpa_dma_unmap(v->shared, v->address_space_id, map->iova,
> +                             map->size + 1);
> +    if (unlikely(r != 0)) {
> +        error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
> +    }
> +
> +    vhost_iova_tree_remove(tree, *map);
> +}
> +
>  static void vhost_vdpa_net_client_stop(NetClientState *nc)
>  {
>      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +    struct vhost_vdpa *last_vi = NULL;
> +    bool has_cvq = v->dev->vq_index_end % 2;
> +    int nvqp;
>
>      assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>
>      if (s->vhost_vdpa.index == 0) {
>          migration_remove_notifier(&s->migration_state);
>      }
> +
> +    if (v->dev->vq_index + v->dev->nvqs != v->dev->vq_index_end) {
> +        return;
> +    }
> +
> +    nvqp = (v->dev->vq_index_end + 1) / 2;
> +    for (int i = 0; i < nvqp; ++i) {
> +        VhostVDPAState *s_i = vhost_vdpa_net_get_nc_vdpa(s, i);
> +        struct vhost_vdpa *v_i = &s_i->vhost_vdpa;
> +
> +        if (!v_i->shadow_vqs_enabled) {
> +            continue;
> +        }
> +        if (!last_vi) {
> +            vhost_vdpa_dma_batch_begin_once(v_i->shared,
> +                                            v_i->address_space_id);
> +            last_vi = v_i;
> +        } else if (last_vi->address_space_id != v_i->address_space_id) {
> +            vhost_vdpa_dma_batch_end_once(last_vi->shared,
> +                                          last_vi->address_space_id);
> +            vhost_vdpa_dma_batch_begin_once(v_i->shared,
> +                                            v_i->address_space_id);
> +            last_vi = v_i;
> +        }
> +
> +        for (unsigned j = 0; j < v_i->shadow_vqs->len; ++j) {
> +            VhostShadowVirtqueue *svq = g_ptr_array_index(v_i->shadow_vqs, j);
> +
> +            vhost_vdpa_svq_unmap_rings(v_i->dev, svq);
> +        }
> +    }
> +    if (has_cvq) {
> +        if (last_vi) {
> +            assert(last_vi->address_space_id == v->address_space_id);
> +        }
> +        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
> +        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
> +    }
> +    if (last_vi) {
> +        vhost_vdpa_dma_batch_end_once(last_vi->shared,
> +                                      last_vi->address_space_id);
> +        last_vi = NULL;
> +    }

Since we've delayed the guest memory unmap to _cleanup, why not delay
these unmaps to cleanup too?

>  }
>
>  static int vhost_vdpa_net_load_setup(NetClientState *nc, NICState *nic)
> @@ -585,33 +660,6 @@ static int64_t vhost_vdpa_get_vring_desc_group(int device_fd,
>      return state.num;
>  }
>
> -static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
> -{
> -    VhostIOVATree *tree = v->shared->iova_tree;
> -    DMAMap needle = {
> -        /*
> -         * No need to specify size or to look for more translations since
> -         * this contiguous chunk was allocated by us.
> -         */
> -        .translated_addr = (hwaddr)(uintptr_t)addr,
> -    };
> -    const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle);
> -    int r;
> -
> -    if (unlikely(!map)) {
> -        error_report("Cannot locate expected map");
> -        return;
> -    }
> -
> -    r = vhost_vdpa_dma_unmap(v->shared, v->address_space_id, map->iova,
> -                             map->size + 1);
> -    if (unlikely(r != 0)) {
> -        error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
> -    }
> -
> -    vhost_iova_tree_remove(tree, *map);
> -}
> -
>  /** Map CVQ buffer. */
>  static int vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v, void *buf, size_t size,
>                                    bool write)
> @@ -740,18 +788,8 @@ err:
>
>  static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
>  {
> -    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> -    struct vhost_vdpa *v = &s->vhost_vdpa;
> -
>      assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>
> -    if (s->vhost_vdpa.shadow_vqs_enabled) {
> -        vhost_vdpa_dma_batch_begin_once(v->shared, v->address_space_id);
> -        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
> -        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
> -        vhost_vdpa_dma_batch_end_once(v->shared, v->address_space_id);
> -    }
> -
>      vhost_vdpa_net_client_stop(nc);
>  }
>
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 01/40] linux-headers: add vhost_types.h and vhost.h
  2023-12-07 17:39 ` [PATCH 01/40] linux-headers: add vhost_types.h and vhost.h Si-Wei Liu
  2023-12-11  7:47   ` Eugenio Perez Martin
@ 2024-01-11  3:32   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  3:32 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

It's better to document which kernel version this commit syncs to.
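
For instance, something like this in the commit message (the version
placeholder is purely illustrative):

    linux-headers: add vhost_types.h and vhost.h

    Sync to Linux <version or commit the headers were generated from>,
    which adds VHOST_BACKEND_F_DESC_ASID, VHOST_BACKEND_F_IOTLB_PERSIST
    and VHOST_VDPA_GET_VRING_DESC_GROUP.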

Thanks

> ---
>  include/standard-headers/linux/vhost_types.h | 13 +++++++++++++
>  linux-headers/linux/vhost.h                  |  9 +++++++++
>  2 files changed, 22 insertions(+)
>
> diff --git a/include/standard-headers/linux/vhost_types.h b/include/standard-headers/linux/vhost_types.h
> index 5ad07e1..c39199b 100644
> --- a/include/standard-headers/linux/vhost_types.h
> +++ b/include/standard-headers/linux/vhost_types.h
> @@ -185,5 +185,18 @@ struct vhost_vdpa_iova_range {
>   * DRIVER_OK
>   */
>  #define VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK  0x6
> +/* Device can be resumed */
> +#define VHOST_BACKEND_F_RESUME  0x5
> +/* Device supports the driver enabling virtqueues both before and after
> + * DRIVER_OK
> + */
> +#define VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK  0x6
> +/* Device may expose the virtqueue's descriptor area, driver area and
> + * device area to a different group for ASID binding than where its
> + * buffers may reside. Requires VHOST_BACKEND_F_IOTLB_ASID.
> + */
> +#define VHOST_BACKEND_F_DESC_ASID    0x7
> +/* IOTLB don't flush memory mapping across device reset */
> +#define VHOST_BACKEND_F_IOTLB_PERSIST  0x8
>
>  #endif
> diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> index f5c48b6..c61c687 100644
> --- a/linux-headers/linux/vhost.h
> +++ b/linux-headers/linux/vhost.h
> @@ -219,4 +219,13 @@
>   */
>  #define VHOST_VDPA_RESUME              _IO(VHOST_VIRTIO, 0x7E)
>
> +/* Get the dedicated group for the descriptor table of a virtqueue:
> + * read index, write group in num.
> + * The virtqueue index is stored in the index field of vhost_vring_state.
> + * The group id for the descriptor table of this specific virtqueue
> + * is returned via num field of vhost_vring_state.
> + */
> +#define VHOST_VDPA_GET_VRING_DESC_GROUP        _IOWR(VHOST_VIRTIO, 0x7F,       \
> +                                             struct vhost_vring_state)
> +
>  #endif
> --
> 1.8.3.1
>




* Re: [PATCH 02/40] vdpa: add vhost_vdpa_get_vring_desc_group
  2023-12-07 17:39 ` [PATCH 02/40] vdpa: add vhost_vdpa_get_vring_desc_group Si-Wei Liu
@ 2024-01-11  3:51   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  3:51 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Internal API to get the descriptor group index for a specific virtqueue
> through the VHOST_VDPA_GET_VRING_DESC_GROUP ioctl.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
>  net/vhost-vdpa.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 90f4128..887c329 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -471,6 +471,25 @@ static int64_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index,
>      return state.num;
>  }
>
> +static int64_t vhost_vdpa_get_vring_desc_group(int device_fd,
> +                                               unsigned vq_index,
> +                                               Error **errp)
> +{
> +    struct vhost_vring_state state = {
> +        .index = vq_index,
> +    };
> +    int r = ioctl(device_fd, VHOST_VDPA_GET_VRING_DESC_GROUP, &state);
> +
> +    if (unlikely(r < 0)) {
> +        r = -errno;
> +        error_setg_errno(errp, errno, "Cannot get VQ %u descriptor group",
> +                         vq_index);
> +        return r;
> +    }
> +
> +    return state.num;
> +}
> +
>  static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
>                                             unsigned vq_group,
>                                             unsigned asid_num)
> --
> 1.8.3.1
>




* Re: [PATCH 03/40] vdpa: probe descriptor group index for data vqs
  2023-12-07 17:39 ` [PATCH 03/40] vdpa: probe descriptor group index for data vqs Si-Wei Liu
  2023-12-11 18:49   ` Eugenio Perez Martin
@ 2024-01-11  4:02   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  4:02 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:53 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Getting it ahead at initialization time instead of start time allows
> decision making independent of device status, while reducing failure
> possibility in starting device or during migration.
>
> Adding function vhost_vdpa_probe_desc_group() for that end. This
> function will be used to probe the descriptor group for data vqs.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  net/vhost-vdpa.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 89 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 887c329..0cf3147 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -1688,6 +1688,95 @@ out:
>      return r;
>  }
>
> +static int vhost_vdpa_probe_desc_group(int device_fd, uint64_t features,
> +                                       int vq_index, int64_t *desc_grpidx,
> +                                       Error **errp)
> +{
> +    uint64_t backend_features;
> +    int64_t vq_group, desc_group;
> +    uint8_t saved_status = 0;
> +    uint8_t status = 0;
> +    int r;
> +
> +    ERRP_GUARD();
> +
> +    r = ioctl(device_fd, VHOST_GET_BACKEND_FEATURES, &backend_features);
> +    if (unlikely(r < 0)) {
> +        error_setg_errno(errp, errno, "Cannot get vdpa backend_features");
> +        return r;
> +    }
> +
> +    if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID))) {
> +        return 0;
> +    }
> +
> +    if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID))) {
> +        return 0;
> +    }
> +
> +    r = ioctl(device_fd, VHOST_VDPA_GET_STATUS, &saved_status);
> +    if (unlikely(r)) {
> +        error_setg_errno(errp, -r, "Cannot get device status");
> +        goto out;
> +    }

I wonder what the reason is for saving and restoring the status here?

We don't do this in vhost_vdpa_probe_cvq_isolation().

Thanks




* Re: [PATCH 04/40] vdpa: piggyback desc_group index when probing isolated cvq
  2023-12-07 17:39 ` [PATCH 04/40] vdpa: piggyback desc_group index when probing isolated cvq Si-Wei Liu
@ 2024-01-11  7:06   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  7:06 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Same as the previous commit, but do it for cvq instead of data vqs.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  net/vhost-vdpa.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 0cf3147..cb5705d 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -1601,16 +1601,19 @@ static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
>  };
>
>  /**
> - * Probe if CVQ is isolated
> + * Probe if CVQ is isolated, and piggyback its descriptor group
> + * index if supported
>   *
>   * @device_fd         The vdpa device fd
>   * @features          Features offered by the device.
>   * @cvq_index         The control vq pair index
> + * @desc_grpidx       The CVQ's descriptor group index to return
>   *
> - * Returns <0 in case of failure, 0 if false and 1 if true.
> + * Returns <0 in case of failure, 0 if false and 1 if true (isolated).
>   */
>  static int vhost_vdpa_probe_cvq_isolation(int device_fd, uint64_t features,
> -                                          int cvq_index, Error **errp)
> +                                          int cvq_index, int64_t *desc_grpidx,
> +                                          Error **errp)
>  {
>      uint64_t backend_features;
>      int64_t cvq_group;
> @@ -1667,6 +1670,13 @@ static int vhost_vdpa_probe_cvq_isolation(int device_fd, uint64_t features,
>          goto out;
>      }
>
> +    if (backend_features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID)) {
> +        int64_t desc_group = vhost_vdpa_get_vring_desc_group(device_fd,
> +                                                             cvq_index, errp);
> +        if (likely(desc_group >= 0) && desc_group != cvq_group)
> +            *desc_grpidx = desc_group;
> +    }
> +
>      for (int i = 0; i < cvq_index; ++i) {
>          int64_t group = vhost_vdpa_get_vring_group(device_fd, i, errp);
>          if (unlikely(group < 0)) {
> @@ -1685,6 +1695,8 @@ static int vhost_vdpa_probe_cvq_isolation(int device_fd, uint64_t features,
>  out:
>      status = 0;
>      ioctl(device_fd, VHOST_VDPA_SET_STATUS, &status);
> +    status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER;

Is this a bug fix? Otherwise, I don't see the connection with the descriptor group.

Thanks

> +    ioctl(device_fd, VHOST_VDPA_SET_STATUS, &status);
>      return r;
>  }
>
> @@ -1791,6 +1803,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>                                         Error **errp)
>  {
>      NetClientState *nc = NULL;
> +    int64_t desc_group = -1;
>      VhostVDPAState *s;
>      int ret = 0;
>      assert(name);
> @@ -1802,7 +1815,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>      } else {
>          cvq_isolated = vhost_vdpa_probe_cvq_isolation(vdpa_device_fd, features,
>                                                        queue_pair_index * 2,
> -                                                      errp);
> +                                                      &desc_group, errp);
>          if (unlikely(cvq_isolated < 0)) {
>              return NULL;
>          }
> --
> 1.8.3.1
>




* Re: [PATCH 05/40] vdpa: populate desc_group from net_vhost_vdpa_init
  2023-12-07 17:39 ` [PATCH 05/40] vdpa: populate desc_group from net_vhost_vdpa_init Si-Wei Liu
  2023-12-11 10:46   ` Eugenio Perez Martin
@ 2024-01-11  7:09   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  7:09 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Add the desc_group field to struct vhost_vdpa, and get it
> populated when the corresponding vq is initialized at
> net_vhost_vdpa_init. If the vq does not have descriptor
> group capability, or it doesn't have a dedicated ASID
> group to host descriptors other than the data buffers,
> desc_group will be set to a negative value -1.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  include/hw/virtio/vhost-vdpa.h |  1 +
>  net/vhost-vdpa.c               | 15 +++++++++++++--
>  2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 6533ad2..63493ff 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -87,6 +87,7 @@ typedef struct vhost_vdpa {
>      Error *migration_blocker;
>      VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
>      IOMMUNotifier n;
> +    int64_t desc_group;
>  } VhostVDPA;
>
>  int vhost_vdpa_get_iova_range(int fd, struct vhost_vdpa_iova_range *iova_range);
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index cb5705d..1a738b2 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -1855,11 +1855,22 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>
>      ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
>      if (ret) {
> -        qemu_del_net_client(nc);
> -        return NULL;
> +        goto err;

The part that introduces the "err" label looks more like a cleanup.

Others look good.

Thanks

>      }
>
> +    if (is_datapath) {
> +        ret = vhost_vdpa_probe_desc_group(vdpa_device_fd, features,
> +                                          0, &desc_group, errp);
> +        if (unlikely(ret < 0)) {
> +            goto err;
> +        }
> +    }
> +    s->vhost_vdpa.desc_group = desc_group;
>      return nc;
> +
> +err:
> +    qemu_del_net_client(nc);
> +    return NULL;
>  }
>
>  static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
> --
> 1.8.3.1
>




* Re: [PATCH 06/40] vhost: make svq work with gpa without iova translation
  2023-12-07 17:39 ` [PATCH 06/40] vhost: make svq work with gpa without iova translation Si-Wei Liu
  2023-12-11 11:17   ` Eugenio Perez Martin
@ 2024-01-11  7:31   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  7:31 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Make vhost_svq_vring_write_descs able to work with GPA directly
> without going through iova tree for translation. This will be
> needed in the next few patches where the SVQ has dedicated
> address space to host its virtqueues. Instead of having to
> translate qemu's VA to IOVA via the iova tree, with dedicated
> or isolated address space for SVQ descriptors, the IOVA is
> exactly same as the guest GPA space where translation would
> not be needed any more.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-shadow-virtqueue.c | 35 +++++++++++++++++++++++------------
>  1 file changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index fc5f408..97ccd45 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -136,8 +136,8 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
>   * Return true if success, false otherwise and print error.
>   */
>  static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> -                                        const struct iovec *iovec, size_t num,
> -                                        bool more_descs, bool write)
> +                                        const struct iovec *iovec, hwaddr *addr,
> +                                        size_t num, bool more_descs, bool write)
>  {
>      uint16_t i = svq->free_head, last = svq->free_head;
>      unsigned n;
> @@ -149,8 +149,15 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>          return true;
>      }
>
> -    ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> -    if (unlikely(!ok)) {
> +    if (svq->iova_tree) {
> +        ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> +        if (unlikely(!ok)) {
> +            return false;
> +        }

So the idea is that when the shadow virtqueue can work directly with
GPA, there won't be an iova_tree here?

If yes, I think we need a comment around iova_tree, or here, to explain this.

> +    } else if (!addr) {
> +        qemu_log_mask(LOG_GUEST_ERROR,
> +                      "No translation found for vaddr 0x%p\n",
> +                      iovec[0].iov_base);
>          return false;
>      }
>
> @@ -161,7 +168,7 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>          } else {
>              descs[i].flags = flags;
>          }
> -        descs[i].addr = cpu_to_le64(sg[n]);
> +        descs[i].addr = cpu_to_le64(svq->iova_tree ? sg[n] : addr[n]);

Or maybe add a helper and do the switch there, with the comments.
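
Something along these lines, just to illustrate the idea (untested, the
helper name is only a placeholder):

/* Pick the descriptor address: the IOVA from the translated sg[] when an
 * iova_tree is in use, otherwise the guest GPA recorded in the element. */
static hwaddr vhost_svq_desc_addr(const VhostShadowVirtqueue *svq,
                                  const hwaddr *sg, const hwaddr *addr,
                                  unsigned n)
{
    return svq->iova_tree ? sg[n] : addr[n];
}

so the assignment above becomes:

    descs[i].addr = cpu_to_le64(vhost_svq_desc_addr(svq, sg, addr, n));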

Thanks

>          descs[i].len = cpu_to_le32(iovec[n].iov_len);
>
>          last = i;
> @@ -173,9 +180,10 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>  }
>
>  static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
> -                                const struct iovec *out_sg, size_t out_num,
> -                                const struct iovec *in_sg, size_t in_num,
> -                                unsigned *head)
> +                                const struct iovec *out_sg, hwaddr *out_addr,
> +                                size_t out_num,
> +                                const struct iovec *in_sg, hwaddr *in_addr,
> +                                size_t in_num, unsigned *head)
>  {
>      unsigned avail_idx;
>      vring_avail_t *avail = svq->vring.avail;
> @@ -191,13 +199,14 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
>          return false;
>      }
>
> -    ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_num, in_num > 0,
> -                                     false);
> +    ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_addr, out_num,
> +                                     in_num > 0, false);
>      if (unlikely(!ok)) {
>          return false;
>      }
>
> -    ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_num, false, true);
> +    ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_addr, in_num,
> +                                     false, true);
>      if (unlikely(!ok)) {
>          return false;
>      }
> @@ -258,7 +267,9 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
>          return -ENOSPC;
>      }
>
> -    ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, &qemu_head);
> +    ok = vhost_svq_add_split(svq, out_sg, elem ? elem->out_addr : NULL,
> +                             out_num, in_sg, elem ? elem->in_addr : NULL,
> +                             in_num, &qemu_head);
>      if (unlikely(!ok)) {
>          return -EINVAL;
>      }
> --
> 1.8.3.1
>




* Re: [PATCH 07/40] vdpa: move around vhost_vdpa_set_address_space_id
  2023-12-07 17:39 ` [PATCH 07/40] vdpa: move around vhost_vdpa_set_address_space_id Si-Wei Liu
  2023-12-11 11:18   ` Eugenio Perez Martin
@ 2024-01-11  7:33   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  7:33 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Move it a few lines ahead to make function call easier for those
> before it.  No funtional change involved.

Typo for functional.

>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
>  net/vhost-vdpa.c | 36 ++++++++++++++++++------------------
>  1 file changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 1a738b2..dbfa192 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -335,6 +335,24 @@ static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
>      }
>  }
>
> +static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
> +                                           unsigned vq_group,
> +                                           unsigned asid_num)
> +{
> +    struct vhost_vring_state asid = {
> +        .index = vq_group,
> +        .num = asid_num,
> +    };
> +    int r;
> +
> +    r = ioctl(v->shared->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
> +    if (unlikely(r < 0)) {
> +        error_report("Can't set vq group %u asid %u, errno=%d (%s)",
> +                     asid.index, asid.num, errno, g_strerror(errno));
> +    }
> +    return r;
> +}
> +
>  static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>  {
>      struct vhost_vdpa *v = &s->vhost_vdpa;
> @@ -490,24 +508,6 @@ static int64_t vhost_vdpa_get_vring_desc_group(int device_fd,
>      return state.num;
>  }
>
> -static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
> -                                           unsigned vq_group,
> -                                           unsigned asid_num)
> -{
> -    struct vhost_vring_state asid = {
> -        .index = vq_group,
> -        .num = asid_num,
> -    };
> -    int r;
> -
> -    r = ioctl(v->shared->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
> -    if (unlikely(r < 0)) {
> -        error_report("Can't set vq group %u asid %u, errno=%d (%s)",
> -                     asid.index, asid.num, errno, g_strerror(errno));
> -    }
> -    return r;
> -}
> -
>  static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
>  {
>      VhostIOVATree *tree = v->shared->iova_tree;
> --
> 1.8.3.1
>




* Re: [PATCH 09/40] vdpa: no repeat setting shadow_data
  2023-12-07 17:39 ` [PATCH 09/40] vdpa: no repeat setting shadow_data Si-Wei Liu
  2023-12-11 11:21   ` Eugenio Perez Martin
@ 2024-01-11  7:34   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  7:34 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Since shadow_data is now shared in the parent data struct, it
> just needs to be set only once by the first vq. This change
> will make shadow_data independent of svq enabled state, which
> can be optionally turned off when SVQ descritors and device

Typo for descriptors.

> driver areas are all isolated to a separate address space.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
>  net/vhost-vdpa.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index c9bfc6f..2555897 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -387,13 +387,12 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
>      if (s->always_svq ||
>          migration_is_setup_or_active(migrate_get_current()->state)) {
>          v->shadow_vqs_enabled = true;
> -        v->shared->shadow_data = true;
>      } else {
>          v->shadow_vqs_enabled = false;
> -        v->shared->shadow_data = false;
>      }
>
>      if (v->index == 0) {
> +        v->shared->shadow_data = v->shadow_vqs_enabled;
>          vhost_vdpa_net_data_start_first(s);
>          return 0;
>      }
> --
> 1.8.3.1
>




* Re: [PATCH 08/40] vdpa: add back vhost_vdpa_net_first_nc_vdpa
  2023-12-07 17:39 ` [PATCH 08/40] vdpa: add back vhost_vdpa_net_first_nc_vdpa Si-Wei Liu
  2023-12-11 11:19   ` Eugenio Perez Martin
@ 2024-01-11  7:37   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  7:37 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:52 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Previous commits had it removed. Now adding it back because
> this function will be needed by next patches.

Need some description to explain why. It should not be needed, since we
have a "parent" structure now and anything that is common could be
stored there?

Thanks




* Re: [PATCH 10/40] vdpa: assign svq descriptors a separate ASID when possible
  2023-12-07 17:39 ` [PATCH 10/40] vdpa: assign svq descriptors a separate ASID when possible Si-Wei Liu
  2023-12-11 13:35   ` Eugenio Perez Martin
@ 2024-01-11  8:02   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  8:02 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> When backend supports the VHOST_BACKEND_F_DESC_ASID feature
> and all the data vqs can support one or more descriptor group
> to host SVQ vrings and descriptors, we assign them a different
> ASID than where its buffers reside in guest memory address
> space. With this dedicated ASID for SVQs, the IOVA for what
> vdpa device may care effectively becomes the GPA, thus there's
> no need to translate IOVA address. For this reason, shadow_data
> can be turned off accordingly. It doesn't mean the SVQ is not
> enabled, but just that the translation is not needed from iova
> tree perspective.
>
> We can reuse CVQ's address space ID to host SVQ descriptors
> because both CVQ and SVQ are emulated in the same QEMU
> process, which will share the same VA address space.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-vdpa.c |  5 ++++-
>  net/vhost-vdpa.c       | 57 ++++++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 57 insertions(+), 5 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 24844b5..30dff95 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -627,6 +627,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>      uint64_t qemu_backend_features = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
>                                       0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
>                                       0x1ULL << VHOST_BACKEND_F_IOTLB_ASID |
> +                                     0x1ULL << VHOST_BACKEND_F_DESC_ASID |
>                                       0x1ULL << VHOST_BACKEND_F_SUSPEND;
>      int ret;
>
> @@ -1249,7 +1250,9 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
>              goto err;
>          }
>
> -        vhost_svq_start(svq, dev->vdev, vq, v->shared->iova_tree);
> +        vhost_svq_start(svq, dev->vdev, vq,
> +                        v->desc_group >= 0 && v->address_space_id ?
> +                        NULL : v->shared->iova_tree);

Nit: it might be a little clearer if we use a helper for the check,
like vhost_svq_needs_iova_tree().
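
A minimal sketch of what I mean (untested; keeping exactly the condition
used above):

/* SVQ vrings need iova-tree translation unless their descriptors live in
 * a dedicated ASID, where the IOVA is the same address QEMU already uses. */
static bool vhost_svq_needs_iova_tree(const struct vhost_vdpa *v)
{
    return !(v->desc_group >= 0 && v->address_space_id);
}

so the call site reads:

    vhost_svq_start(svq, dev->vdev, vq,
                    vhost_svq_needs_iova_tree(v) ?
                    v->shared->iova_tree : NULL);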

>          ok = vhost_vdpa_svq_map_rings(dev, svq, &addr, &err);
>          if (unlikely(!ok)) {
>              goto err_map;
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 2555897..aebaa53 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -366,20 +366,50 @@ static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
>  static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>  {
>      struct vhost_vdpa *v = &s->vhost_vdpa;
> +    int r;
>
>      migration_add_notifier(&s->migration_state,
>                             vdpa_net_migration_state_notifier);
>
> +    if (!v->shadow_vqs_enabled) {
> +        if (v->desc_group >= 0 &&
> +            v->address_space_id != VHOST_VDPA_GUEST_PA_ASID) {
> +            vhost_vdpa_set_address_space_id(v, v->desc_group,
> +                                            VHOST_VDPA_GUEST_PA_ASID);
> +            s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
> +        }
> +        return;
> +    }
> +
>      /* iova_tree may be initialized by vhost_vdpa_net_load_setup */
> -    if (v->shadow_vqs_enabled && !v->shared->iova_tree) {
> +    if (!v->shared->iova_tree) {
>          v->shared->iova_tree = vhost_iova_tree_new(v->shared->iova_range.first,
>                                                     v->shared->iova_range.last);
>      }
> +
> +    if (s->always_svq || v->desc_group < 0) {

I think the always_svq mode deserves a TODO here, since it could
actually utilize the desc_group as well?
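
E.g. keeping the code as is and just spelling the idea out (wording is
only a suggestion):

    if (s->always_svq || v->desc_group < 0) {
        /*
         * TODO: with x-svq=on and a valid desc_group we could still move
         * the SVQ vrings into their own ASID instead of always going
         * through the iova tree translation.
         */
        return;
    }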

> +        return;
> +    }
> +
> +    r = vhost_vdpa_set_address_space_id(v, v->desc_group,
> +                                        VHOST_VDPA_NET_CVQ_ASID);

Any reason why we only set the descriptor group ASID for the first nc?

(This seems to imply the device has one descriptor group for all
virtqueues, which might not be true.)

> +    if (unlikely(r < 0)) {
> +        /* The other data vqs should also fall back to using the same ASID */
> +        s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
> +        return;
> +    }
> +
> +    /* No translation needed on data SVQ when descriptor group is used */
> +    s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;
> +    s->vhost_vdpa.shared->shadow_data = false;
> +    return;
>  }
>
>  static int vhost_vdpa_net_data_start(NetClientState *nc)
>  {
>      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
> +
>      struct vhost_vdpa *v = &s->vhost_vdpa;
>
>      assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> @@ -397,6 +427,18 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
>          return 0;
>      }
>
> +    if (v->desc_group >= 0 && v->desc_group != s0->vhost_vdpa.desc_group) {
> +        unsigned asid;
> +        asid = v->shadow_vqs_enabled ?
> +            s0->vhost_vdpa.address_space_id : VHOST_VDPA_GUEST_PA_ASID;
> +        if (asid != s->vhost_vdpa.address_space_id) {
> +            vhost_vdpa_set_address_space_id(v, v->desc_group, asid);
> +        }
> +        s->vhost_vdpa.address_space_id = asid;

Can we unify the logic for nc0 and the others here?

Then we wouldn't need the trick in start_first().

> +    } else {
> +        s->vhost_vdpa.address_space_id = s0->vhost_vdpa.address_space_id;
> +    }
> +
>      return 0;
>  }
>
> @@ -603,13 +645,19 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>          return 0;
>      }
>
> -    if (!s->cvq_isolated) {
> +    if (!s->cvq_isolated && v->desc_group < 0) {
> +        if (s0->vhost_vdpa.shadow_vqs_enabled &&
> +            s0->vhost_vdpa.desc_group >= 0 &&

I think we should fail if v->desc_group < 0 but s0->vhost_vdpa.desc_group >= 0?

> +            s0->vhost_vdpa.address_space_id) {

If this is a check for VHOST_VDPA_GUEST_PA_ASID, let's explicitly
check it against the macro here.
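
I.e. the same logic as quoted above, just with the comparison spelled
out:

    if (s0->vhost_vdpa.shadow_vqs_enabled &&
        s0->vhost_vdpa.desc_group >= 0 &&
        s0->vhost_vdpa.address_space_id != VHOST_VDPA_GUEST_PA_ASID) {
        v->shadow_vqs_enabled = false;
    }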

But the logic here is not clear to me:

It looks like the code tries to keep working even when CVQ is not
isolated; is this intended? It makes the logic rather complicated.

Thanks


> +            v->shadow_vqs_enabled = false;
> +        }
>          return 0;
>      }
>
> -    cvq_group = vhost_vdpa_get_vring_group(v->shared->device_fd,
> +    cvq_group = s->cvq_isolated ?
> +                vhost_vdpa_get_vring_group(v->shared->device_fd,
>                                             v->dev->vq_index_end - 1,
> -                                           &err);
> +                                           &err) : v->desc_group;
>      if (unlikely(cvq_group < 0)) {
>          error_report_err(err);
>          return cvq_group;
> @@ -1840,6 +1888,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>      s->always_svq = svq;
>      s->migration_state.notify = NULL;
>      s->vhost_vdpa.shadow_vqs_enabled = svq;
> +    s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
>      if (queue_pair_index == 0) {
>          vhost_vdpa_net_valid_svq_features(features,
>                                            &s->vhost_vdpa.migration_blocker);
> --
> 1.8.3.1
>




* Re: [PATCH 11/40] vdpa: factor out vhost_vdpa_last_dev
  2023-12-07 17:39 ` [PATCH 11/40] vdpa: factor out vhost_vdpa_last_dev Si-Wei Liu
  2023-12-11 13:36   ` Eugenio Perez Martin
@ 2024-01-11  8:03   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  8:03 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Generalize duplicated condition check for the last vq of vdpa
> device to a common function.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
>  hw/virtio/vhost-vdpa.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 30dff95..2b1cc14 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -593,6 +593,11 @@ static bool vhost_vdpa_first_dev(struct vhost_dev *dev)
>      return v->index == 0;
>  }
>
> +static bool vhost_vdpa_last_dev(struct vhost_dev *dev)
> +{
> +    return dev->vq_index + dev->nvqs == dev->vq_index_end;
> +}
> +
>  static int vhost_vdpa_get_dev_features(struct vhost_dev *dev,
>                                         uint64_t *features)
>  {
> @@ -1432,7 +1437,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>          goto out_stop;
>      }
>
> -    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> +    if (!vhost_vdpa_last_dev(dev)) {
>          return 0;
>      }
>
> @@ -1467,7 +1472,7 @@ static void vhost_vdpa_reset_status(struct vhost_dev *dev)
>  {
>      struct vhost_vdpa *v = dev->opaque;
>
> -    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> +    if (!vhost_vdpa_last_dev(dev)) {
>          return;
>      }
>
> --
> 1.8.3.1
>




* Re: [PATCH 13/40] vdpa: ref counting VhostVDPAShared
  2023-12-07 17:39 ` [PATCH 13/40] vdpa: ref counting VhostVDPAShared Si-Wei Liu
@ 2024-01-11  8:12   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  8:12 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Subsequent patches attempt to release VhostVDPAShared resources,
> for example iova tree to free and memory listener to unregister,
> in vdpa_dev_cleanup(). Instead of checking against the vq index,
> which is not always available in all of the callers, counting
> the usage by reference. Then it'll be easy to free resource
> upon the last deref.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  include/hw/virtio/vhost-vdpa.h |  2 ++
>  net/vhost-vdpa.c               | 14 ++++++++++----
>  2 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 63493ff..7b8d3bf 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -70,6 +70,8 @@ typedef struct vhost_vdpa_shared {
>
>      /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */
>      bool shadow_data;
> +
> +    unsigned refcnt;
>  } VhostVDPAShared;
>
>  typedef struct vhost_vdpa {
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index aebaa53..a126e5c 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -236,11 +236,11 @@ static void vhost_vdpa_cleanup(NetClientState *nc)
>          g_free(s->vhost_net);
>          s->vhost_net = NULL;
>      }
> -    if (s->vhost_vdpa.index != 0) {
> -        return;
> +    if (--s->vhost_vdpa.shared->refcnt == 0) {
> +        qemu_close(s->vhost_vdpa.shared->device_fd);
> +        g_free(s->vhost_vdpa.shared);
>      }

I'd suggest having get and put helpers; then we can do the check and
the cleanup in put() when the refcnt drops to zero.
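
Something like this (untested, just to sketch the idea):

static VhostVDPAShared *vhost_vdpa_shared_get(VhostVDPAShared *shared)
{
    shared->refcnt++;
    return shared;
}

static void vhost_vdpa_shared_put(VhostVDPAShared *shared)
{
    if (--shared->refcnt > 0) {
        return;
    }
    /* Last reference: release everything owned by the shared struct */
    qemu_close(shared->device_fd);
    g_free(shared);
}

Then the callers only ever call vhost_vdpa_shared_get()/_put() instead
of open-coding the refcnt handling.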

Thanks

> -    qemu_close(s->vhost_vdpa.shared->device_fd);
> -    g_free(s->vhost_vdpa.shared);
> +    s->vhost_vdpa.shared = NULL;
>  }
>
>  /** Dummy SetSteeringEBPF to support RSS for vhost-vdpa backend  */
> @@ -1896,6 +1896,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>          s->vhost_vdpa.shared->device_fd = vdpa_device_fd;
>          s->vhost_vdpa.shared->iova_range = iova_range;
>          s->vhost_vdpa.shared->shadow_data = svq;
> +        s->vhost_vdpa.shared->refcnt++;
>      } else if (!is_datapath) {
>          s->cvq_cmd_out_buffer = mmap(NULL, vhost_vdpa_net_cvq_cmd_page_len(),
>                                       PROT_READ | PROT_WRITE,
> @@ -1910,6 +1911,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>      }
>      if (queue_pair_index != 0) {
>          s->vhost_vdpa.shared = shared;
> +        s->vhost_vdpa.shared->refcnt++;
>      }
>
>      ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> @@ -1928,6 +1930,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>      return nc;
>
>  err:
> +    if (--s->vhost_vdpa.shared->refcnt == 0) {
> +        g_free(s->vhost_vdpa.shared);
> +    }
> +    s->vhost_vdpa.shared = NULL;
>      qemu_del_net_client(nc);
>      return NULL;
>  }
> --
> 1.8.3.1
>




* Re: [PATCH 14/40] vdpa: convert iova_tree to ref count based
  2023-12-07 17:39 ` [PATCH 14/40] vdpa: convert iova_tree to ref count based Si-Wei Liu
  2023-12-11 17:21   ` Eugenio Perez Martin
@ 2024-01-11  8:15   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  8:15 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> So that it can be freed from vhost_vdpa_cleanup on
> the last deref. The next few patches will try to
> make iova tree life cycle not depend on memory
> listener, and there's possiblity to keep iova tree
> around when memory mapping is not changed across
> device reset.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  net/vhost-vdpa.c | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index a126e5c..7b8f047 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -238,6 +238,8 @@ static void vhost_vdpa_cleanup(NetClientState *nc)
>      }
>      if (--s->vhost_vdpa.shared->refcnt == 0) {
>          qemu_close(s->vhost_vdpa.shared->device_fd);
> +        g_clear_pointer(&s->vhost_vdpa.shared->iova_tree,
> +                        vhost_iova_tree_delete);

Could be part of the put() as well.

Thanks

>          g_free(s->vhost_vdpa.shared);
>      }
>      s->vhost_vdpa.shared = NULL;
> @@ -461,19 +463,12 @@ static int vhost_vdpa_net_data_load(NetClientState *nc)
>  static void vhost_vdpa_net_client_stop(NetClientState *nc)
>  {
>      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> -    struct vhost_dev *dev;
>
>      assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>
>      if (s->vhost_vdpa.index == 0) {
>          migration_remove_notifier(&s->migration_state);
>      }
> -
> -    dev = s->vhost_vdpa.dev;
> -    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> -        g_clear_pointer(&s->vhost_vdpa.shared->iova_tree,
> -                        vhost_iova_tree_delete);
> -    }
>  }
>
>  static int vhost_vdpa_net_load_setup(NetClientState *nc, NICState *nic)
> --
> 1.8.3.1
>




* Re: [PATCH 15/40] vdpa: add svq_switching and flush_map to header
  2023-12-07 17:39 ` [PATCH 15/40] vdpa: add svq_switching and flush_map to header Si-Wei Liu
@ 2024-01-11  8:16   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  8:16 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Will be used in next patches.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  include/hw/virtio/vhost-vdpa.h | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 7b8d3bf..0fe0f60 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -72,6 +72,12 @@ typedef struct vhost_vdpa_shared {
>      bool shadow_data;
>
>      unsigned refcnt;
> +
> +    /* SVQ switching is in progress? 1: turn on SVQ, -1: turn off SVQ */
> +    int svq_switching;

Nit: just curious, is there any reason why 0/1 or true/false is not used?

Thanks

> +
> +    /* Flush mappings on reset due to shared address space */
> +    bool flush_map;
>  } VhostVDPAShared;
>
>  typedef struct vhost_vdpa {
> --
> 1.8.3.1
>




* Re: [PATCH 16/40] vdpa: indicate SVQ switching via flag
  2023-12-07 17:39 ` [PATCH 16/40] vdpa: indicate SVQ switching via flag Si-Wei Liu
@ 2024-01-11  8:17   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  8:17 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> svq_switching indicates the case where SVQ mode change
> is on going. Positive (1) means switching from the
> normal passthrough mode to SVQ mode, and negative (-1)
> meaning switch SVQ back to the passthrough; zero (0)
> indicates that there's no SVQ mode switch taking place.

Ok, so the previous patch forgot to describe zero (0).

And it looks to me that we'd better use an enum instead of magic
numbers here.
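
Something like (names are just placeholders):

/* Progress of an SVQ mode switch, instead of bare -1/0/1 */
typedef enum SVQTransitionState {
    SVQ_TSTATE_DISABLING = -1, /* switching SVQ back to passthrough */
    SVQ_TSTATE_DONE = 0,       /* no switch in flight */
    SVQ_TSTATE_ENABLING = 1,   /* switching passthrough to SVQ mode */
} SVQTransitionState;

with svq_switching declared as SVQTransitionState in VhostVDPAShared, and
the assignment above becoming:

    v->shared->svq_switching = enable ? SVQ_TSTATE_ENABLING
                                      : SVQ_TSTATE_DISABLING;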

Thanks

>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  net/vhost-vdpa.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 7b8f047..04718b2 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -320,6 +320,7 @@ static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
>      data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
>      cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
>                                    n->max_ncs - n->max_queue_pairs : 0;
> +    v->shared->svq_switching = enable ? 1 : -1;
>      /*
>       * TODO: vhost_net_stop does suspend, get_base and reset. We can be smarter
>       * in the future and resume the device if read-only operations between
> @@ -332,6 +333,7 @@ static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
>      if (unlikely(r < 0)) {
>          error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
>      }
> +    v->shared->svq_switching = 0;
>  }
>
>  static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
> --
> 1.8.3.1
>




* Re: [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB
  2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
                   ` (40 preceding siblings ...)
  2023-12-11 18:39 ` [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Eugenio Perez Martin
@ 2024-01-11  8:21 ` Jason Wang
  41 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  8:21 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> This patch series contain several enhancements to SVQ live migration downtime
> for vDPA-net hardware device, specifically on mlx5_vdpa. Currently it is based
> off of Eugenio's RFC v2 .load_setup series [1] to utilize the shared facility
> and reduce frictions in merging or duplicating code if at all possible.
>
> It's stacked up in particular order as below, as the optimization for one on
> the top has to depend on others on the bottom. Here's a breakdown for what
> each part does respectively:
>
> Patch #  |          Feature / optimization
> ---------V-------------------------------------------------------------------
> 35 - 40  | trace events
> 34       | migrate_cancel bug fix
> 21 - 33  | (Un)map batching at stop-n-copy to further optimize LM down time
> 11 - 20  | persistent IOTLB [3] to improve LM down time
> 02 - 10  | SVQ descriptor ASID [2] to optimize SVQ switching
> 01       | dependent linux headers
>          V
>
> Let's first define 2 sources of downtime that this work is concerned with:
>
> * SVQ switching downtime (Downtime #1): downtime at the start of migration.
>   Time spent on teardown and setup for SVQ mode switching, and this downtime
>   is regarded as the maxium time for an individual vdpa-net device.
>   No memory transfer is involved during SVQ switching, hence no .
>
> * LM downtime (Downtime #2): aggregated downtime for all vdpa-net devices on
>   resource teardown and setup in the last stop-n-copy phase on source host.
>
> With each part of the optimizations applied bottom up, the effective outcome
> in terms of down time (in seconds) performance can be observed in this table:
>
>
>                     |    Downtime #1    |    Downtime #2
> --------------------+-------------------+-------------------
> Baseline QEMU       |     20s ~ 30s     |        20s
>                     |                   |
> Iterative map       |                   |
> at destination[1]   |        5s         |        20s
>                     |                   |
> SVQ descriptor      |                   |
>     ASID [2]        |        2s         |         5s
>                     |                   |
>                     |                   |
> persistent IOTLB    |        2s         |         2s
>       [3]           |                   |
>                     |                   |
> (Un)map batching    |                   |
> at stop-n-copy      |      1.7s         |       1.5s
> before switchover   |                   |
>
> (VM config: 128GB mem, 2 mlx5_vdpa devices, each w/ 4 data vqs)

This looks promising!

But the series looks a little bit huge; can we split it into 2 or 3 series?

That would help speed up the reviewing and merging.

Thanks

>
> Please find the details regarding each enhancement on the commit log.
>
> Thanks,
> -Siwei
>
>
> [1] [RFC PATCH v2 00/10] Map memory at destination .load_setup in vDPA-net migration
> https://lists.nongnu.org/archive/html/qemu-devel/2023-11/msg05711.html
> [2] VHOST_BACKEND_F_DESC_ASID
> https://lore.kernel.org/virtualization/20231018171456.1624030-2-dtatulea@nvidia.com/
> [3] VHOST_BACKEND_F_IOTLB_PERSIST
> https://lore.kernel.org/virtualization/1698304480-18463-1-git-send-email-si-wei.liu@oracle.com/
>
> ---
>
> Si-Wei Liu (40):
>   linux-headers: add vhost_types.h and vhost.h
>   vdpa: add vhost_vdpa_get_vring_desc_group
>   vdpa: probe descriptor group index for data vqs
>   vdpa: piggyback desc_group index when probing isolated cvq
>   vdpa: populate desc_group from net_vhost_vdpa_init
>   vhost: make svq work with gpa without iova translation
>   vdpa: move around vhost_vdpa_set_address_space_id
>   vdpa: add back vhost_vdpa_net_first_nc_vdpa
>   vdpa: no repeat setting shadow_data
>   vdpa: assign svq descriptors a separate ASID when possible
>   vdpa: factor out vhost_vdpa_last_dev
>   vdpa: check map_thread_enabled before join maps thread
>   vdpa: ref counting VhostVDPAShared
>   vdpa: convert iova_tree to ref count based
>   vdpa: add svq_switching and flush_map to header
>   vdpa: indicate SVQ switching via flag
>   vdpa: judge if map can be kept across reset
>   vdpa: unregister listener on last dev cleanup
>   vdpa: should avoid map flushing with persistent iotlb
>   vdpa: avoid mapping flush across reset
>   vdpa: vhost_vdpa_dma_batch_end_once rename
>   vdpa: factor out vhost_vdpa_map_batch_begin
>   vdpa: vhost_vdpa_dma_batch_begin_once rename
>   vdpa: factor out vhost_vdpa_dma_batch_end
>   vdpa: add asid to dma_batch_once API
>   vdpa: return int for dma_batch_once API
>   vdpa: add asid to all dma_batch call sites
>   vdpa: support iotlb_batch_asid
>   vdpa: expose API vhost_vdpa_dma_batch_once
>   vdpa: batch map/unmap op per svq pair basis
>   vdpa: batch map and unmap around cvq svq start/stop
>   vdpa: factor out vhost_vdpa_net_get_nc_vdpa
>   vdpa: batch multiple dma_unmap to a single call for vm stop
>   vdpa: fix network breakage after cancelling migration
>   vdpa: add vhost_vdpa_set_address_space_id trace
>   vdpa: add vhost_vdpa_get_vring_base trace for svq mode
>   vdpa: add vhost_vdpa_set_dev_vring_base trace for svq mode
>   vdpa: add trace events for eval_flush
>   vdpa: add trace events for vhost_vdpa_net_load_cmd
>   vdpa: add trace event for vhost_vdpa_net_load_mq
>
>  hw/virtio/trace-events                       |   9 +-
>  hw/virtio/vhost-shadow-virtqueue.c           |  35 ++-
>  hw/virtio/vhost-vdpa.c                       | 156 +++++++---
>  include/hw/virtio/vhost-vdpa.h               |  16 +
>  include/standard-headers/linux/vhost_types.h |  13 +
>  linux-headers/linux/vhost.h                  |   9 +
>  net/trace-events                             |   8 +
>  net/vhost-vdpa.c                             | 434 ++++++++++++++++++++++-----
>  8 files changed, 558 insertions(+), 122 deletions(-)
>
> --
> 1.8.3.1
>




* Re: [PATCH 17/40] vdpa: judge if map can be kept across reset
  2023-12-07 17:39 ` [PATCH 17/40] vdpa: judge if map can be kept across reset Si-Wei Liu
  2023-12-13  9:51   ` Eugenio Perez Martin
@ 2024-01-11  8:24   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  8:24 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> The descriptor group for SVQ ASID allows the guest memory mapping
> to retain across SVQ switching, same as how isolated CVQ can do
> with a different ASID than the guest GPA space. Introduce an
> evaluation function to judge whether to flush or keep iotlb maps
> based on virtqueue's descriptor group and cvq isolation capability.

I may have missed something, but is there any reason we can't make this
judgment at initialization time?

We know the device capabilities up front, so this should not depend on
any runtime configuration.
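
For example, roughly like this in net_vhost_vdpa_init() (the field name
is just a placeholder, and I'm hand-waving how the desc_group /
cvq_isolated probe results reach this point):

    /*
     * Known at init time: guest mappings can only be kept across reset
     * when x-svq is not forced on, the data vqs have a dedicated
     * descriptor group, and CVQ is isolated.
     */
    s->vhost_vdpa.shared->flush_map_on_reset =
        svq || desc_group < 0 || !cvq_isolated;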

Thanks

>
> Have to hook the evaluation function to NetClient's .poll op as
> .vhost_reset_status runs ahead of .stop, and .vhost_dev_start
> don't have access to the vhost-vdpa net's information.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  net/vhost-vdpa.c | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 04718b2..e9b96ed 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -504,12 +504,36 @@ static int vhost_vdpa_net_load_cleanup(NetClientState *nc, NICState *nic)
>                               n->parent_obj.status & VIRTIO_CONFIG_S_DRIVER_OK);
>  }
>
> +static void vhost_vdpa_net_data_eval_flush(NetClientState *nc, bool stop)
> +{
> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +
> +    if (!stop) {
> +        return;
> +    }
> +
> +    if (s->vhost_vdpa.index == 0) {
> +        if (s->always_svq) {
> +            v->shared->flush_map = true;
> +        } else if (!v->shared->svq_switching || v->desc_group >= 0) {
> +            v->shared->flush_map = false;
> +        } else {
> +            v->shared->flush_map = true;
> +        }
> +    } else if (!s->always_svq && v->shared->svq_switching &&
> +               v->desc_group < 0) {
> +        v->shared->flush_map = true;
> +    }
> +}
> +
>  static NetClientInfo net_vhost_vdpa_info = {
>          .type = NET_CLIENT_DRIVER_VHOST_VDPA,
>          .size = sizeof(VhostVDPAState),
>          .receive = vhost_vdpa_receive,
>          .start = vhost_vdpa_net_data_start,
>          .load = vhost_vdpa_net_data_load,
> +        .poll = vhost_vdpa_net_data_eval_flush,
>          .stop = vhost_vdpa_net_client_stop,
>          .cleanup = vhost_vdpa_cleanup,
>          .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
> @@ -1368,12 +1392,28 @@ static int vhost_vdpa_net_cvq_load(NetClientState *nc)
>      return 0;
>  }
>
> +static void vhost_vdpa_net_cvq_eval_flush(NetClientState *nc, bool stop)
> +{
> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +
> +    if (!stop) {
> +        return;
> +    }
> +
> +    if (!v->shared->flush_map && !v->shared->svq_switching &&
> +        !s->cvq_isolated && v->desc_group < 0) {
> +        v->shared->flush_map = true;
> +    }
> +}
> +
>  static NetClientInfo net_vhost_vdpa_cvq_info = {
>      .type = NET_CLIENT_DRIVER_VHOST_VDPA,
>      .size = sizeof(VhostVDPAState),
>      .receive = vhost_vdpa_receive,
>      .start = vhost_vdpa_net_cvq_start,
>      .load = vhost_vdpa_net_cvq_load,
> +    .poll = vhost_vdpa_net_cvq_eval_flush,
>      .stop = vhost_vdpa_net_cvq_stop,
>      .cleanup = vhost_vdpa_cleanup,
>      .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
> --
> 1.8.3.1
>




* Re: [PATCH 18/40] vdpa: unregister listener on last dev cleanup
  2023-12-07 17:39 ` [PATCH 18/40] vdpa: unregister listener on last dev cleanup Si-Wei Liu
  2023-12-11 17:37   ` Eugenio Perez Martin
@ 2024-01-11  8:26   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  8:26 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> So that the free of iova tree struct can be safely deferred to
> until the last vq referencing it goes away.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-vdpa.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 4f026db..ea2dfc8 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -815,7 +815,10 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>      }
>
>      vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> -    memory_listener_unregister(&v->shared->listener);
> +    if (vhost_vdpa_last_dev(dev) && v->shared->listener_registered) {
> +        memory_listener_unregister(&v->shared->listener);
> +        v->shared->listener_registered = false;
> +    }

Can we move this into the put() (the refcnt-decreasing helper) of shared?

Thanks

>      vhost_vdpa_svq_cleanup(dev);
>
>      dev->opaque = NULL;
> --
> 1.8.3.1
>




* Re: [PATCH 19/40] vdpa: should avoid map flushing with persistent iotlb
  2023-12-07 17:39 ` [PATCH 19/40] vdpa: should avoid map flushing with persistent iotlb Si-Wei Liu
@ 2024-01-11  8:28   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  8:28 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Today memory listener is unregistered in vhost_vdpa_reset_status
> unconditionally, due to which all the maps will be flushed away
> from the iotlb. However, map flush is not always needed, and
> doing it from performance hot path may have innegligible latency
> impact that affects VM reboot time or brown out period during
> live migration.
>
> Leverage the IOTLB_PERSIST backend featuae, which ensures durable
> iotlb maps and not disappearing even across reset. When it is
> supported, we may conditionally keep the maps for cases where the
> guest memory mapping doesn't change. Prepare a function so that
> the next patch will be able to use it to keep the maps.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/trace-events |  1 +
>  hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
>  2 files changed, 21 insertions(+)
>
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 77905d1..9725d44 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -66,6 +66,7 @@ vhost_vdpa_set_owner(void *dev) "dev: %p"
>  vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
>  vhost_vdpa_get_iova_range(void *dev, uint64_t first, uint64_t last) "dev: %p first: 0x%"PRIx64" last: 0x%"PRIx64
>  vhost_vdpa_set_config_call(void *dev, int fd)"dev: %p fd: %d"
> +vhost_vdpa_maybe_flush_map(void *dev, bool reg, bool flush, bool persist) "dev: %p registered: %d flush_map: %d iotlb_persistent: %d"
>
>  # virtio.c
>  virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index ea2dfc8..31e0a55 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1471,6 +1471,26 @@ out_stop:
>      return ok ? 0 : -1;
>  }
>
> +static void vhost_vdpa_maybe_flush_map(struct vhost_dev *dev)

Nit: Not a native speaker, but it looks like
vhost_vdpa_may_flush_map() is better.

> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +
> +    trace_vhost_vdpa_maybe_flush_map(dev, v->shared->listener_registered,
> +                                     v->shared->flush_map,
> +                                     !!(dev->backend_cap &
> +                                     BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST)));
> +
> +    if (!v->shared->listener_registered) {
> +        return;
> +    }
> +
> +    if (!(dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST)) ||
> +        v->shared->flush_map) {
> +        memory_listener_unregister(&v->shared->listener);
> +        v->shared->listener_registered = false;
> +    }

Others look good.

Thanks

> +}
> +
>  static void vhost_vdpa_reset_status(struct vhost_dev *dev)
>  {
>      struct vhost_vdpa *v = dev->opaque;
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 20/40] vdpa: avoid mapping flush across reset
  2023-12-07 17:39 ` [PATCH 20/40] vdpa: avoid mapping flush across reset Si-Wei Liu
@ 2024-01-11  8:30   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-11  8:30 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:52 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Leverage the IOTLB_PERSIST and DESC_ASID features to achieve
> a slightly more lightweight reset path, without resorting to
> suspend and resume. Not the best possible, but it still offers
> significant time savings, which contributes substantially to
> live migration downtime reduction.
>
> It benefits two cases:
>   - normal virtio reset in the VM, e.g. guest reboot, which no
>     longer has to tear down all iotlb mappings and set them up
>     again.
>   - SVQ switching, in which a data vq's descriptor table and
>     vrings are moved to a different ASID than the one where its
>     buffers reside. Along with the use of persistent iotlb, this
>     saves the substantial time otherwise spent on unnecessary
>     pinning and mapping when moving descriptors into or out of
>     shadow mode.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Looks good to me.

Thanks

> ---
>  hw/virtio/vhost-vdpa.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 31e0a55..47c764b 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -633,6 +633,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>                                       0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
>                                       0x1ULL << VHOST_BACKEND_F_IOTLB_ASID |
>                                       0x1ULL << VHOST_BACKEND_F_DESC_ASID |
> +                                     0x1ULL << VHOST_BACKEND_F_IOTLB_PERSIST |
>                                       0x1ULL << VHOST_BACKEND_F_SUSPEND;
>      int ret;
>
> @@ -1493,8 +1494,6 @@ static void vhost_vdpa_maybe_flush_map(struct vhost_dev *dev)
>
>  static void vhost_vdpa_reset_status(struct vhost_dev *dev)
>  {
> -    struct vhost_vdpa *v = dev->opaque;
> -
>      if (!vhost_vdpa_last_dev(dev)) {
>          return;
>      }
> @@ -1502,9 +1501,7 @@ static void vhost_vdpa_reset_status(struct vhost_dev *dev)
>      vhost_vdpa_reset_device(dev);
>      vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>                                 VIRTIO_CONFIG_S_DRIVER);
> -    memory_listener_unregister(&v->shared->listener);
> -    v->shared->listener_registered = false;
> -
> +    vhost_vdpa_maybe_flush_map(dev);
>  }
>
>  static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 21/40] vdpa: vhost_vdpa_dma_batch_end_once rename
  2023-12-07 17:39 ` [PATCH 21/40] vdpa: vhost_vdpa_dma_batch_end_once rename Si-Wei Liu
@ 2024-01-15  2:40   ` Jason Wang
  2024-01-15  2:52     ` Jason Wang
  0 siblings, 1 reply; 102+ messages in thread
From: Jason Wang @ 2024-01-15  2:40 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> No functional changes. Rename only.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-vdpa.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 47c764b..013bfa2 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -191,7 +191,7 @@ static void vhost_vdpa_iotlb_batch_begin_once(VhostVDPAShared *s)
>      s->iotlb_batch_begin_sent = true;
>  }
>
> -static void vhost_vdpa_dma_end_batch(VhostVDPAShared *s)
> +static void vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s)
>  {
>      struct vhost_msg_v2 msg = {};
>      int fd = s->device_fd;
> @@ -229,7 +229,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
>  {
>      VhostVDPAShared *s = container_of(listener, VhostVDPAShared, listener);
>
> -    vhost_vdpa_dma_end_batch(s);
> +    vhost_vdpa_dma_batch_end_once(s);
>  }
>
>  static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> @@ -1367,7 +1367,7 @@ static void *vhost_vdpa_load_map(void *opaque)
>              vhost_vdpa_iotlb_batch_begin_once(shared);
>              break;
>          case VHOST_IOTLB_BATCH_END:
> -            vhost_vdpa_dma_end_batch(shared);
> +            vhost_vdpa_dma_batch_end_once(shared);

It's better to explain why having a "_once" suffix is better here.

Thanks

>              break;
>          default:
>              error_report("Invalid IOTLB msg type %d", msg->iotlb.type);
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 21/40] vdpa: vhost_vdpa_dma_batch_end_once rename
  2024-01-15  2:40   ` Jason Wang
@ 2024-01-15  2:52     ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  2:52 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Mon, Jan 15, 2024 at 10:40 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >
> > No functional changes. Rename only.
> >
> > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > ---
> >  hw/virtio/vhost-vdpa.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 47c764b..013bfa2 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -191,7 +191,7 @@ static void vhost_vdpa_iotlb_batch_begin_once(VhostVDPAShared *s)
> >      s->iotlb_batch_begin_sent = true;
> >  }
> >
> > -static void vhost_vdpa_dma_end_batch(VhostVDPAShared *s)
> > +static void vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s)
> >  {
> >      struct vhost_msg_v2 msg = {};
> >      int fd = s->device_fd;
> > @@ -229,7 +229,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
> >  {
> >      VhostVDPAShared *s = container_of(listener, VhostVDPAShared, listener);
> >
> > -    vhost_vdpa_dma_end_batch(s);
> > +    vhost_vdpa_dma_batch_end_once(s);
> >  }
> >
> >  static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > @@ -1367,7 +1367,7 @@ static void *vhost_vdpa_load_map(void *opaque)
> >              vhost_vdpa_iotlb_batch_begin_once(shared);
> >              break;
> >          case VHOST_IOTLB_BATCH_END:
> > -            vhost_vdpa_dma_end_batch(shared);
> > +            vhost_vdpa_dma_batch_end_once(shared);
>
> It's better to explain why having a "_once" suffix is better here.

Ok, if it's for symmetry with vhost_vdpa_iotlb_batch_begin_once(), I
think it makes sense.

But it's better to document this in the change log.

Thanks

>
> Thanks
>
> >              break;
> >          default:
> >              error_report("Invalid IOTLB msg type %d", msg->iotlb.type);
> > --
> > 1.8.3.1
> >



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 22/40] vdpa: factor out vhost_vdpa_map_batch_begin
  2023-12-07 17:39 ` [PATCH 22/40] vdpa: factor out vhost_vdpa_map_batch_begin Si-Wei Liu
@ 2024-01-15  3:02   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  3:02 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Refactoring only. No functional change.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
>  hw/virtio/trace-events |  2 +-
>  hw/virtio/vhost-vdpa.c | 25 ++++++++++++++++---------
>  2 files changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 9725d44..b0239b8 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -32,7 +32,7 @@ vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
>  # vhost-vdpa.c
>  vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa_shared:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
>  vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa_shared:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> -vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> +vhost_vdpa_map_batch_begin(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
>  vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
>  vhost_vdpa_listener_region_add_unaligned(void *v, const char *name, uint64_t offset_as, uint64_t offset_page) "vdpa_shared: %p region %s offset_within_address_space %"PRIu64" offset_within_region %"PRIu64
>  vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 013bfa2..7a1b7f4 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -161,7 +161,7 @@ int vhost_vdpa_dma_unmap(VhostVDPAShared *s, uint32_t asid, hwaddr iova,
>      return ret;
>  }
>
> -static void vhost_vdpa_iotlb_batch_begin_once(VhostVDPAShared *s)
> +static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s)
>  {
>      int fd = s->device_fd;
>      struct vhost_msg_v2 msg = {
> @@ -169,26 +169,33 @@ static void vhost_vdpa_iotlb_batch_begin_once(VhostVDPAShared *s)
>          .iotlb.type = VHOST_IOTLB_BATCH_BEGIN,
>      };
>
> -    if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH)) ||
> -        s->iotlb_batch_begin_sent) {
> -        return;
> -    }
> -
>      if (s->map_thread_enabled && !qemu_thread_is_self(&s->map_thread)) {
>          struct vhost_msg_v2 *new_msg = g_new(struct vhost_msg_v2, 1);
>
>          *new_msg = msg;
>          g_async_queue_push(s->map_queue, new_msg);
>
> -        return;
> +        return false;
>      }
>
> -    trace_vhost_vdpa_listener_begin_batch(s, fd, msg.type, msg.iotlb.type);
> +    trace_vhost_vdpa_map_batch_begin(s, fd, msg.type, msg.iotlb.type);
>      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
>          error_report("failed to write, fd=%d, errno=%d (%s)",
>                       fd, errno, strerror(errno));
>      }
> -    s->iotlb_batch_begin_sent = true;
> +    return true;
> +}
> +
> +static void vhost_vdpa_iotlb_batch_begin_once(VhostVDPAShared *s)
> +{
> +    if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH)) ||
> +        s->iotlb_batch_begin_sent) {
> +        return;
> +    }
> +
> +    if (vhost_vdpa_map_batch_begin(s)) {
> +        s->iotlb_batch_begin_sent = true;
> +    }
>  }
>
>  static void vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s)
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 23/40] vdpa: vhost_vdpa_dma_batch_begin_once rename
  2023-12-07 17:39 ` [PATCH 23/40] vdpa: vhost_vdpa_dma_batch_begin_once rename Si-Wei Liu
@ 2024-01-15  3:03   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  3:03 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> No functional changes. Rename only.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-vdpa.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 7a1b7f4..a6c6fe5 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -186,7 +186,7 @@ static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s)
>      return true;
>  }
>
> -static void vhost_vdpa_iotlb_batch_begin_once(VhostVDPAShared *s)
> +static void vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s)
>  {
>      if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH)) ||
>          s->iotlb_batch_begin_sent) {
> @@ -411,7 +411,7 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
>          iova = mem_region.iova;
>      }
>
> -    vhost_vdpa_iotlb_batch_begin_once(s);
> +    vhost_vdpa_dma_batch_begin_once(s);
>      ret = vhost_vdpa_dma_map(s, VHOST_VDPA_GUEST_PA_ASID, iova,
>                               int128_get64(llsize), vaddr, section->readonly);
>      if (ret) {
> @@ -493,7 +493,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
>          iova = result->iova;
>          vhost_iova_tree_remove(s->iova_tree, *result);
>      }
> -    vhost_vdpa_iotlb_batch_begin_once(s);
> +    vhost_vdpa_dma_batch_begin_once(s);
>      /*
>       * The unmap ioctl doesn't accept a full 64-bit. need to check it
>       */
> @@ -1371,7 +1371,7 @@ static void *vhost_vdpa_load_map(void *opaque)
>                                       msg->iotlb.size);
>              break;
>          case VHOST_IOTLB_BATCH_BEGIN:
> -            vhost_vdpa_iotlb_batch_begin_once(shared);
> +            vhost_vdpa_dma_batch_begin_once(shared);

Nit: "iotlb" seems to be better than "dma" as there's no guarantee
that the underlayer device is using DMA (e.g simulator or VDUSE).

Thanks

>              break;
>          case VHOST_IOTLB_BATCH_END:
>              vhost_vdpa_dma_batch_end_once(shared);
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 24/40] vdpa: factor out vhost_vdpa_dma_batch_end
  2023-12-07 17:39 ` [PATCH 24/40] vdpa: factor out vhost_vdpa_dma_batch_end Si-Wei Liu
@ 2024-01-15  3:05   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  3:05 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:52 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Refactoring only. No functional change.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/trace-events |  2 +-
>  hw/virtio/vhost-vdpa.c | 30 ++++++++++++++++++------------
>  2 files changed, 19 insertions(+), 13 deletions(-)
>
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index b0239b8..3411a07 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -33,7 +33,7 @@ vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
>  vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa_shared:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
>  vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa_shared:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
>  vhost_vdpa_map_batch_begin(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> -vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> +vhost_vdpa_dma_batch_end(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
>  vhost_vdpa_listener_region_add_unaligned(void *v, const char *name, uint64_t offset_as, uint64_t offset_page) "vdpa_shared: %p region %s offset_within_address_space %"PRIu64" offset_within_region %"PRIu64
>  vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
>  vhost_vdpa_listener_region_del_unaligned(void *v, const char *name, uint64_t offset_as, uint64_t offset_page) "vdpa_shared: %p region %s offset_within_address_space %"PRIu64" offset_within_region %"PRIu64
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index a6c6fe5..999a97a 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -198,19 +198,11 @@ static void vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s)
>      }
>  }
>
> -static void vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s)
> +static bool vhost_vdpa_dma_batch_end(VhostVDPAShared *s)
>  {

I had the same comment for using "iotlb" instead of "dma".

Others look good.

Thanks



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 25/40] vdpa: add asid to dma_batch_once API
  2023-12-07 17:39 ` [PATCH 25/40] vdpa: add asid to dma_batch_once API Si-Wei Liu
  2023-12-13 15:42   ` Eugenio Perez Martin
@ 2024-01-15  3:07   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  3:07 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> So that the DMA batching API can operate on ASIDs other than 0.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/trace-events |  4 ++--
>  hw/virtio/vhost-vdpa.c | 14 ++++++++------
>  2 files changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 3411a07..196f32f 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -32,8 +32,8 @@ vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
>  # vhost-vdpa.c
>  vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa_shared:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
>  vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa_shared:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> -vhost_vdpa_map_batch_begin(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> -vhost_vdpa_dma_batch_end(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> +vhost_vdpa_map_batch_begin(void *v, int fd, uint32_t msg_type, uint8_t type, uint32_t asid)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8" asid: %"PRIu32
> +vhost_vdpa_dma_batch_end(void *v, int fd, uint32_t msg_type, uint8_t type, uint32_t asid)  "vdpa_shared:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8" asid: %"PRIu32
>  vhost_vdpa_listener_region_add_unaligned(void *v, const char *name, uint64_t offset_as, uint64_t offset_page) "vdpa_shared: %p region %s offset_within_address_space %"PRIu64" offset_within_region %"PRIu64
>  vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
>  vhost_vdpa_listener_region_del_unaligned(void *v, const char *name, uint64_t offset_as, uint64_t offset_page) "vdpa_shared: %p region %s offset_within_address_space %"PRIu64" offset_within_region %"PRIu64
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 999a97a..2db2832 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -161,11 +161,12 @@ int vhost_vdpa_dma_unmap(VhostVDPAShared *s, uint32_t asid, hwaddr iova,
>      return ret;
>  }
>
> -static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s)
> +static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s, uint32_t asid)
>  {
>      int fd = s->device_fd;
>      struct vhost_msg_v2 msg = {
>          .type = VHOST_IOTLB_MSG_V2,
> +        .asid = asid,

I wonder if we need a check for the case where vhost doesn't support ASID
but asid is not zero?

Thanks
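
For illustration, the kind of guard being asked about could sit at the top
of vhost_vdpa_map_batch_begin(); this is only a sketch of the idea, not part
of the posted patch, and how the failure would be propagated is left open:

    /*
     * Illustrative check: refuse a non-zero ASID when the backend has not
     * negotiated VHOST_BACKEND_F_IOTLB_ASID.
     */
    if (asid != 0 &&
        !(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_ASID))) {
        error_report("asid %"PRIu32" requested, but the backend lacks "
                     "VHOST_BACKEND_F_IOTLB_ASID", asid);
        return false;
    }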



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 28/40] vdpa: support iotlb_batch_asid
  2023-12-07 17:39 ` [PATCH 28/40] vdpa: support iotlb_batch_asid Si-Wei Liu
  2023-12-13 15:42   ` Eugenio Perez Martin
@ 2024-01-15  3:19   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  3:19 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Then it's possible to specify the ASID when calling the DMA
> batching API. If the ASID to work on doesn't match the ASID of
> the ongoing transaction, the API will fail the request and
> return a negative value, and the transaction will remain
> intact as if the failed request had never occurred.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-vdpa.c         | 25 +++++++++++++++++++------
>  include/hw/virtio/vhost-vdpa.h |  1 +
>  net/vhost-vdpa.c               |  1 +
>  3 files changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index d3f5721..b7896a8 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -189,15 +189,25 @@ static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s, uint32_t asid)
>
>  static int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s, uint32_t asid)
>  {
> -    if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH)) ||
> -        s->iotlb_batch_begin_sent) {
> +    if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
>          return 0;
>      }
>
> -    if (vhost_vdpa_map_batch_begin(s, asid)) {
> -        s->iotlb_batch_begin_sent = true;
> +    if (s->iotlb_batch_begin_sent && s->iotlb_batch_asid != asid) {
> +        return -1;
> +    }
> +
> +    if (s->iotlb_batch_begin_sent) {
> +        return 0;
>      }
>
> +    if (!vhost_vdpa_map_batch_begin(s, asid)) {
> +        return 0;
> +    }
> +
> +    s->iotlb_batch_begin_sent = true;
> +    s->iotlb_batch_asid = asid;
> +
>      return 0;
>  }
>
> @@ -237,10 +247,13 @@ static int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s, uint32_t asid)
>          return 0;
>      }
>
> -    if (vhost_vdpa_dma_batch_end(s, asid)) {
> -        s->iotlb_batch_begin_sent = false;
> +    if (!vhost_vdpa_dma_batch_end(s, asid)) {
> +        return 0;
>      }
>
> +    s->iotlb_batch_begin_sent = false;
> +    s->iotlb_batch_asid = -1;
> +
>      return 0;
>  }
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 0fe0f60..219316f 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -61,6 +61,7 @@ typedef struct vhost_vdpa_shared {
>      bool map_thread_enabled;
>
>      bool iotlb_batch_begin_sent;
> +    uint32_t iotlb_batch_asid;
>
>      /*
>       * The memory listener has been registered, so DMA maps have been sent to
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index e9b96ed..bc72345 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -1933,6 +1933,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>          s->vhost_vdpa.shared->device_fd = vdpa_device_fd;
>          s->vhost_vdpa.shared->iova_range = iova_range;
>          s->vhost_vdpa.shared->shadow_data = svq;
> +        s->vhost_vdpa.shared->iotlb_batch_asid = -1;

This seems like a trick; the uAPI defines asid as:

        __u32 asid;

So technically -1U is a legal value.

Thanks

>          s->vhost_vdpa.shared->refcnt++;
>      } else if (!is_datapath) {
>          s->cvq_cmd_out_buffer = mmap(NULL, vhost_vdpa_net_cvq_cmd_page_len(),
> --
> 1.8.3.1
>
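
One way to avoid overloading -1 as a sentinel is an explicit "batch ASID
recorded" flag; the sketch below uses a hypothetical iotlb_batch_asid_set
field in VhostVDPAShared and is illustrative only, the series may well
prefer a different approach:

/*
 * Sketch of vhost_vdpa_dma_batch_begin_once() with an explicit flag
 * instead of the -1 sentinel, so every __u32 ASID value stays usable.
 * iotlb_batch_asid_set is hypothetical and would be cleared again in
 * vhost_vdpa_dma_batch_end_once().
 */
static int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s, uint32_t asid)
{
    if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
        return 0;
    }

    if (s->iotlb_batch_begin_sent) {
        /* Refuse to mix ASIDs within one ongoing batch. */
        return (s->iotlb_batch_asid_set && s->iotlb_batch_asid != asid)
               ? -1 : 0;
    }

    if (!vhost_vdpa_map_batch_begin(s, asid)) {
        return 0;
    }

    s->iotlb_batch_begin_sent = true;
    s->iotlb_batch_asid = asid;
    s->iotlb_batch_asid_set = true;

    return 0;
}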



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 29/40] vdpa: expose API vhost_vdpa_dma_batch_once
  2023-12-07 17:39 ` [PATCH 29/40] vdpa: expose API vhost_vdpa_dma_batch_once Si-Wei Liu
  2023-12-13 15:42   ` Eugenio Perez Martin
@ 2024-01-15  3:32   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  3:32 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> So that the batching API can be called externally from other
> files, not just from within this one.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-vdpa.c         | 21 +++++++++++++++------
>  include/hw/virtio/vhost-vdpa.h |  3 +++
>  2 files changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index b7896a8..68dc01b 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -187,7 +187,7 @@ static bool vhost_vdpa_map_batch_begin(VhostVDPAShared *s, uint32_t asid)
>      return true;
>  }
>
> -static int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s, uint32_t asid)
> +int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s, uint32_t asid)
>  {
>      if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
>          return 0;
> @@ -237,7 +237,7 @@ static bool vhost_vdpa_dma_batch_end(VhostVDPAShared *s, uint32_t asid)
>      return true;
>  }
>
> -static int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s, uint32_t asid)
> +int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s, uint32_t asid)
>  {
>      if (!(s->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
>          return 0;
> @@ -436,7 +436,12 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
>          iova = mem_region.iova;
>      }
>
> -    vhost_vdpa_dma_batch_begin_once(s, VHOST_VDPA_GUEST_PA_ASID);
> +    ret = vhost_vdpa_dma_batch_begin_once(s, VHOST_VDPA_GUEST_PA_ASID);
> +    if (unlikely(ret)) {
> +        error_report("Can't batch mapping on asid 0 (%p)", s);
> +        goto fail_map;
> +    }
> +

This seems like another patch.

>      ret = vhost_vdpa_dma_map(s, VHOST_VDPA_GUEST_PA_ASID, iova,
>                               int128_get64(llsize), vaddr, section->readonly);
>      if (ret) {
> @@ -518,7 +523,11 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
>          iova = result->iova;
>          vhost_iova_tree_remove(s->iova_tree, *result);
>      }
> -    vhost_vdpa_dma_batch_begin_once(s, VHOST_VDPA_GUEST_PA_ASID);
> +    ret = vhost_vdpa_dma_batch_begin_once(s, VHOST_VDPA_GUEST_PA_ASID);
> +    if (ret) {
> +        error_report("Can't batch mapping on asid 0 (%p)", s);
> +    }

And this as well.

> +
>      /*
>       * The unmap ioctl doesn't accept a full 64-bit. need to check it
>       */
> @@ -1396,10 +1405,10 @@ static void *vhost_vdpa_load_map(void *opaque)
>                                       msg->iotlb.size);
>              break;
>          case VHOST_IOTLB_BATCH_BEGIN:
> -            vhost_vdpa_dma_batch_begin_once(shared, msg->asid);
> +            r = vhost_vdpa_dma_batch_begin_once(shared, msg->asid);
>              break;
>          case VHOST_IOTLB_BATCH_END:
> -            vhost_vdpa_dma_batch_end_once(shared, msg->asid);
> +            r = vhost_vdpa_dma_batch_end_once(shared, msg->asid);

And these.

Thanks

>              break;
>          default:
>              error_report("Invalid IOTLB msg type %d", msg->iotlb.type);
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 219316f..aa13679 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -106,6 +106,9 @@ int vhost_vdpa_dma_map(VhostVDPAShared *s, uint32_t asid, hwaddr iova,
>                         hwaddr size, void *vaddr, bool readonly);
>  int vhost_vdpa_dma_unmap(VhostVDPAShared *s, uint32_t asid, hwaddr iova,
>                           hwaddr size);
> +int vhost_vdpa_dma_batch_begin_once(VhostVDPAShared *s, uint32_t asid);
> +int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s, uint32_t asid);
> +
>  int vhost_vdpa_load_setup(VhostVDPAShared *s, AddressSpace *dma_as);
>  int vhost_vdpa_load_cleanup(VhostVDPAShared *s, bool vhost_will_start);
>
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 30/40] vdpa: batch map/unmap op per svq pair basis
  2023-12-07 17:39 ` [PATCH 30/40] vdpa: batch map/unmap op per svq pair basis Si-Wei Liu
@ 2024-01-15  3:33   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  3:33 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Coalesce multiple map or unmap operations into just one so
> that all mapping setup or teardown can occur in a single DMA
> batch.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 31/40] vdpa: batch map and unmap around cvq svq start/stop
  2023-12-07 17:39 ` [PATCH 31/40] vdpa: batch map and unmap around cvq svq start/stop Si-Wei Liu
@ 2024-01-15  3:34   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  3:34 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Coalesce map or unmap operations into exactly one DMA batch
> to reduce the potential impact on performance.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 32/40] vdpa: factor out vhost_vdpa_net_get_nc_vdpa
  2023-12-07 17:39 ` [PATCH 32/40] vdpa: factor out vhost_vdpa_net_get_nc_vdpa Si-Wei Liu
@ 2024-01-15  3:35   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  3:35 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Introduce new API. No functional change on existing API.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 33/40] vdpa: batch multiple dma_unmap to a single call for vm stop
  2023-12-07 17:39 ` [PATCH 33/40] vdpa: batch multiple dma_unmap to a single call for vm stop Si-Wei Liu
  2023-12-13 16:46   ` Eugenio Perez Martin
@ 2024-01-15  3:47   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  3:47 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Should help reduce live migration downtime on the source host. Below
> are the coalesced dma_unmap time series on a 2 queue pair config (no
> dedicated descriptor group ASID for SVQ).

It's better to explain how we can batch into a single call (e.g., do we
batch at the level of the whole device?).

>
> 109531@1693367276.853503:vhost_vdpa_reset_device dev: 0x55c933926890
> 109531@1693367276.853513:vhost_vdpa_add_status dev: 0x55c933926890 status: 0x3
> 109531@1693367276.853520:vhost_vdpa_flush_map dev: 0x55c933926890 doit: 1 svq_flush: 0 persist: 1
> 109531@1693367276.853524:vhost_vdpa_set_config_call dev: 0x55c933926890 fd: -1
> 109531@1693367276.853579:vhost_vdpa_iotlb_begin_batch vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 type: 5
> 109531@1693367276.853586:vhost_vdpa_dma_unmap vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 asid: 0 iova: 0x1000 size: 0x2000 type: 3
> 109531@1693367276.853600:vhost_vdpa_dma_unmap vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 asid: 0 iova: 0x3000 size: 0x1000 type: 3
> 109531@1693367276.853618:vhost_vdpa_dma_unmap vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 asid: 0 iova: 0x4000 size: 0x2000 type: 3
> 109531@1693367276.853625:vhost_vdpa_dma_unmap vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 asid: 0 iova: 0x6000 size: 0x1000 type: 3
> 109531@1693367276.853630:vhost_vdpa_dma_unmap vdpa:0x7fa2aa84c190 fd: 16 msg_type: 2 asid: 0 iova: 0x7000 size: 0x2000 type: 3
> 109531@1693367276.853636:vhost_vdpa_dma_unmap vdpa:0x7fa2aa84c190 fd: 16 msg_type: 2 asid: 0 iova: 0x9000 size: 0x1000 type: 3
> 109531@1693367276.853642:vhost_vdpa_dma_unmap vdpa:0x7fa2aa84c190 fd: 16 msg_type: 2 asid: 0 iova: 0xa000 size: 0x2000 type: 3
> 109531@1693367276.853648:vhost_vdpa_dma_unmap vdpa:0x7fa2aa84c190 fd: 16 msg_type: 2 asid: 0 iova: 0xc000 size: 0x1000 type: 3
> 109531@1693367276.853654:vhost_vdpa_dma_unmap vdpa:0x7fa2aa6b6190 fd: 16 msg_type: 2 asid: 0 iova: 0xf000 size: 0x1000 type: 3
> 109531@1693367276.853660:vhost_vdpa_dma_unmap vdpa:0x7fa2aa6b6190 fd: 16 msg_type: 2 asid: 0 iova: 0x10000 size: 0x1000 type: 3
> 109531@1693367276.853666:vhost_vdpa_dma_unmap vdpa:0x7fa2aa6b6190 fd: 16 msg_type: 2 asid: 0 iova: 0xd000 size: 0x1000 type: 3
> 109531@1693367276.853670:vhost_vdpa_dma_unmap vdpa:0x7fa2aa6b6190 fd: 16 msg_type: 2 asid: 0 iova: 0xe000 size: 0x1000 type: 3
> 109531@1693367276.853675:vhost_vdpa_iotlb_end_batch vdpa:0x7fa2aa895190 fd: 16 msg_type: 2 type: 6
> 109531@1693367277.014697:vhost_vdpa_get_vq_index dev: 0x55c933925de0 idx: 0 vq idx: 0
> 109531@1693367277.014747:vhost_vdpa_get_vq_index dev: 0x55c933925de0 idx: 1 vq idx: 1
> 109531@1693367277.014753:vhost_vdpa_get_vq_index dev: 0x55c9339262e0 idx: 2 vq idx: 2
> 109531@1693367277.014756:vhost_vdpa_get_vq_index dev: 0x55c9339262e0 idx: 3 vq idx: 3
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-vdpa.c         |   7 +--
>  include/hw/virtio/vhost-vdpa.h |   3 ++
>  net/vhost-vdpa.c               | 112 +++++++++++++++++++++++++++--------------
>  3 files changed, 80 insertions(+), 42 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index d98704a..4010fd9 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1162,8 +1162,8 @@ static void vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr addr)
>      vhost_iova_tree_remove(v->shared->iova_tree, *result);
>  }
>
> -static void vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
> -                                       const VhostShadowVirtqueue *svq)
> +void vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
> +                                const VhostShadowVirtqueue *svq)
>  {
>      struct vhost_vdpa *v = dev->opaque;
>      struct vhost_vring_addr svq_addr;
> @@ -1346,17 +1346,14 @@ static void vhost_vdpa_svqs_stop(struct vhost_dev *dev)
>          return;
>      }
>
> -    vhost_vdpa_dma_batch_begin_once(v->shared, v->address_space_id);
>      for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
>          VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
>
>          vhost_svq_stop(svq);
> -        vhost_vdpa_svq_unmap_rings(dev, svq);
>
>          event_notifier_cleanup(&svq->hdev_kick);
>          event_notifier_cleanup(&svq->hdev_call);
>      }
> -    vhost_vdpa_dma_batch_end_once(v->shared, v->address_space_id);
>  }
>
>  static void vhost_vdpa_suspend(struct vhost_dev *dev)
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index aa13679..f426e2c 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -112,6 +112,9 @@ int vhost_vdpa_dma_batch_end_once(VhostVDPAShared *s, uint32_t asid);
>  int vhost_vdpa_load_setup(VhostVDPAShared *s, AddressSpace *dma_as);
>  int vhost_vdpa_load_cleanup(VhostVDPAShared *s, bool vhost_will_start);
>
> +void vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
> +                                const VhostShadowVirtqueue *svq);
> +
>  typedef struct vdpa_iommu {
>      VhostVDPAShared *dev_shared;
>      IOMMUMemoryRegion *iommu_mr;
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 683619f..41714d1 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -29,6 +29,7 @@
>  #include "migration/migration.h"
>  #include "migration/misc.h"
>  #include "hw/virtio/vhost.h"
> +#include "hw/virtio/vhost-vdpa.h"
>
>  /* Todo:need to add the multiqueue support here */
>  typedef struct VhostVDPAState {
> @@ -467,15 +468,89 @@ static int vhost_vdpa_net_data_load(NetClientState *nc)
>      return 0;
>  }
>
> +static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
> +{
> +    VhostIOVATree *tree = v->shared->iova_tree;
> +    DMAMap needle = {
> +        /*
> +         * No need to specify size or to look for more translations since
> +         * this contiguous chunk was allocated by us.
> +         */
> +        .translated_addr = (hwaddr)(uintptr_t)addr,
> +    };
> +    const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle);
> +    int r;
> +
> +    if (unlikely(!map)) {
> +        error_report("Cannot locate expected map");
> +        return;
> +    }
> +
> +    r = vhost_vdpa_dma_unmap(v->shared, v->address_space_id, map->iova,
> +                             map->size + 1);
> +    if (unlikely(r != 0)) {
> +        error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
> +    }
> +
> +    vhost_iova_tree_remove(tree, *map);
> +}
> +
>  static void vhost_vdpa_net_client_stop(NetClientState *nc)
>  {
>      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    struct vhost_vdpa *v = &s->vhost_vdpa;
> +    struct vhost_vdpa *last_vi = NULL;

Nit: just curious, what does "vi" stand for here?

> +    bool has_cvq = v->dev->vq_index_end % 2;
> +    int nvqp;
>
>      assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>
>      if (s->vhost_vdpa.index == 0) {
>          migration_remove_notifier(&s->migration_state);
>      }
> +
> +    if (v->dev->vq_index + v->dev->nvqs != v->dev->vq_index_end) {
> +        return;
> +    }
> +
> +    nvqp = (v->dev->vq_index_end + 1) / 2;
> +    for (int i = 0; i < nvqp; ++i) {
> +        VhostVDPAState *s_i = vhost_vdpa_net_get_nc_vdpa(s, i);
> +        struct vhost_vdpa *v_i = &s_i->vhost_vdpa;
> +
> +        if (!v_i->shadow_vqs_enabled) {
> +            continue;
> +        }
> +        if (!last_vi) {
> +            vhost_vdpa_dma_batch_begin_once(v_i->shared,
> +                                            v_i->address_space_id);
> +            last_vi = v_i;
> +        } else if (last_vi->address_space_id != v_i->address_space_id) {
> +            vhost_vdpa_dma_batch_end_once(last_vi->shared,
> +                                          last_vi->address_space_id);
> +            vhost_vdpa_dma_batch_begin_once(v_i->shared,
> +                                            v_i->address_space_id);
> +            last_vi = v_i;
> +        }
> +
> +        for (unsigned j = 0; j < v_i->shadow_vqs->len; ++j) {
> +            VhostShadowVirtqueue *svq = g_ptr_array_index(v_i->shadow_vqs, j);
> +
> +            vhost_vdpa_svq_unmap_rings(v_i->dev, svq);
> +        }
> +    }
> +    if (has_cvq) {
> +        if (last_vi) {
> +            assert(last_vi->address_space_id == v->address_space_id);
> +        }
> +        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
> +        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
> +    }
> +    if (last_vi) {
> +        vhost_vdpa_dma_batch_end_once(last_vi->shared,
> +                                      last_vi->address_space_id);
> +        last_vi = NULL;
> +    }

The logic looks rather complicated, can we simplify it by:

batch_begin_once()
unmap()
batch_end_once()

?

Thanks


>  }
>
>  static int vhost_vdpa_net_load_setup(NetClientState *nc, NICState *nic)
> @@ -585,33 +660,6 @@ static int64_t vhost_vdpa_get_vring_desc_group(int device_fd,
>      return state.num;
>  }
>
> -static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
> -{
> -    VhostIOVATree *tree = v->shared->iova_tree;
> -    DMAMap needle = {
> -        /*
> -         * No need to specify size or to look for more translations since
> -         * this contiguous chunk was allocated by us.
> -         */
> -        .translated_addr = (hwaddr)(uintptr_t)addr,
> -    };
> -    const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle);
> -    int r;
> -
> -    if (unlikely(!map)) {
> -        error_report("Cannot locate expected map");
> -        return;
> -    }
> -
> -    r = vhost_vdpa_dma_unmap(v->shared, v->address_space_id, map->iova,
> -                             map->size + 1);
> -    if (unlikely(r != 0)) {
> -        error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
> -    }
> -
> -    vhost_iova_tree_remove(tree, *map);
> -}
> -
>  /** Map CVQ buffer. */
>  static int vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v, void *buf, size_t size,
>                                    bool write)
> @@ -740,18 +788,8 @@ err:
>
>  static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
>  {
> -    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> -    struct vhost_vdpa *v = &s->vhost_vdpa;
> -
>      assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>
> -    if (s->vhost_vdpa.shadow_vqs_enabled) {
> -        vhost_vdpa_dma_batch_begin_once(v->shared, v->address_space_id);
> -        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
> -        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
> -        vhost_vdpa_dma_batch_end_once(v->shared, v->address_space_id);
> -    }
> -
>      vhost_vdpa_net_client_stop(nc);
>  }
>
> --
> 1.8.3.1
>
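
A sketch of the simplification suggested above: open and close one DMA batch
per shadowed vhost_vdpa instance instead of carrying the batch across
instances via last_vi. This is illustrative only; it may issue a few more
BATCH_BEGIN/END messages than the coalesced version, which is the trade-off
for the simpler control flow:

    for (int i = 0; i < nvqp; ++i) {
        VhostVDPAState *s_i = vhost_vdpa_net_get_nc_vdpa(s, i);
        struct vhost_vdpa *v_i = &s_i->vhost_vdpa;

        if (!v_i->shadow_vqs_enabled) {
            continue;
        }

        /* One batch per device: begin, unmap all its SVQ rings, end. */
        vhost_vdpa_dma_batch_begin_once(v_i->shared, v_i->address_space_id);
        for (unsigned j = 0; j < v_i->shadow_vqs->len; ++j) {
            VhostShadowVirtqueue *svq = g_ptr_array_index(v_i->shadow_vqs, j);

            vhost_vdpa_svq_unmap_rings(v_i->dev, svq);
        }
        vhost_vdpa_dma_batch_end_once(v_i->shared, v_i->address_space_id);
    }
    if (has_cvq) {
        /* The CVQ buffers get their own short batch. */
        vhost_vdpa_dma_batch_begin_once(v->shared, v->address_space_id);
        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->status);
        vhost_vdpa_dma_batch_end_once(v->shared, v->address_space_id);
    }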



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 34/40] vdpa: fix network breakage after cancelling migration
  2023-12-07 17:39 ` [PATCH 34/40] vdpa: fix network breakage after cancelling migration Si-Wei Liu
@ 2024-01-15  3:48   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  3:48 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Fix an issue where cancelling an ongoing migration ends up
> with no network connectivity.
>
> When cancelling migration, SVQ will be switched back to
> passthrough mode, but the right call fd is not programmed into
> the device and the svq's own call fd is still used. During this
> transition period, shadow_vqs_enabled hasn't been set back to
> false yet, so the installation of the call fd is inadvertently
> bypassed.
>
> Fixes: a8ac88585da1 ("vhost: Add Shadow VirtQueue call forwarding capabilities")
> Cc: Eugenio Pérez <eperezma@redhat.com>

Let's cc stable and post this as an independent patch.

>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 35/40] vdpa: add vhost_vdpa_set_address_space_id trace
  2023-12-07 17:39 ` [PATCH 35/40] vdpa: add vhost_vdpa_set_address_space_id trace Si-Wei Liu
  2023-12-11 18:13   ` Eugenio Perez Martin
@ 2024-01-15  3:50   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  3:50 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> For better debuggability and observability.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  net/trace-events | 3 +++
>  net/vhost-vdpa.c | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/net/trace-events b/net/trace-events
> index 823a071..aab666a 100644
> --- a/net/trace-events
> +++ b/net/trace-events
> @@ -23,3 +23,6 @@ colo_compare_tcp_info(const char *pkt, uint32_t seq, uint32_t ack, int hdlen, in
>  # filter-rewriter.c
>  colo_filter_rewriter_pkt_info(const char *func, const char *src, const char *dst, uint32_t seq, uint32_t ack, uint32_t flag) "%s: src/dst: %s/%s p: seq/ack=%u/%u  flags=0x%x"
>  colo_filter_rewriter_conn_offset(uint32_t offset) ": offset=%u"
> +
> +# vhost-vdpa.c
> +vhost_vdpa_set_address_space_id(void *v, unsigned vq_group, unsigned asid_num) "vhost_vdpa: %p vq_group: %u asid: %u"

So a raw pointer is not user friendly; how about using the name of the netclient?

Thanks

> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 41714d1..84876b0 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -30,6 +30,7 @@
>  #include "migration/misc.h"
>  #include "hw/virtio/vhost.h"
>  #include "hw/virtio/vhost-vdpa.h"
> +#include "trace.h"
>
>  /* Todo:need to add the multiqueue support here */
>  typedef struct VhostVDPAState {
> @@ -365,6 +366,8 @@ static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
>      };
>      int r;
>
> +    trace_vhost_vdpa_set_address_space_id(v, vq_group, asid_num);
> +
>      r = ioctl(v->shared->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
>      if (unlikely(r < 0)) {
>          error_report("Can't set vq group %u asid %u, errno=%d (%s)",
> --
> 1.8.3.1
>
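
A sketch of the direction suggested above, keying the trace on the netclient
name rather than a raw pointer. How the name reaches
vhost_vdpa_set_address_space_id() (for example by passing the caller's
s->nc.name down as an extra parameter) is an assumption, not something the
posted patch does:

# net/trace-events (sketch)
vhost_vdpa_set_address_space_id(const char *name, unsigned vq_group, unsigned asid_num) "netclient: %s vq_group: %u asid: %u"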



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 36/40] vdpa: add vhost_vdpa_get_vring_base trace for svq mode
  2023-12-07 17:39 ` [PATCH 36/40] vdpa: add vhost_vdpa_get_vring_base trace for svq mode Si-Wei Liu
  2023-12-11 18:14   ` Eugenio Perez Martin
@ 2024-01-15  3:52   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  3:52 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> For better debuggability and observability.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/trace-events | 2 +-
>  hw/virtio/vhost-vdpa.c | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 196f32f..a8d3321 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -58,7 +58,7 @@ vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int r
>  vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
>  vhost_vdpa_set_vring_num(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
>  vhost_vdpa_set_vring_base(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
> -vhost_vdpa_get_vring_base(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
> +vhost_vdpa_get_vring_base(void *dev, unsigned int index, unsigned int num, bool svq) "dev: %p index: %u num: %u svq: %d"

In the future, it might be better to use the name of the VirtIODevice as
well as the vq index instead of a pointer. But %p is already used
elsewhere, so:

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 37/40] vdpa: add vhost_vdpa_set_dev_vring_base trace for svq mode
  2023-12-07 17:39 ` [PATCH 37/40] vdpa: add vhost_vdpa_set_dev_vring_base " Si-Wei Liu
  2023-12-11 18:14   ` Eugenio Perez Martin
@ 2024-01-15  3:53   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  3:53 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> For better debuggability and observability.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 38/40] vdpa: add trace events for eval_flush
  2023-12-07 17:39 ` [PATCH 38/40] vdpa: add trace events for eval_flush Si-Wei Liu
@ 2024-01-15  3:57   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  3:57 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> For better debuggability and observability.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  net/trace-events | 2 ++
>  net/vhost-vdpa.c | 7 +++++++
>  2 files changed, 9 insertions(+)
>
> diff --git a/net/trace-events b/net/trace-events
> index aab666a..d650c71 100644
> --- a/net/trace-events
> +++ b/net/trace-events
> @@ -26,3 +26,5 @@ colo_filter_rewriter_conn_offset(uint32_t offset) ": offset=%u"
>
>  # vhost-vdpa.c
>  vhost_vdpa_set_address_space_id(void *v, unsigned vq_group, unsigned asid_num) "vhost_vdpa: %p vq_group: %u asid: %u"
> +vhost_vdpa_net_data_eval_flush(void *s, int qindex, int svq_switch, bool svq_flush) "vhost_vdpa: %p qp: %d svq_switch: %d flush_map: %d"
> +vhost_vdpa_net_cvq_eval_flush(void *s, int qindex, int svq_switch, bool svq_flush) "vhost_vdpa: %p qp: %d svq_switch: %d flush_map: %d"
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 84876b0..a0bd8cd 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -608,6 +608,9 @@ static void vhost_vdpa_net_data_eval_flush(NetClientState *nc, bool stop)
>                 v->desc_group < 0) {
>          v->shared->flush_map = true;
>      }
> +    trace_vhost_vdpa_net_data_eval_flush(v, s->vhost_vdpa.index,
> +                                        v->shared->svq_switching,
> +                                        v->shared->flush_map);
>  }
>
>  static NetClientInfo net_vhost_vdpa_info = {
> @@ -1457,6 +1460,10 @@ static void vhost_vdpa_net_cvq_eval_flush(NetClientState *nc, bool stop)

For even better debuggability and observability.

Is it better to squash this into the patch that introduces vhost_vdpa_net_cvq_eval_flush()?

Thanks

>          !s->cvq_isolated && v->desc_group < 0) {
>          v->shared->flush_map = true;
>      }
> +
> +    trace_vhost_vdpa_net_cvq_eval_flush(v, s->vhost_vdpa.index,
> +                                       v->shared->svq_switching,
> +                                       v->shared->flush_map);
>  }
>
>  static NetClientInfo net_vhost_vdpa_cvq_info = {
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 40/40] vdpa: add trace event for vhost_vdpa_net_load_mq
  2023-12-07 17:39 ` [PATCH 40/40] vdpa: add trace event for vhost_vdpa_net_load_mq Si-Wei Liu
  2023-12-11 18:15   ` Eugenio Perez Martin
@ 2024-01-15  3:58   ` Jason Wang
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2024-01-15  3:58 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: eperezma, mst, dtatulea, leiyang, yin31149, boris.ostrovsky,
	jonah.palmer, qemu-devel

On Fri, Dec 8, 2023 at 2:51 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> For better debuggability and observability.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  net/trace-events | 1 +
>  net/vhost-vdpa.c | 2 ++
>  2 files changed, 3 insertions(+)
>
> diff --git a/net/trace-events b/net/trace-events
> index be087e6..c128cc4 100644
> --- a/net/trace-events
> +++ b/net/trace-events
> @@ -30,3 +30,4 @@ vhost_vdpa_net_data_eval_flush(void *s, int qindex, int svq_switch, bool svq_flu
>  vhost_vdpa_net_cvq_eval_flush(void *s, int qindex, int svq_switch, bool svq_flush) "vhost_vdpa: %p qp: %d svq_switch: %d flush_map: %d"
>  vhost_vdpa_net_load_cmd(void *s, uint8_t class, uint8_t cmd, int data_num, int data_size) "vdpa state: %p class: %u cmd: %u sg_num: %d size: %d"
>  vhost_vdpa_net_load_cmd_retval(void *s, uint8_t class, uint8_t cmd, int r) "vdpa state: %p class: %u cmd: %u retval: %d"
> +vhost_vdpa_net_load_mq(void *s, int ncurqps) "vdpa state: %p current_qpairs: %d"

Similarly, I think nc->name looks better than the plain pointer here?

Thanks

> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 61da8b4..17b8d01 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -1109,6 +1109,8 @@ static int vhost_vdpa_net_load_mq(VhostVDPAState *s,
>          return 0;
>      }
>
> +    trace_vhost_vdpa_net_load_mq(s, n->curr_queue_pairs);
> +
>      mq.virtqueue_pairs = cpu_to_le16(n->curr_queue_pairs);
>      const struct iovec data = {
>          .iov_base = &mq,
> --
> 1.8.3.1
>



^ permalink raw reply	[flat|nested] 102+ messages in thread

end of thread, other threads:[~2024-01-15  3:59 UTC | newest]

Thread overview: 102+ messages
2023-12-07 17:39 [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Si-Wei Liu
2023-12-07 17:39 ` [PATCH 01/40] linux-headers: add vhost_types.h and vhost.h Si-Wei Liu
2023-12-11  7:47   ` Eugenio Perez Martin
2024-01-11  3:32   ` Jason Wang
2023-12-07 17:39 ` [PATCH 02/40] vdpa: add vhost_vdpa_get_vring_desc_group Si-Wei Liu
2024-01-11  3:51   ` Jason Wang
2023-12-07 17:39 ` [PATCH 03/40] vdpa: probe descriptor group index for data vqs Si-Wei Liu
2023-12-11 18:49   ` Eugenio Perez Martin
2024-01-11  4:02   ` Jason Wang
2023-12-07 17:39 ` [PATCH 04/40] vdpa: piggyback desc_group index when probing isolated cvq Si-Wei Liu
2024-01-11  7:06   ` Jason Wang
2023-12-07 17:39 ` [PATCH 05/40] vdpa: populate desc_group from net_vhost_vdpa_init Si-Wei Liu
2023-12-11 10:46   ` Eugenio Perez Martin
2023-12-11 11:01     ` Eugenio Perez Martin
2024-01-11  7:09   ` Jason Wang
2023-12-07 17:39 ` [PATCH 06/40] vhost: make svq work with gpa without iova translation Si-Wei Liu
2023-12-11 11:17   ` Eugenio Perez Martin
2024-01-11  7:31   ` Jason Wang
2023-12-07 17:39 ` [PATCH 07/40] vdpa: move around vhost_vdpa_set_address_space_id Si-Wei Liu
2023-12-11 11:18   ` Eugenio Perez Martin
2024-01-11  7:33   ` Jason Wang
2023-12-07 17:39 ` [PATCH 08/40] vdpa: add back vhost_vdpa_net_first_nc_vdpa Si-Wei Liu
2023-12-11 11:19   ` Eugenio Perez Martin
2024-01-11  7:37   ` Jason Wang
2023-12-07 17:39 ` [PATCH 09/40] vdpa: no repeat setting shadow_data Si-Wei Liu
2023-12-11 11:21   ` Eugenio Perez Martin
2024-01-11  7:34   ` Jason Wang
2023-12-07 17:39 ` [PATCH 10/40] vdpa: assign svq descriptors a separate ASID when possible Si-Wei Liu
2023-12-11 13:35   ` Eugenio Perez Martin
2024-01-11  8:02   ` Jason Wang
2023-12-07 17:39 ` [PATCH 11/40] vdpa: factor out vhost_vdpa_last_dev Si-Wei Liu
2023-12-11 13:36   ` Eugenio Perez Martin
2024-01-11  8:03   ` Jason Wang
2023-12-07 17:39 ` [PATCH 12/40] vdpa: check map_thread_enabled before join maps thread Si-Wei Liu
2023-12-07 17:39 ` [PATCH 13/40] vdpa: ref counting VhostVDPAShared Si-Wei Liu
2024-01-11  8:12   ` Jason Wang
2023-12-07 17:39 ` [PATCH 14/40] vdpa: convert iova_tree to ref count based Si-Wei Liu
2023-12-11 17:21   ` Eugenio Perez Martin
2024-01-11  8:15   ` Jason Wang
2023-12-07 17:39 ` [PATCH 15/40] vdpa: add svq_switching and flush_map to header Si-Wei Liu
2024-01-11  8:16   ` Jason Wang
2023-12-07 17:39 ` [PATCH 16/40] vdpa: indicate SVQ switching via flag Si-Wei Liu
2024-01-11  8:17   ` Jason Wang
2023-12-07 17:39 ` [PATCH 17/40] vdpa: judge if map can be kept across reset Si-Wei Liu
2023-12-13  9:51   ` Eugenio Perez Martin
2024-01-11  8:24   ` Jason Wang
2023-12-07 17:39 ` [PATCH 18/40] vdpa: unregister listener on last dev cleanup Si-Wei Liu
2023-12-11 17:37   ` Eugenio Perez Martin
2024-01-11  8:26   ` Jason Wang
2023-12-07 17:39 ` [PATCH 19/40] vdpa: should avoid map flushing with persistent iotlb Si-Wei Liu
2024-01-11  8:28   ` Jason Wang
2023-12-07 17:39 ` [PATCH 20/40] vdpa: avoid mapping flush across reset Si-Wei Liu
2024-01-11  8:30   ` Jason Wang
2023-12-07 17:39 ` [PATCH 21/40] vdpa: vhost_vdpa_dma_batch_end_once rename Si-Wei Liu
2024-01-15  2:40   ` Jason Wang
2024-01-15  2:52     ` Jason Wang
2023-12-07 17:39 ` [PATCH 22/40] vdpa: factor out vhost_vdpa_map_batch_begin Si-Wei Liu
2024-01-15  3:02   ` Jason Wang
2023-12-07 17:39 ` [PATCH 23/40] vdpa: vhost_vdpa_dma_batch_begin_once rename Si-Wei Liu
2024-01-15  3:03   ` Jason Wang
2023-12-07 17:39 ` [PATCH 24/40] vdpa: factor out vhost_vdpa_dma_batch_end Si-Wei Liu
2024-01-15  3:05   ` Jason Wang
2023-12-07 17:39 ` [PATCH 25/40] vdpa: add asid to dma_batch_once API Si-Wei Liu
2023-12-13 15:42   ` Eugenio Perez Martin
2024-01-15  3:07   ` Jason Wang
2023-12-07 17:39 ` [PATCH 26/40] vdpa: return int for " Si-Wei Liu
2023-12-07 17:39 ` [PATCH 27/40] vdpa: add asid to all dma_batch call sites Si-Wei Liu
2023-12-07 17:39 ` [PATCH 28/40] vdpa: support iotlb_batch_asid Si-Wei Liu
2023-12-13 15:42   ` Eugenio Perez Martin
2024-01-15  3:19   ` Jason Wang
2023-12-07 17:39 ` [PATCH 29/40] vdpa: expose API vhost_vdpa_dma_batch_once Si-Wei Liu
2023-12-13 15:42   ` Eugenio Perez Martin
2024-01-15  3:32   ` Jason Wang
2023-12-07 17:39 ` [PATCH 30/40] vdpa: batch map/unmap op per svq pair basis Si-Wei Liu
2024-01-15  3:33   ` Jason Wang
2023-12-07 17:39 ` [PATCH 31/40] vdpa: batch map and unmap around cvq svq start/stop Si-Wei Liu
2024-01-15  3:34   ` Jason Wang
2023-12-07 17:39 ` [PATCH 32/40] vdpa: factor out vhost_vdpa_net_get_nc_vdpa Si-Wei Liu
2024-01-15  3:35   ` Jason Wang
2023-12-07 17:39 ` [PATCH 33/40] vdpa: batch multiple dma_unmap to a single call for vm stop Si-Wei Liu
2023-12-13 16:46   ` Eugenio Perez Martin
2024-01-15  3:47   ` Jason Wang
2023-12-07 17:39 ` [PATCH 34/40] vdpa: fix network breakage after cancelling migration Si-Wei Liu
2024-01-15  3:48   ` Jason Wang
2023-12-07 17:39 ` [PATCH 35/40] vdpa: add vhost_vdpa_set_address_space_id trace Si-Wei Liu
2023-12-11 18:13   ` Eugenio Perez Martin
2024-01-15  3:50   ` Jason Wang
2023-12-07 17:39 ` [PATCH 36/40] vdpa: add vhost_vdpa_get_vring_base trace for svq mode Si-Wei Liu
2023-12-11 18:14   ` Eugenio Perez Martin
2024-01-15  3:52   ` Jason Wang
2023-12-07 17:39 ` [PATCH 37/40] vdpa: add vhost_vdpa_set_dev_vring_base " Si-Wei Liu
2023-12-11 18:14   ` Eugenio Perez Martin
2024-01-15  3:53   ` Jason Wang
2023-12-07 17:39 ` [PATCH 38/40] vdpa: add trace events for eval_flush Si-Wei Liu
2024-01-15  3:57   ` Jason Wang
2023-12-07 17:39 ` [PATCH 39/40] vdpa: add trace events for vhost_vdpa_net_load_cmd Si-Wei Liu
2023-12-11 18:14   ` Eugenio Perez Martin
2023-12-07 17:39 ` [PATCH 40/40] vdpa: add trace event for vhost_vdpa_net_load_mq Si-Wei Liu
2023-12-11 18:15   ` Eugenio Perez Martin
2024-01-15  3:58   ` Jason Wang
2023-12-11 18:39 ` [PATCH 00/40] vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB Eugenio Perez Martin
2024-01-11  8:21 ` Jason Wang
