* [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ
@ 2022-04-13 16:31 Eugenio Pérez
  2022-04-13 16:31 ` [RFC PATCH v7 01/25] vhost: Track descriptor chain in private at SVQ Eugenio Pérez
                   ` (24 more replies)
  0 siblings, 25 replies; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

The control virtqueue is used by the networking device to accept various
commands from the driver. It is a must for supporting multiqueue and other
configurations.

Shadow VirtQueue (SVQ) already makes it possible to migrate virtqueue
states, effectively intercepting them so qemu can track which regions of
memory are dirtied by device action and need migration. However, this does
not cover the networking device state as seen by the driver, which is
changed by CVQ messages such as MAC address updates.

To solve that, this series uses the SVQ infrastructure to intercept the
networking control messages used by the device. This way, qemu is able to
update the VirtIONet device model and to migrate it.

You can run qemu in two modes after applying this series: intercepting only
the cvq with x-cvq-svq=on, or intercepting all the virtqueues by also adding
x-svq=on to the cmdline:

-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-cvq-svq=on,x-svq=on
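
To intercept only the control virtqueue, drop x-svq (an assumed variant of
the line above):

-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-cvq-svq=on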

The most updated kernel part of ASID is proposed at [1].

Modes without x-cvq-svq have not been tested with this series. Control
virtqueue commands other than set mac or mq are not tested. Some details
like error handling are not 100% tested either.

The first 5 patches have been or will be proposed separately. Patches 6 and
7 enable some prerequisites. Patch 8 adds a cmdline parameter to shadow all
virtqueues. The rest of the commits introduce the actual functionality.

Comments are welcome.

Changes from rfc v6:
* Fix bad iotlb update ordering when batching was enabled
* Add reference counting to iova_tree so cleaning is simpler.

Changes from rfc v5:
* Fix bad calculation of the cvq end group when MQ is not acked by the guest.

Changes from rfc v4:
* Add missing tracing
* Add multiqueue support
* Use already sent version for replacing g_memdup
* Care with memory management

Changes from rfc v3:
* Fix bad returning of descriptors to SVQ list.

Changes from rfc v2:
* Fix use-after-free.

Changes from rfc v1:
* Rebase to latest master.
* Configure ASID instead of assuming cvq asid != data vqs asid.
* Update device model so (MAC) state can be migrated too.

[1] https://lkml.kernel.org/kvm/20220224212314.1326-1-gdawar@xilinx.com/

Eugenio Pérez (24):
  vhost: Track descriptor chain in private at SVQ
  vdpa: Add missing tracing to batch mapping functions
  vdpa: Fix bad index calculus at vhost_vdpa_get_vring_base
  util: Return void on iova_tree_remove
  vdpa: Send all updates in memory listener commit
  vhost: Add reference counting to vhost_iova_tree
  vdpa: Add x-svq to NetdevVhostVDPAOptions
  vhost: move descriptor translation to vhost_svq_vring_write_descs
  vdpa: Fix index calculus at vhost_vdpa_svqs_start
  virtio-net: Expose ctrl virtqueue logic
  vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
  virtio: Make virtqueue_alloc_element non-static
  vhost: Add SVQElement
  vhost: Add custom used buffer callback
  vdpa: control virtqueue support on shadow virtqueue
  vhost: Add vhost_iova_tree_find
  vdpa: Add map/unmap operation callback to SVQ
  vhost: Add vhost_svq_inject
  vdpa: add NetClientState->start() callback
  vdpa: Add vhost_vdpa_start_control_svq
  vhost: Update kernel headers
  vhost: Make possible to check for device exclusive vq group
  vdpa: Add asid attribute to vdpa device
  vdpa: Add x-cvq-svq

Philippe Mathieu-Daudé (1):
  hw/virtio: Replace g_memdup() by g_memdup2()

 qapi/net.json                                |  13 +-
 hw/virtio/vhost-iova-tree.h                  |   7 +-
 hw/virtio/vhost-shadow-virtqueue.h           |  52 +++-
 include/hw/virtio/vhost-vdpa.h               |   4 +-
 include/hw/virtio/vhost.h                    |   6 +
 include/hw/virtio/virtio-net.h               |   3 +
 include/hw/virtio/virtio.h                   |   1 +
 include/net/net.h                            |   2 +
 include/qemu/iova-tree.h                     |   4 +-
 include/standard-headers/linux/vhost_types.h |  11 +-
 linux-headers/linux/vhost.h                  |  25 +-
 hw/net/vhost_net.c                           |  13 +-
 hw/net/virtio-net.c                          |  82 ++---
 hw/virtio/vhost-iova-tree.c                  |  35 ++-
 hw/virtio/vhost-shadow-virtqueue.c           | 265 +++++++++++++---
 hw/virtio/vhost-vdpa.c                       | 262 ++++++++++++----
 hw/virtio/virtio-crypto.c                    |   6 +-
 hw/virtio/virtio.c                           |   2 +-
 net/vhost-vdpa.c                             | 305 +++++++++++++++++--
 util/iova-tree.c                             |   4 +-
 hw/virtio/trace-events                       |   8 +-
 21 files changed, 930 insertions(+), 180 deletions(-)

-- 
2.27.0





* [RFC PATCH v7 01/25] vhost: Track descriptor chain in private at SVQ
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-14  3:48   ` Jason Wang
  2022-04-13 16:31 ` [RFC PATCH v7 02/25] vdpa: Add missing tracing to batch mapping functions Eugenio Pérez
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

When a chain of descriptors is used, only the first descriptor of the chain
was properly enqueued back to the free list.

While we're at it, harden SVQ: the device could have access to modify the
descriptors, and it definitely has access when we implement packed vq.
Harden SVQ by maintaining a private copy of the descriptor chain. Other
fields like buffer addresses are already maintained separately.
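
As a sketch of the idea (the full change is in the diff below), chain
traversal now reads the private array instead of the device-visible ring:

    /* unsafe: svq->vring.desc[] lives in memory the device can write */
    i = le16_to_cpu(svq->vring.desc[i].next);

    /* hardened: svq->desc_next[] is a private copy kept by qemu */
    i = le16_to_cpu(svq->desc_next[i]);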

Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding")

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  6 ++++++
 hw/virtio/vhost-shadow-virtqueue.c | 27 +++++++++++++++++++++------
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index e5e24c536d..c132c994e9 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -53,6 +53,12 @@ typedef struct VhostShadowVirtqueue {
     /* Next VirtQueue element that guest made available */
     VirtQueueElement *next_guest_avail_elem;
 
+    /*
+     * Backup next field for each descriptor so we can recover securely, not
+     * needing to trust the device access.
+     */
+    uint16_t *desc_next;
+
     /* Next head to expose to the device */
     uint16_t shadow_avail_idx;
 
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index b232803d1b..a2531d5874 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -138,6 +138,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
     for (n = 0; n < num; n++) {
         if (more_descs || (n + 1 < num)) {
             descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
+            descs[i].next = cpu_to_le16(svq->desc_next[i]);
         } else {
             descs[i].flags = flags;
         }
@@ -145,10 +146,10 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
         descs[i].len = cpu_to_le32(iovec[n].iov_len);
 
         last = i;
-        i = cpu_to_le16(descs[i].next);
+        i = cpu_to_le16(svq->desc_next[i]);
     }
 
-    svq->free_head = le16_to_cpu(descs[last].next);
+    svq->free_head = le16_to_cpu(svq->desc_next[last]);
 }
 
 static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
@@ -333,13 +334,22 @@ static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
     svq->vring.avail->flags |= cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
 }
 
+static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
+                                             uint16_t num, uint16_t i)
+{
+    for (uint16_t j = 0; j < num; ++j) {
+        i = le16_to_cpu(svq->desc_next[i]);
+    }
+
+    return i;
+}
+
 static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
                                            uint32_t *len)
 {
-    vring_desc_t *descs = svq->vring.desc;
     const vring_used_t *used = svq->vring.used;
     vring_used_elem_t used_elem;
-    uint16_t last_used;
+    uint16_t last_used, last_used_chain, num;
 
     if (!vhost_svq_more_used(svq)) {
         return NULL;
@@ -365,7 +375,10 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
         return NULL;
     }
 
-    descs[used_elem.id].next = svq->free_head;
+    num = svq->ring_id_maps[used_elem.id]->in_num +
+          svq->ring_id_maps[used_elem.id]->out_num;
+    last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
+    svq->desc_next[last_used_chain] = svq->free_head;
     svq->free_head = used_elem.id;
 
     *len = used_elem.len;
@@ -540,8 +553,9 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
     svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
     memset(svq->vring.used, 0, device_size);
     svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num);
+    svq->desc_next = g_new0(uint16_t, svq->vring.num);
     for (unsigned i = 0; i < svq->vring.num - 1; i++) {
-        svq->vring.desc[i].next = cpu_to_le16(i + 1);
+        svq->desc_next[i] = cpu_to_le16(i + 1);
     }
 }
 
@@ -574,6 +588,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
         virtqueue_detach_element(svq->vq, next_avail_elem, 0);
     }
     svq->vq = NULL;
+    g_free(svq->desc_next);
     g_free(svq->ring_id_maps);
     qemu_vfree(svq->vring.desc);
     qemu_vfree(svq->vring.used);
-- 
2.27.0




* [RFC PATCH v7 02/25] vdpa: Add missing tracing to batch mapping functions
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
  2022-04-13 16:31 ` [RFC PATCH v7 01/25] vhost: Track descriptor chain in private at SVQ Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-14  3:49   ` Jason Wang
  2022-04-13 16:31 ` [RFC PATCH v7 03/25] vdpa: Fix bad index calculus at vhost_vdpa_get_vring_base Eugenio Pérez
                   ` (22 subsequent siblings)
  24 siblings, 1 reply; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

These functions were not traced properly.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 2 ++
 hw/virtio/trace-events | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 8adf7c0b92..9e5fe15d03 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -129,6 +129,7 @@ static void vhost_vdpa_listener_begin_batch(struct vhost_vdpa *v)
         .iotlb.type = VHOST_IOTLB_BATCH_BEGIN,
     };
 
+    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.iotlb.type);
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)",
                      fd, errno, strerror(errno));
@@ -163,6 +164,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
     msg.type = v->msg_type;
     msg.iotlb.type = VHOST_IOTLB_BATCH_END;
 
+    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)",
                      fd, errno, strerror(errno));
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index a5102eac9e..333348d9d5 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -25,6 +25,8 @@ vhost_user_postcopy_waker_nomatch(const char *rb, uint64_t rb_offset) "%s + 0x%"
 # vhost-vdpa.c
 vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
 vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
+vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
+vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
 vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
 vhost_vdpa_listener_region_del(void *vdpa, uint64_t iova, uint64_t llend) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64
 vhost_vdpa_add_status(void *dev, uint8_t status) "dev: %p status: 0x%"PRIx8
-- 
2.27.0




* [RFC PATCH v7 03/25] vdpa: Fix bad index calculus at vhost_vdpa_get_vring_base
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
  2022-04-13 16:31 ` [RFC PATCH v7 01/25] vhost: Track descriptor chain in private at SVQ Eugenio Pérez
  2022-04-13 16:31 ` [RFC PATCH v7 02/25] vdpa: Add missing tracing to batch mapping functions Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-14  3:50   ` Jason Wang
  2022-04-13 16:31 ` [RFC PATCH v7 04/25] util: Return void on iova_tree_remove Eugenio Pérez
                   ` (21 subsequent siblings)
  24 siblings, 1 reply; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

Fixes: 6d0b222666 ("vdpa: Adapt vhost_vdpa_get_vring_base to SVQ")

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 9e5fe15d03..1f229ff4cb 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1172,11 +1172,11 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
                                        struct vhost_vring_state *ring)
 {
     struct vhost_vdpa *v = dev->opaque;
+    int vdpa_idx = ring->index - dev->vq_index;
     int ret;
 
     if (v->shadow_vqs_enabled) {
-        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
-                                                      ring->index);
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
 
         /*
          * Setting base as last used idx, so destination will see as available
-- 
2.27.0




* [RFC PATCH v7 04/25] util: Return void on iova_tree_remove
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (2 preceding siblings ...)
  2022-04-13 16:31 ` [RFC PATCH v7 03/25] vdpa: Fix bad index calculus at vhost_vdpa_get_vring_base Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-14  3:50   ` Jason Wang
  2022-04-13 16:31 ` [RFC PATCH v7 05/25] hw/virtio: Replace g_memdup() by g_memdup2() Eugenio Pérez
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

It always returns IOVA_OK, so no caller uses the return value.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/qemu/iova-tree.h | 4 +---
 util/iova-tree.c         | 4 +---
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
index c938fb0793..16bbfdf5f8 100644
--- a/include/qemu/iova-tree.h
+++ b/include/qemu/iova-tree.h
@@ -72,10 +72,8 @@ int iova_tree_insert(IOVATree *tree, const DMAMap *map);
  * provided.  The range does not need to be exactly what has inserted,
  * all the mappings that are included in the provided range will be
  * removed from the tree.  Here map->translated_addr is meaningless.
- *
- * Return: 0 if succeeded, or <0 if error.
  */
-int iova_tree_remove(IOVATree *tree, const DMAMap *map);
+void iova_tree_remove(IOVATree *tree, const DMAMap *map);
 
 /**
  * iova_tree_find:
diff --git a/util/iova-tree.c b/util/iova-tree.c
index 6dff29c1f6..fee530a579 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -164,15 +164,13 @@ void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator)
     g_tree_foreach(tree->tree, iova_tree_traverse, iterator);
 }
 
-int iova_tree_remove(IOVATree *tree, const DMAMap *map)
+void iova_tree_remove(IOVATree *tree, const DMAMap *map)
 {
     const DMAMap *overlap;
 
     while ((overlap = iova_tree_find(tree, map))) {
         g_tree_remove(tree->tree, overlap);
     }
-
-    return IOVA_OK;
 }
 
 /**
-- 
2.27.0




* [RFC PATCH v7 05/25] hw/virtio: Replace g_memdup() by g_memdup2()
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (3 preceding siblings ...)
  2022-04-13 16:31 ` [RFC PATCH v7 04/25] util: Return void on iova_tree_remove Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-14  3:51   ` Jason Wang
  2022-04-14  4:01   ` Jason Wang
  2022-04-13 16:31 ` [RFC PATCH v7 06/25] vdpa: Send all updates in memory listener commit Eugenio Pérez
                   ` (19 subsequent siblings)
  24 siblings, 2 replies; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

From: Philippe Mathieu-Daudé <philmd@redhat.com>

Per https://discourse.gnome.org/t/port-your-module-from-g-memdup-to-g-memdup2-now/5538

  The old API took the size of the memory to duplicate as a guint,
  whereas most memory functions take memory sizes as a gsize. This
  made it easy to accidentally pass a gsize to g_memdup(). For large
  values, that would lead to a silent truncation of the size from 64
  to 32 bits, and result in a heap area being returned which is
  significantly smaller than what the caller expects. This can likely
  be exploited in various modules to cause a heap buffer overflow.

Replace g_memdup() by the safer g_memdup2() wrapper.
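
As a minimal sketch of the hazard (names here are illustrative, not from
this series): a byte count computed as a gsize silently loses its high bits
when passed to g_memdup(), while g_memdup2() keeps the full width:

    #include <glib.h>
    #include <sys/uio.h>

    static struct iovec *dup_sg(const struct iovec *sg, gsize num)
    {
        /* sizeof(struct iovec) * num is a gsize (64 bit); g_memdup()
         * would truncate it to guint before allocating. */
        return g_memdup2(sg, sizeof(struct iovec) * num);
    }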

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 hw/net/virtio-net.c       | 3 ++-
 hw/virtio/virtio-crypto.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 1067e72b39..e4748a7e6c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1443,7 +1443,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
         }
 
         iov_cnt = elem->out_num;
-        iov2 = iov = g_memdup(elem->out_sg, sizeof(struct iovec) * elem->out_num);
+        iov2 = iov = g_memdup2(elem->out_sg,
+                               sizeof(struct iovec) * elem->out_num);
         s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl));
         iov_discard_front(&iov, &iov_cnt, sizeof(ctrl));
         if (s != sizeof(ctrl)) {
diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
index dcd80b904d..0e31e3cc04 100644
--- a/hw/virtio/virtio-crypto.c
+++ b/hw/virtio/virtio-crypto.c
@@ -242,7 +242,7 @@ static void virtio_crypto_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
         }
 
         out_num = elem->out_num;
-        out_iov_copy = g_memdup(elem->out_sg, sizeof(out_iov[0]) * out_num);
+        out_iov_copy = g_memdup2(elem->out_sg, sizeof(out_iov[0]) * out_num);
         out_iov = out_iov_copy;
 
         in_num = elem->in_num;
@@ -605,11 +605,11 @@ virtio_crypto_handle_request(VirtIOCryptoReq *request)
     }
 
     out_num = elem->out_num;
-    out_iov_copy = g_memdup(elem->out_sg, sizeof(out_iov[0]) * out_num);
+    out_iov_copy = g_memdup2(elem->out_sg, sizeof(out_iov[0]) * out_num);
     out_iov = out_iov_copy;
 
     in_num = elem->in_num;
-    in_iov_copy = g_memdup(elem->in_sg, sizeof(in_iov[0]) * in_num);
+    in_iov_copy = g_memdup2(elem->in_sg, sizeof(in_iov[0]) * in_num);
     in_iov = in_iov_copy;
 
     if (unlikely(iov_to_buf(out_iov, out_num, 0, &req, sizeof(req))
-- 
2.27.0




* [RFC PATCH v7 06/25] vdpa: Send all updates in memory listener commit
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (4 preceding siblings ...)
  2022-04-13 16:31 ` [RFC PATCH v7 05/25] hw/virtio: Replace g_memdup() by g_memdup2() Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-14  4:11   ` Jason Wang
  2022-04-13 16:31 ` [RFC PATCH v7 07/25] vhost: Add reference counting to vhost_iova_tree Eugenio Pérez
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

With the introduction of multiple ASIDs it can happen that many changes on
different listeners come before the commit call. Since kernel vhost-vdpa
still does not support that, send them all in one shot.

This also has one extra advantage: if there is no update to notify, we
save the iotlb batch {begin,end} calls.
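
The resulting uAPI sequence on commit is, roughly (a sketch; the constants
are the vhost uAPI ones already used by this file):

    struct vhost_msg_v2 msg = { .type = VHOST_IOTLB_MSG_V2 };

    msg.iotlb.type = VHOST_IOTLB_BATCH_BEGIN;
    write(fd, &msg, sizeof(msg));
    /* ... one write(2) per queued UPDATE / INVALIDATE message ... */
    msg.iotlb.type = VHOST_IOTLB_BATCH_END;
    write(fd, &msg, sizeof(msg));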

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/vhost-vdpa.h |  2 +-
 hw/virtio/vhost-vdpa.c         | 69 +++++++++++++++++-----------------
 2 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index a29dbb3f53..4961acea8b 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -27,7 +27,7 @@ typedef struct vhost_vdpa {
     int device_fd;
     int index;
     uint32_t msg_type;
-    bool iotlb_batch_begin_sent;
+    GArray *iotlb_updates;
     MemoryListener listener;
     struct vhost_vdpa_iova_range iova_range;
     uint64_t acked_features;
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 1f229ff4cb..27ee678dc9 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -85,6 +85,11 @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
     msg.iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW;
     msg.iotlb.type = VHOST_IOTLB_UPDATE;
 
+    if (v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_BATCH)) {
+        g_array_append_val(v->iotlb_updates, msg);
+        return 0;
+    }
+
    trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.iotlb.iova, msg.iotlb.size,
                             msg.iotlb.uaddr, msg.iotlb.perm, msg.iotlb.type);
 
@@ -109,6 +114,11 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
     msg.iotlb.size = size;
     msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
 
+    if (v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_BATCH)) {
+        g_array_append_val(v->iotlb_updates, msg);
+        return 0;
+    }
+
     trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.iotlb.iova,
                                msg.iotlb.size, msg.iotlb.type);
 
@@ -121,56 +131,47 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
     return ret;
 }
 
-static void vhost_vdpa_listener_begin_batch(struct vhost_vdpa *v)
-{
-    int fd = v->device_fd;
-    struct vhost_msg_v2 msg = {
-        .type = v->msg_type,
-        .iotlb.type = VHOST_IOTLB_BATCH_BEGIN,
-    };
-
-    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.iotlb.type);
-    if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
-        error_report("failed to write, fd=%d, errno=%d (%s)",
-                     fd, errno, strerror(errno));
-    }
-}
-
-static void vhost_vdpa_iotlb_batch_begin_once(struct vhost_vdpa *v)
-{
-    if (v->dev->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH) &&
-        !v->iotlb_batch_begin_sent) {
-        vhost_vdpa_listener_begin_batch(v);
-    }
-
-    v->iotlb_batch_begin_sent = true;
-}
-
 static void vhost_vdpa_listener_commit(MemoryListener *listener)
 {
     struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
-    struct vhost_dev *dev = v->dev;
     struct vhost_msg_v2 msg = {};
     int fd = v->device_fd;
+    size_t num = v->iotlb_updates->len;
 
-    if (!(dev->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
+    if (!num) {
         return;
     }
 
-    if (!v->iotlb_batch_begin_sent) {
-        return;
+    msg.type = v->msg_type;
+    msg.iotlb.type = VHOST_IOTLB_BATCH_BEGIN;
+    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.iotlb.type);
+    if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
+        error_report("failed to write BEGIN_BATCH, fd=%d, errno=%d (%s)",
+                     fd, errno, strerror(errno));
+        goto done;
     }
 
-    msg.type = v->msg_type;
-    msg.iotlb.type = VHOST_IOTLB_BATCH_END;
+    for (size_t i = 0; i < num; ++i) {
+        struct vhost_msg_v2 *update = &g_array_index(v->iotlb_updates,
+                                                     struct vhost_msg_v2, i);
+        if (write(fd, update, sizeof(*update)) != sizeof(*update)) {
+            error_report("failed to write dma update, fd=%d, errno=%d (%s)",
+                         fd, errno, strerror(errno));
+            goto done;
+        }
+    }
 
+    msg.iotlb.type = VHOST_IOTLB_BATCH_END;
     trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)",
                      fd, errno, strerror(errno));
     }
 
-    v->iotlb_batch_begin_sent = false;
+done:
+    g_array_set_size(v->iotlb_updates, 0);
+    return;
+
 }
 
 static void vhost_vdpa_listener_region_add(MemoryListener *listener,
@@ -227,7 +228,6 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
         iova = mem_region.iova;
     }
 
-    vhost_vdpa_iotlb_batch_begin_once(v);
     ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
                              vaddr, section->readonly);
     if (ret) {
@@ -292,7 +292,6 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
         iova = result->iova;
         vhost_iova_tree_remove(v->iova_tree, &mem_region);
     }
-    vhost_vdpa_iotlb_batch_begin_once(v);
     ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
     if (ret) {
         error_report("vhost_vdpa dma unmap error!");
@@ -446,6 +445,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
     dev->opaque =  opaque ;
     v->listener = vhost_vdpa_memory_listener;
     v->msg_type = VHOST_IOTLB_MSG_V2;
+    v->iotlb_updates = g_array_new(false, false, sizeof(struct vhost_msg_v2));
     ret = vhost_vdpa_init_svq(dev, v, errp);
     if (ret) {
         goto err;
@@ -579,6 +579,7 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
     trace_vhost_vdpa_cleanup(dev, v);
     vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
     memory_listener_unregister(&v->listener);
+    g_array_free(v->iotlb_updates, true);
     vhost_vdpa_svq_cleanup(dev);
 
     dev->opaque = NULL;
-- 
2.27.0




* [RFC PATCH v7 07/25] vhost: Add reference counting to vhost_iova_tree
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (5 preceding siblings ...)
  2022-04-13 16:31 ` [RFC PATCH v7 06/25] vdpa: Send all updates in memory listener commit Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-14  5:30   ` Jason Wang
  2022-04-13 16:31 ` [RFC PATCH v7 08/25] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

Now that different vqs can have different ASIDs, it is easier to track them
using reference counters.

The glib version QEMU requires still does not provide them, so we've copied
g_rc_box; this way the implementation can be converted to glib's one when
the minimum version is raised.
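
A sketch of the intended usage, based only on the API introduced below:

    VhostIOVATree *tree = vhost_iova_tree_new(first, last); /* refcnt == 1 */
    VhostIOVATree *ref = vhost_iova_tree_acquire(tree);     /* refcnt == 2 */
    /* ... both owners use the same tree ... */
    vhost_iova_tree_release(ref);   /* refcnt == 1 */
    vhost_iova_tree_release(tree);  /* refcnt == 0, tree is freed */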

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-iova-tree.h |  5 +++--
 hw/virtio/vhost-iova-tree.c | 21 +++++++++++++++++++--
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
index 6a4f24e0f9..2fc825d7b1 100644
--- a/hw/virtio/vhost-iova-tree.h
+++ b/hw/virtio/vhost-iova-tree.h
@@ -16,8 +16,9 @@
 typedef struct VhostIOVATree VhostIOVATree;
 
 VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
-void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
-G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
+VhostIOVATree *vhost_iova_tree_acquire(VhostIOVATree *iova_tree);
+void vhost_iova_tree_release(VhostIOVATree *iova_tree);
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_release);
 
 const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
                                         const DMAMap *map);
diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
index 55fed1fefb..31445cbdfc 100644
--- a/hw/virtio/vhost-iova-tree.c
+++ b/hw/virtio/vhost-iova-tree.c
@@ -28,6 +28,9 @@ struct VhostIOVATree {
 
     /* IOVA address to qemu memory maps. */
     IOVATree *iova_taddr_map;
+
+    /* Reference count */
+    size_t refcnt;
 };
 
 /**
@@ -44,14 +47,28 @@ VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
     tree->iova_last = iova_last;
 
     tree->iova_taddr_map = iova_tree_new();
+    tree->refcnt = 1;
     return tree;
 }
 
 /**
- * Delete an iova tree
+ * Increases the reference count of the iova tree
+ */
+VhostIOVATree *vhost_iova_tree_acquire(VhostIOVATree *iova_tree)
+{
+    ++iova_tree->refcnt;
+    return iova_tree;
+}
+
+/**
+ * Decrease reference counter of iova tree, freeing if it reaches 0
  */
-void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
+void vhost_iova_tree_release(VhostIOVATree *iova_tree)
 {
+    if (--iova_tree->refcnt) {
+        return;
+    }
+
     iova_tree_destroy(iova_tree->iova_taddr_map);
     g_free(iova_tree);
 }
-- 
2.27.0




* [RFC PATCH v7 08/25] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (6 preceding siblings ...)
  2022-04-13 16:31 ` [RFC PATCH v7 07/25] vhost: Add reference counting to vhost_iova_tree Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-14  5:32   ` Jason Wang
  2022-04-13 16:31 ` [RFC PATCH v7 09/25] vhost: move descriptor translation to vhost_svq_vring_write_descs Eugenio Pérez
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

This finally offers the possibility to enable SVQ from the command line.
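
For instance (mirroring the cover letter's example):

-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-svq=on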

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 qapi/net.json    |  9 ++++++++-
 net/vhost-vdpa.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 48 insertions(+), 9 deletions(-)

diff --git a/qapi/net.json b/qapi/net.json
index b92f3f5fb4..92848e4362 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -445,12 +445,19 @@
 # @queues: number of queues to be created for multiqueue vhost-vdpa
 #          (default: 1)
 #
+# @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
+#         (default: false)
+#
+# Features:
+# @unstable: Member @x-svq is experimental.
+#
 # Since: 5.1
 ##
 { 'struct': 'NetdevVhostVDPAOptions',
   'data': {
     '*vhostdev':     'str',
-    '*queues':       'int' } }
+    '*queues':       'int',
+    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] } } }
 
 ##
 # @NetClientDriver:
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 1e9fe47c03..9261101af2 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -128,6 +128,7 @@ static void vhost_vdpa_cleanup(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
 
+    g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_release);
     if (s->vhost_net) {
         vhost_net_cleanup(s->vhost_net);
         g_free(s->vhost_net);
@@ -187,13 +188,23 @@ static NetClientInfo net_vhost_vdpa_info = {
         .check_peer_type = vhost_vdpa_check_peer_type,
 };
 
+static int vhost_vdpa_get_iova_range(int fd,
+                                     struct vhost_vdpa_iova_range *iova_range)
+{
+    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
+
+    return ret < 0 ? -errno : 0;
+}
+
 static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
-                                           const char *device,
-                                           const char *name,
-                                           int vdpa_device_fd,
-                                           int queue_pair_index,
-                                           int nvqs,
-                                           bool is_datapath)
+                                       const char *device,
+                                       const char *name,
+                                       int vdpa_device_fd,
+                                       int queue_pair_index,
+                                       int nvqs,
+                                       bool is_datapath,
+                                       bool svq,
+                                       VhostIOVATree *iova_tree)
 {
     NetClientState *nc = NULL;
     VhostVDPAState *s;
@@ -211,8 +222,14 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
     s->vhost_vdpa.device_fd = vdpa_device_fd;
     s->vhost_vdpa.index = queue_pair_index;
+    s->vhost_vdpa.shadow_vqs_enabled = svq;
+    s->vhost_vdpa.iova_tree = iova_tree ? vhost_iova_tree_acquire(iova_tree) :
+                              NULL;
     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
     if (ret) {
+        if (iova_tree) {
+            vhost_iova_tree_release(iova_tree);
+        }
         qemu_del_net_client(nc);
         return NULL;
     }
@@ -266,6 +283,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     g_autofree NetClientState **ncs = NULL;
     NetClientState *nc;
     int queue_pairs, i, has_cvq = 0;
+    g_autoptr(VhostIOVATree) iova_tree = NULL;
 
     assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
     opts = &netdev->u.vhost_vdpa;
@@ -285,19 +303,31 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
         qemu_close(vdpa_device_fd);
         return queue_pairs;
     }
+    if (opts->x_svq) {
+        struct vhost_vdpa_iova_range iova_range;
+
+        if (has_cvq) {
+            error_setg(errp, "vdpa svq does not work with cvq");
+            goto err_svq;
+        }
+        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
+        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
+    }
 
     ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
 
     for (i = 0; i < queue_pairs; i++) {
         ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                     vdpa_device_fd, i, 2, true);
+                                     vdpa_device_fd, i, 2, true, opts->x_svq,
+                                     iova_tree);
         if (!ncs[i])
             goto err;
     }
 
     if (has_cvq) {
         nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                 vdpa_device_fd, i, 1, false);
+                                 vdpa_device_fd, i, 1, false, opts->x_svq,
+                                 iova_tree);
         if (!nc)
             goto err;
     }
@@ -308,6 +338,8 @@ err:
     if (i) {
         qemu_del_net_client(ncs[0]);
     }
+
+err_svq:
     qemu_close(vdpa_device_fd);
 
     return -1;
-- 
2.27.0




* [RFC PATCH v7 09/25] vhost: move descriptor translation to vhost_svq_vring_write_descs
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (7 preceding siblings ...)
  2022-04-13 16:31 ` [RFC PATCH v7 08/25] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-14  5:48   ` Jason Wang
  2022-04-13 16:31 ` [RFC PATCH v7 10/25] vdpa: Fix index calculus at vhost_vdpa_svqs_start Eugenio Pérez
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

Address translation is done for both in and out descriptors, so it is
better placed here.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index a2531d5874..f874374651 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -122,17 +122,23 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
     return true;
 }
 
-static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
-                                    const struct iovec *iovec, size_t num,
-                                    bool more_descs, bool write)
+static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
+                                        const struct iovec *iovec, size_t num,
+                                        bool more_descs, bool write)
 {
     uint16_t i = svq->free_head, last = svq->free_head;
     unsigned n;
     uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
     vring_desc_t *descs = svq->vring.desc;
+    bool ok;
 
     if (num == 0) {
-        return;
+        return true;
+    }
+
+    ok = vhost_svq_translate_addr(svq, sg, iovec, num);
+    if (unlikely(!ok)) {
+        return false;
     }
 
     for (n = 0; n < num; n++) {
@@ -150,6 +156,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
     }
 
     svq->free_head = le16_to_cpu(svq->desc_next[last]);
+    return true;
 }
 
 static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
@@ -169,21 +176,18 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
         return false;
     }
 
-    ok = vhost_svq_translate_addr(svq, sgs, elem->out_sg, elem->out_num);
+    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
+                                     elem->in_num > 0, false);
     if (unlikely(!ok)) {
         return false;
     }
-    vhost_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
-                            elem->in_num > 0, false);
 
-
-    ok = vhost_svq_translate_addr(svq, sgs, elem->in_sg, elem->in_num);
+    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
+                                     true);
     if (unlikely(!ok)) {
         return false;
     }
 
-    vhost_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, true);
-
     /*
      * Put the entry in the available array (but don't update avail->idx until
      * they do sync).
-- 
2.27.0




* [RFC PATCH v7 10/25] vdpa: Fix index calculus at vhost_vdpa_svqs_start
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (8 preceding siblings ...)
  2022-04-13 16:31 ` [RFC PATCH v7 09/25] vhost: move descriptor translation to vhost_svq_vring_write_descs Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-14  5:59   ` Jason Wang
  2022-04-13 16:31 ` [RFC PATCH v7 11/25] virtio-net: Expose ctrl virtqueue logic Eugenio Pérez
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 27ee678dc9..6b370c918c 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1019,7 +1019,7 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
         VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
         VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
         struct vhost_vring_addr addr = {
-            .index = i,
+            .index = dev->vq_index + i,
         };
         int r;
         bool ok = vhost_vdpa_svq_setup(dev, svq, i, &err);
-- 
2.27.0




* [RFC PATCH v7 11/25] virtio-net: Expose ctrl virtqueue logic
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (9 preceding siblings ...)
  2022-04-13 16:31 ` [RFC PATCH v7 10/25] vdpa: Fix index calculus at vhost_vdpa_svqs_start Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-13 16:31 ` [RFC PATCH v7 12/25] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs Eugenio Pérez
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

This allows external vhost-net devices to modify the state of the
VirtIO device model once the vhost-vdpa device has acknowledged the control
commands.
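
A sketch of a call using the exported signature (out_sg/out_num here stand
for whatever scatter-gather list the caller obtained; status is the
one-byte ack written back):

    virtio_net_ctrl_ack status;
    struct iovec in_sg = { .iov_base = &status, .iov_len = sizeof(status) };
    unsigned written = virtio_net_handle_ctrl_iov(vdev, &in_sg, 1,
                                                  out_sg, out_num);
    /* written == sizeof(status) on success, 0 on malformed input */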

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/virtio-net.h |  3 ++
 hw/net/virtio-net.c            | 83 ++++++++++++++++++++--------------
 2 files changed, 51 insertions(+), 35 deletions(-)

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index eb87032627..e62f9e227f 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -218,6 +218,9 @@ struct VirtIONet {
     struct EBPFRSSContext ebpf_rss;
 };
 
+unsigned virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
+                                    const struct iovec *in_sg, size_t in_num,
+                                    struct iovec *out_sg, unsigned out_num);
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
                                    const char *type);
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e4748a7e6c..5905a9285c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1419,57 +1419,70 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
     return VIRTIO_NET_OK;
 }
 
-static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
+unsigned virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
+                                    const struct iovec *in_sg, size_t in_num,
+                                    struct iovec *out_sg, unsigned out_num)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
     struct virtio_net_ctrl_hdr ctrl;
     virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
-    VirtQueueElement *elem;
     size_t s;
     struct iovec *iov, *iov2;
-    unsigned int iov_cnt;
+
+    if (iov_size(in_sg, in_num) < sizeof(status) ||
+        iov_size(out_sg, out_num) < sizeof(ctrl)) {
+        virtio_error(vdev, "virtio-net ctrl missing headers");
+        return 0;
+    }
+
+    iov2 = iov = g_memdup2(out_sg, sizeof(struct iovec) * out_num);
+    s = iov_to_buf(iov, out_num, 0, &ctrl, sizeof(ctrl));
+    iov_discard_front(&iov, &out_num, sizeof(ctrl));
+    if (s != sizeof(ctrl)) {
+        status = VIRTIO_NET_ERR;
+    } else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
+        status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
+        status = virtio_net_handle_mac(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
+        status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
+        status = virtio_net_handle_announce(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
+        status = virtio_net_handle_mq(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
+        status = virtio_net_handle_offloads(n, ctrl.cmd, iov, out_num);
+    }
+
+    s = iov_from_buf(in_sg, in_num, 0, &status, sizeof(status));
+    assert(s == sizeof(status));
+
+    g_free(iov2);
+    return sizeof(status);
+}
+
+static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtQueueElement *elem;
 
     for (;;) {
+        unsigned written;
         elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
         if (!elem) {
             break;
         }
-        if (iov_size(elem->in_sg, elem->in_num) < sizeof(status) ||
-            iov_size(elem->out_sg, elem->out_num) < sizeof(ctrl)) {
-            virtio_error(vdev, "virtio-net ctrl missing headers");
+
+        written = virtio_net_handle_ctrl_iov(vdev, elem->in_sg, elem->in_num,
+                                             elem->out_sg, elem->out_num);
+        if (written > 0) {
+            virtqueue_push(vq, elem, written);
+            virtio_notify(vdev, vq);
+            g_free(elem);
+        } else {
             virtqueue_detach_element(vq, elem, 0);
             g_free(elem);
             break;
         }
-
-        iov_cnt = elem->out_num;
-        iov2 = iov = g_memdup2(elem->out_sg,
-                               sizeof(struct iovec) * elem->out_num);
-        s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl));
-        iov_discard_front(&iov, &iov_cnt, sizeof(ctrl));
-        if (s != sizeof(ctrl)) {
-            status = VIRTIO_NET_ERR;
-        } else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
-            status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
-            status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
-            status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
-            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
-            status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
-            status = virtio_net_handle_offloads(n, ctrl.cmd, iov, iov_cnt);
-        }
-
-        s = iov_from_buf(elem->in_sg, elem->in_num, 0, &status, sizeof(status));
-        assert(s == sizeof(status));
-
-        virtqueue_push(vq, elem, sizeof(status));
-        virtio_notify(vdev, vq);
-        g_free(iov2);
-        g_free(elem);
     }
 }
 
-- 
2.27.0




* [RFC PATCH v7 12/25] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (10 preceding siblings ...)
  2022-04-13 16:31 ` [RFC PATCH v7 11/25] virtio-net: Expose ctrl virtqueue logic Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-13 16:31 ` [RFC PATCH v7 13/25] virtio: Make virtqueue_alloc_element non-static Eugenio Pérez
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

Knowing the device features is also needed for CVQ SVQ. Extract that logic
from vhost_vdpa_get_max_queue_pairs so we can reuse it.

While we're at it, report errno in case of failure getting them.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 net/vhost-vdpa.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 9261101af2..a8dde49198 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -236,20 +236,24 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     return nc;
 }
 
-static int vhost_vdpa_get_max_queue_pairs(int fd, int *has_cvq, Error **errp)
+static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
+{
+    int ret = ioctl(fd, VHOST_GET_FEATURES, features);
+    if (ret) {
+        error_setg_errno(errp, errno,
+                         "Fail to query features from vhost-vDPA device");
+    }
+    return ret;
+}
+
+static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
+                                          int *has_cvq, Error **errp)
 {
     unsigned long config_size = offsetof(struct vhost_vdpa_config, buf);
     g_autofree struct vhost_vdpa_config *config = NULL;
     __virtio16 *max_queue_pairs;
-    uint64_t features;
     int ret;
 
-    ret = ioctl(fd, VHOST_GET_FEATURES, &features);
-    if (ret) {
-        error_setg(errp, "Fail to query features from vhost-vDPA device");
-        return ret;
-    }
-
     if (features & (1 << VIRTIO_NET_F_CTRL_VQ)) {
         *has_cvq = 1;
     } else {
@@ -279,10 +283,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
                         NetClientState *peer, Error **errp)
 {
     const NetdevVhostVDPAOptions *opts;
+    uint64_t features;
     int vdpa_device_fd;
     g_autofree NetClientState **ncs = NULL;
     NetClientState *nc;
-    int queue_pairs, i, has_cvq = 0;
+    int queue_pairs, r, i, has_cvq = 0;
     g_autoptr(VhostIOVATree) iova_tree = NULL;
 
     assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
@@ -297,7 +302,12 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
         return -errno;
     }
 
-    queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd,
+    r = vhost_vdpa_get_features(vdpa_device_fd, &features, errp);
+    if (r) {
+        return r;
+    }
+
+    queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd, features,
                                                  &has_cvq, errp);
     if (queue_pairs < 0) {
         qemu_close(vdpa_device_fd);
-- 
2.27.0




* [RFC PATCH v7 13/25] virtio: Make virtqueue_alloc_element non-static
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (11 preceding siblings ...)
  2022-04-13 16:31 ` [RFC PATCH v7 12/25] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-13 16:31 ` [RFC PATCH v7 14/25] vhost: Add SVQElement Eugenio Pérez
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

So SVQ can allocate elements using it.
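
For example (a sketch; SVQElement, which embeds a VirtQueueElement, is
introduced in the next patch):

    SVQElement *svq_elem = virtqueue_alloc_element(sizeof(SVQElement),
                                                   out_num, in_num);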

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 include/hw/virtio/virtio.h | 1 +
 hw/virtio/virtio.c         | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b31c4507f5..1e85833897 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -195,6 +195,7 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len, unsigned int idx);
 
 void virtqueue_map(VirtIODevice *vdev, VirtQueueElement *elem);
+void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num);
 void *virtqueue_pop(VirtQueue *vq, size_t sz);
 unsigned int virtqueue_drop_all(VirtQueue *vq);
 void *qemu_get_virtqueue_element(VirtIODevice *vdev, QEMUFile *f, size_t sz);
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 9d637e043e..17cbbb5fca 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1376,7 +1376,7 @@ void virtqueue_map(VirtIODevice *vdev, VirtQueueElement *elem)
                                                                         false);
 }
 
-static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num)
+void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num)
 {
     VirtQueueElement *elem;
     size_t in_addr_ofs = QEMU_ALIGN_UP(sz, __alignof__(elem->in_addr[0]));
-- 
2.27.0




* [RFC PATCH v7 14/25] vhost: Add SVQElement
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (12 preceding siblings ...)
  2022-04-13 16:31 ` [RFC PATCH v7 13/25] virtio: Make virtqueue_alloc_element non-static Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-13 16:31 ` [RFC PATCH v7 15/25] vhost: Add custom used buffer callback Eugenio Pérez
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

This allows SVQ to add metadata to the different queue elements.
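
Since the VirtQueueElement is the first member, both views of an element
are cheap to obtain (a sketch, matching the code below):

    SVQElement *svq_elem = virtqueue_pop(svq->vq, sizeof(*svq_elem));
    VirtQueueElement *elem = &svq_elem->elem;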

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.h |  8 ++++--
 hw/virtio/vhost-shadow-virtqueue.c | 46 ++++++++++++++++--------------
 2 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index c132c994e9..f35d4b8f90 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -15,6 +15,10 @@
 #include "standard-headers/linux/vhost_types.h"
 #include "hw/virtio/vhost-iova-tree.h"
 
+typedef struct SVQElement {
+    VirtQueueElement elem;
+} SVQElement;
+
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
     /* Shadow vring */
@@ -48,10 +52,10 @@ typedef struct VhostShadowVirtqueue {
     VhostIOVATree *iova_tree;
 
     /* Map for use the guest's descriptors */
-    VirtQueueElement **ring_id_maps;
+    SVQElement **ring_id_maps;
 
     /* Next VirtQueue element that guest made available */
-    VirtQueueElement *next_guest_avail_elem;
+    SVQElement *next_guest_avail_elem;
 
     /*
      * Backup next field for each descriptor so we can recover securely, not
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index f874374651..1702365475 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -159,9 +159,10 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
     return true;
 }
 
-static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
-                                VirtQueueElement *elem, unsigned *head)
+static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement *svq_elem,
+                                unsigned *head)
 {
+    const VirtQueueElement *elem = &svq_elem->elem;
     unsigned avail_idx;
     vring_avail_t *avail = svq->vring.avail;
     bool ok;
@@ -203,7 +204,7 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
     return true;
 }
 
-static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
+static bool vhost_svq_add(VhostShadowVirtqueue *svq, SVQElement *elem)
 {
     unsigned qemu_head;
     bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
@@ -252,19 +253,21 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
         virtio_queue_set_notification(svq->vq, false);
 
         while (true) {
+            SVQElement *svq_elem;
             VirtQueueElement *elem;
             bool ok;
 
             if (svq->next_guest_avail_elem) {
-                elem = g_steal_pointer(&svq->next_guest_avail_elem);
+                svq_elem = g_steal_pointer(&svq->next_guest_avail_elem);
             } else {
-                elem = virtqueue_pop(svq->vq, sizeof(*elem));
+                svq_elem = virtqueue_pop(svq->vq, sizeof(*svq_elem));
             }
 
-            if (!elem) {
+            if (!svq_elem) {
                 break;
             }
 
+            elem = &svq_elem->elem;
             if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
                 /*
                  * This condition is possible since a contiguous buffer in GPA
@@ -277,11 +280,11 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
                  * queue the current guest descriptor and ignore further kicks
                  * until some elements are used.
                  */
-                svq->next_guest_avail_elem = elem;
+                svq->next_guest_avail_elem = svq_elem;
                 return;
             }
 
-            ok = vhost_svq_add(svq, elem);
+            ok = vhost_svq_add(svq, svq_elem);
             if (unlikely(!ok)) {
                 /* VQ is broken, just return and ignore any other kicks */
                 return;
@@ -348,8 +351,7 @@ static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
     return i;
 }
 
-static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
-                                           uint32_t *len)
+static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
 {
     const vring_used_t *used = svq->vring.used;
     vring_used_elem_t used_elem;
@@ -379,8 +381,8 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
         return NULL;
     }
 
-    num = svq->ring_id_maps[used_elem.id]->in_num +
-          svq->ring_id_maps[used_elem.id]->out_num;
+    num = svq->ring_id_maps[used_elem.id]->elem.in_num +
+          svq->ring_id_maps[used_elem.id]->elem.out_num;
     last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
     svq->desc_next[last_used_chain] = svq->free_head;
     svq->free_head = used_elem.id;
@@ -401,11 +403,13 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
         vhost_svq_disable_notification(svq);
         while (true) {
             uint32_t len;
-            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq, &len);
-            if (!elem) {
+            g_autofree SVQElement *svq_elem = vhost_svq_get_buf(svq, &len);
+            VirtQueueElement *elem;
+            if (!svq_elem) {
                 break;
             }
 
+            elem = &svq_elem->elem;
             if (unlikely(i >= svq->vring.num)) {
                 qemu_log_mask(LOG_GUEST_ERROR,
                          "More than %u used buffers obtained in a %u size SVQ",
@@ -556,7 +560,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
     memset(svq->vring.desc, 0, driver_size);
     svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
     memset(svq->vring.used, 0, device_size);
-    svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num);
+    svq->ring_id_maps = g_new0(SVQElement *, svq->vring.num);
     svq->desc_next = g_new0(uint16_t, svq->vring.num);
     for (unsigned i = 0; i < svq->vring.num - 1; i++) {
         svq->desc_next[i] = cpu_to_le16(i + 1);
@@ -570,7 +574,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
 void vhost_svq_stop(VhostShadowVirtqueue *svq)
 {
     event_notifier_set_handler(&svq->svq_kick, NULL);
-    g_autofree VirtQueueElement *next_avail_elem = NULL;
+    g_autofree SVQElement *next_avail_elem = NULL;
 
     if (!svq->vq) {
         return;
@@ -580,16 +584,16 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
     vhost_svq_flush(svq, false);
 
     for (unsigned i = 0; i < svq->vring.num; ++i) {
-        g_autofree VirtQueueElement *elem = NULL;
-        elem = g_steal_pointer(&svq->ring_id_maps[i]);
-        if (elem) {
-            virtqueue_detach_element(svq->vq, elem, 0);
+        g_autofree SVQElement *svq_elem = NULL;
+        svq_elem = g_steal_pointer(&svq->ring_id_maps[i]);
+        if (svq_elem) {
+            virtqueue_detach_element(svq->vq, &svq_elem->elem, 0);
         }
     }
 
     next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
     if (next_avail_elem) {
-        virtqueue_detach_element(svq->vq, next_avail_elem, 0);
+        virtqueue_detach_element(svq->vq, &next_avail_elem->elem, 0);
     }
     svq->vq = NULL;
     g_free(svq->desc_next);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v7 15/25] vhost: Add custom used buffer callback
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (13 preceding siblings ...)
  2022-04-13 16:31 ` [RFC PATCH v7 14/25] vhost: Add SVQElement Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-13 16:31 ` [RFC PATCH v7 16/25] vdpa: control virtqueue support on shadow virtqueue Eugenio Pérez
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

The callback allows SVQ users to know the VirtQueue requests and
responses. QEMU can use this to synchronize the virtio device model
state, allowing it to be migrated with minimal changes to the migration
code.

In the case of networking, this will be used to inspect control
virtqueue messages.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
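A minimal sketch of how an SVQ user wires the hook (names here are
hypothetical); the handler runs in vhost_svq_flush() after each
virtqueue_fill():

    static void my_used_handler(VirtIODevice *vdev,
                                const VirtQueueElement *elem)
    {
        /* inspect elem->out_sg / elem->in_sg of the used element */
    }

    static const VhostShadowVirtqueueOps my_svq_ops = {
        .used_elem_handler = my_used_handler,
    };

    VhostShadowVirtqueue *svq = vhost_svq_new(iova_tree, &my_svq_ops);
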
 hw/virtio/vhost-shadow-virtqueue.h | 16 +++++++++++++++-
 include/hw/virtio/vhost-vdpa.h     |  2 ++
 hw/virtio/vhost-shadow-virtqueue.c |  9 ++++++++-
 hw/virtio/vhost-vdpa.c             |  3 ++-
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index f35d4b8f90..2809dee27b 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -19,6 +19,13 @@ typedef struct SVQElement {
     VirtQueueElement elem;
 } SVQElement;
 
+typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
+                                         const VirtQueueElement *elem);
+
+typedef struct VhostShadowVirtqueueOps {
+    VirtQueueElementCallback used_elem_handler;
+} VhostShadowVirtqueueOps;
+
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
     /* Shadow vring */
@@ -63,6 +70,12 @@ typedef struct VhostShadowVirtqueue {
      */
     uint16_t *desc_next;
 
+    /* Optional callbacks */
+    const VhostShadowVirtqueueOps *ops;
+
+    /* Optional custom used virtqueue element handler */
+    VirtQueueElementCallback used_elem_cb;
+
     /* Next head to expose to the device */
     uint16_t shadow_avail_idx;
 
@@ -89,7 +102,8 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
                      VirtQueue *vq);
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree);
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
+                                    const VhostShadowVirtqueueOps *ops);
 
 void vhost_svq_free(gpointer vq);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 4961acea8b..8b8834dd24 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -17,6 +17,7 @@
 #include "hw/virtio/vhost-iova-tree.h"
 #include "hw/virtio/virtio.h"
 #include "standard-headers/linux/vhost_types.h"
+#include "hw/virtio/vhost-shadow-virtqueue.h"
 
 typedef struct VhostVDPAHostNotifier {
     MemoryRegion mr;
@@ -35,6 +36,7 @@ typedef struct vhost_vdpa {
     /* IOVA mapping used by the Shadow Virtqueue */
     VhostIOVATree *iova_tree;
     GPtrArray *shadow_vqs;
+    const VhostShadowVirtqueueOps *shadow_vq_ops;
     struct vhost_dev *dev;
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 1702365475..72a403d90b 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -419,6 +419,10 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
                 return;
             }
             virtqueue_fill(vq, elem, len, i++);
+
+            if (svq->ops && svq->ops->used_elem_handler) {
+                svq->ops->used_elem_handler(svq->vdev, elem);
+            }
         }
 
         virtqueue_flush(vq, i);
@@ -607,12 +611,14 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
  * shadow methods and file descriptors.
  *
  * @iova_tree: Tree to perform descriptors translations
+ * @ops: SVQ operations hooks
  *
  * Returns the new virtqueue or NULL.
  *
  * In case of error, reason is reported through error_report.
  */
-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
+                                    const VhostShadowVirtqueueOps *ops)
 {
     g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
     int r;
@@ -634,6 +640,7 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
     event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND);
     event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     svq->iova_tree = iova_tree;
+    svq->ops = ops;
     return g_steal_pointer(&svq);
 
 err_init_hdev_call:
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 6b370c918c..9e62f3280d 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -410,7 +410,8 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
 
     shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
     for (unsigned n = 0; n < hdev->nvqs; ++n) {
-        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree);
+        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree,
+                                                            v->shadow_vq_ops);
 
         if (unlikely(!svq)) {
             error_setg(errp, "Cannot create svq %u", n);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v7 16/25] vdpa: control virtqueue support on shadow virtqueue
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (14 preceding siblings ...)
  2022-04-13 16:31 ` [RFC PATCH v7 15/25] vhost: Add custom used buffer callback Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-14  9:10   ` Jason Wang
  2022-04-13 16:31 ` [RFC PATCH v7 17/25] vhost: Add vhost_iova_tree_find Eugenio Pérez
                   ` (8 subsequent siblings)
  24 siblings, 1 reply; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

Introduce control virtqueue support for the vDPA shadow virtqueue. This
is needed for advanced networking features like multiqueue.

To demonstrate command handling, VIRTIO_NET_F_CTRL_MAC_ADDR and
VIRTIO_NET_CTRL_MQ are implemented. If the vDPA device is started with
SVQ support and the virtio-net driver changes the MAC address or the
number of queues, the virtio-net device model is updated with the new
values.

Other CVQ commands could be added here straightforwardly, but they have
not been tested.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
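A worked example of the feature check at the end of the patch, assuming
a device that also offers VIRTIO_NET_F_GUEST_ANNOUNCE, which is not in
the allowed list:

    uint64_t features = BIT_ULL(VIRTIO_NET_F_CTRL_VQ) |
                        BIT_ULL(VIRTIO_NET_F_MQ) |
                        BIT_ULL(VIRTIO_NET_F_GUEST_ANNOUNCE) |
                        BIT_ULL(VIRTIO_F_VERSION_1);
    uint64_t invalid = features & ~vdpa_svq_device_features &
                       ~MAKE_64BIT_MASK(VIRTIO_TRANSPORT_F_START,
                                        VIRTIO_TRANSPORT_F_END -
                                        VIRTIO_TRANSPORT_F_START);
    /*
     * VIRTIO_F_VERSION_1 is masked out as a transport feature;
     * invalid == BIT_ULL(VIRTIO_NET_F_GUEST_ANNOUNCE), so init fails.
     */
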
 net/vhost-vdpa.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 77 insertions(+), 3 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index a8dde49198..38e6912255 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -11,6 +11,7 @@
 
 #include "qemu/osdep.h"
 #include "clients.h"
+#include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
 #include "net/vhost-vdpa.h"
 #include "hw/virtio/vhost-vdpa.h"
@@ -69,6 +70,30 @@ const int vdpa_feature_bits[] = {
     VHOST_INVALID_FEATURE_BIT
 };
 
+/** Supported device specific feature bits with SVQ */
+static const uint64_t vdpa_svq_device_features =
+    BIT_ULL(VIRTIO_NET_F_CSUM) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) |
+    BIT_ULL(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) |
+    BIT_ULL(VIRTIO_NET_F_MTU) |
+    BIT_ULL(VIRTIO_NET_F_MAC) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_ECN) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_UFO) |
+    BIT_ULL(VIRTIO_NET_F_HOST_TSO4) |
+    BIT_ULL(VIRTIO_NET_F_HOST_TSO6) |
+    BIT_ULL(VIRTIO_NET_F_HOST_ECN) |
+    BIT_ULL(VIRTIO_NET_F_HOST_UFO) |
+    BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) |
+    BIT_ULL(VIRTIO_NET_F_STATUS) |
+    BIT_ULL(VIRTIO_NET_F_CTRL_VQ) |
+    BIT_ULL(VIRTIO_NET_F_MQ) |
+    BIT_ULL(VIRTIO_F_ANY_LAYOUT) |
+    BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) |
+    BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
+    BIT_ULL(VIRTIO_NET_F_STANDBY);
+
 VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
@@ -196,6 +221,46 @@ static int vhost_vdpa_get_iova_range(int fd,
     return ret < 0 ? -errno : 0;
 }
 
+static void vhost_vdpa_net_handle_ctrl(VirtIODevice *vdev,
+                                       const VirtQueueElement *elem)
+{
+    struct virtio_net_ctrl_hdr ctrl;
+    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
+    size_t s;
+    struct iovec in = {
+        .iov_base = &status,
+        .iov_len = sizeof(status),
+    };
+
+    s = iov_to_buf(elem->out_sg, elem->out_num, 0, &ctrl, sizeof(ctrl.class));
+    if (s != sizeof(ctrl.class)) {
+        return;
+    }
+
+    switch (ctrl.class) {
+    case VIRTIO_NET_CTRL_MAC:
+    case VIRTIO_NET_CTRL_MQ:
+        break;
+    default:
+        return;
+    }
+
+    s = iov_to_buf(elem->in_sg, elem->in_num, 0, &status, sizeof(status));
+    if (s != sizeof(status) || status != VIRTIO_NET_OK) {
+        return;
+    }
+
+    status = VIRTIO_NET_ERR;
+    virtio_net_handle_ctrl_iov(vdev, &in, 1, elem->out_sg, elem->out_num);
+    if (status != VIRTIO_NET_OK) {
+        error_report("Bad CVQ processing in model");
+    }
+}
+
+static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
+    .used_elem_handler = vhost_vdpa_net_handle_ctrl,
+};
+
 static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
                                        const char *device,
                                        const char *name,
@@ -225,6 +290,9 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     s->vhost_vdpa.shadow_vqs_enabled = svq;
     s->vhost_vdpa.iova_tree = iova_tree ? vhost_iova_tree_acquire(iova_tree) :
                               NULL;
+    if (!is_datapath) {
+        s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
+    }
     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
     if (ret) {
         if (iova_tree) {
@@ -315,9 +383,15 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     }
     if (opts->x_svq) {
         struct vhost_vdpa_iova_range iova_range;
-
-        if (has_cvq) {
-            error_setg(errp, "vdpa svq does not work with cvq");
+        uint64_t invalid_dev_features =
+            features & ~vdpa_svq_device_features &
+            /* Transport features are all accepted at this point */
+            ~MAKE_64BIT_MASK(VIRTIO_TRANSPORT_F_START,
+                             VIRTIO_TRANSPORT_F_END - VIRTIO_TRANSPORT_F_START);
+
+        if (invalid_dev_features) {
+            error_setg(errp, "vdpa svq does not work with features 0x%" PRIx64,
+                       invalid_dev_features);
             goto err_svq;
         }
         vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v7 17/25] vhost: Add vhost_iova_tree_find
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (15 preceding siblings ...)
  2022-04-13 16:31 ` [RFC PATCH v7 16/25] vdpa: control virtqueue support on shadow virtqueue Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-13 16:31 ` [RFC PATCH v7 18/25] vdpa: Add map/unmap operation callback to SVQ Eugenio Pérez
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

Just a simple wrapper so we can find DMAMap entries based on iova

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
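A minimal usage sketch; the tree stores inclusive sizes, so a mapping of
len bytes has .size = len - 1, and the find returns a mapping that
overlaps the needle:

    DMAMap needle = {
        .iova = iova,
        .size = len - 1,
    };
    const DMAMap *map = vhost_iova_tree_find(iova_tree, &needle);
    if (map) {
        void *host = (void *)(uintptr_t)map->translated_addr;
    }
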
 hw/virtio/vhost-iova-tree.h |  2 ++
 hw/virtio/vhost-iova-tree.c | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
index 2fc825d7b1..bacd17d99c 100644
--- a/hw/virtio/vhost-iova-tree.h
+++ b/hw/virtio/vhost-iova-tree.h
@@ -20,6 +20,8 @@ VhostIOVATree *vhost_iova_tree_acquire(VhostIOVATree *iova_tree);
 void vhost_iova_tree_release(VhostIOVATree *iova_tree);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_release);
 
+const DMAMap *vhost_iova_tree_find(const VhostIOVATree *iova_tree,
+                                   const DMAMap *map);
 const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
                                         const DMAMap *map);
 int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
index 31445cbdfc..c3d89a85ad 100644
--- a/hw/virtio/vhost-iova-tree.c
+++ b/hw/virtio/vhost-iova-tree.c
@@ -73,6 +73,20 @@ void vhost_iova_tree_release(VhostIOVATree *iova_tree)
     g_free(iova_tree);
 }
 
+/**
+ * Find a mapping in the tree that matches map
+ *
+ * @iova_tree  The iova tree
+ * @map        The map
+ *
+ * Return a matching map that contains argument map or NULL
+ */
+const DMAMap *vhost_iova_tree_find(const VhostIOVATree *iova_tree,
+                                   const DMAMap *map)
+{
+    return iova_tree_find(iova_tree->iova_taddr_map, map);
+}
+
 /**
  * Find the IOVA address stored from a memory address
  *
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v7 18/25] vdpa: Add map/unmap operation callback to SVQ
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (16 preceding siblings ...)
  2022-04-13 16:31 ` [RFC PATCH v7 17/25] vhost: Add vhost_iova_tree_find Eugenio Pérez
@ 2022-04-13 16:31 ` Eugenio Pérez
  2022-04-14  9:13   ` Jason Wang
  2022-04-13 16:32 ` [RFC PATCH v7 19/25] vhost: Add vhost_svq_inject Eugenio Pérez
                   ` (6 subsequent siblings)
  24 siblings, 1 reply; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
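These hooks let SVQ map buffers it allocates by itself (used by
vhost_svq_inject later in the series). A sketch of the calling
convention; note that DMAMap.size is inclusive while the map op takes a
byte count, hence the + 1:

    DMAMap map = {
        .translated_addr = (hwaddr)(uintptr_t)buf,
        .size = buf_len - 1,                  /* inclusive */
        .perm = IOMMU_RW,
    };
    if (vhost_iova_tree_map_alloc(svq->iova_tree, &map) == IOVA_OK) {
        svq->map_ops->map(map.iova, map.size + 1, buf, false,
                          svq->map_ops_opaque);
    }
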
 hw/virtio/vhost-shadow-virtqueue.h | 21 +++++++++++++++++++--
 hw/virtio/vhost-shadow-virtqueue.c |  8 +++++++-
 hw/virtio/vhost-vdpa.c             | 20 +++++++++++++++++++-
 3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 2809dee27b..e06ac52158 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -26,6 +26,15 @@ typedef struct VhostShadowVirtqueueOps {
     VirtQueueElementCallback used_elem_handler;
 } VhostShadowVirtqueueOps;
 
+typedef int (*vhost_svq_map_op)(hwaddr iova, hwaddr size, void *vaddr,
+                                bool readonly, void *opaque);
+typedef int (*vhost_svq_unmap_op)(hwaddr iova, hwaddr size, void *opaque);
+
+typedef struct VhostShadowVirtqueueMapOps {
+    vhost_svq_map_op map;
+    vhost_svq_unmap_op unmap;
+} VhostShadowVirtqueueMapOps;
+
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
     /* Shadow vring */
@@ -73,6 +82,12 @@ typedef struct VhostShadowVirtqueue {
     /* Optional callbacks */
     const VhostShadowVirtqueueOps *ops;
 
+    /* Device memory mapping callbacks */
+    const VhostShadowVirtqueueMapOps *map_ops;
+
+    /* Device memory mapping callbacks opaque */
+    void *map_ops_opaque;
+
     /* Optional custom used virtqueue element handler */
     VirtQueueElementCallback used_elem_cb;
 
@@ -102,8 +117,10 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
                      VirtQueue *vq);
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
-                                    const VhostShadowVirtqueueOps *ops);
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
+                                    const VhostShadowVirtqueueOps *ops,
+                                    const VhostShadowVirtqueueMapOps *map_ops,
+                                    void *map_ops_opaque);
 
 void vhost_svq_free(gpointer vq);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 72a403d90b..87980e2a9c 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -612,13 +612,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
  *
  * @iova_tree: Tree to perform descriptors translations
  * @ops: SVQ operations hooks
+ * @map_ops: SVQ mapping operation hooks
+ * @map_ops_opaque: Opaque data to pass to mapping operations
  *
  * Returns the new virtqueue or NULL.
  *
  * In case of error, reason is reported through error_report.
  */
 VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
-                                    const VhostShadowVirtqueueOps *ops)
+                                    const VhostShadowVirtqueueOps *ops,
+                                    const VhostShadowVirtqueueMapOps *map_ops,
+                                    void *map_ops_opaque)
 {
     g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
     int r;
@@ -641,6 +645,8 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
     event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     svq->iova_tree = iova_tree;
     svq->ops = ops;
+    svq->map_ops = map_ops;
+    svq->map_ops_opaque = map_ops_opaque;
     return g_steal_pointer(&svq);
 
 err_init_hdev_call:
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 9e62f3280d..1948c5ca7d 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -384,6 +384,22 @@ static int vhost_vdpa_get_dev_features(struct vhost_dev *dev,
     return ret;
 }
 
+static int vhost_vdpa_svq_map(hwaddr iova, hwaddr size, void *vaddr,
+                              bool readonly, void *opaque)
+{
+    return vhost_vdpa_dma_map(opaque, iova, size, vaddr, readonly);
+}
+
+static int vhost_vdpa_svq_unmap(hwaddr iova, hwaddr size, void *opaque)
+{
+    return vhost_vdpa_dma_unmap(opaque, iova, size);
+}
+
+static const VhostShadowVirtqueueMapOps vhost_vdpa_svq_map_ops = {
+    .map = vhost_vdpa_svq_map,
+    .unmap = vhost_vdpa_svq_unmap,
+};
+
 static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
                                Error **errp)
 {
@@ -411,7 +427,9 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
     shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
     for (unsigned n = 0; n < hdev->nvqs; ++n) {
         g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree,
-                                                            v->shadow_vq_ops);
+                                                       v->shadow_vq_ops,
+                                                       &vhost_vdpa_svq_map_ops,
+                                                       v);
 
         if (unlikely(!svq)) {
             error_setg(errp, "Cannot create svq %u", n);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v7 19/25] vhost: Add vhost_svq_inject
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (17 preceding siblings ...)
  2022-04-13 16:31 ` [RFC PATCH v7 18/25] vdpa: Add map/unmap operation callback to SVQ Eugenio Pérez
@ 2022-04-13 16:32 ` Eugenio Pérez
  2022-04-14  9:09   ` Jason Wang
  2022-04-13 16:32 ` [RFC PATCH v7 20/25] vdpa: add NetClientState->start() callback Eugenio Pérez
                   ` (5 subsequent siblings)
  24 siblings, 1 reply; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

This allows QEMU to inject buffers into the device without the guest
noticing.

This will be used to inject net CVQ messages to restore the device
state at the destination.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
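A sketch of a caller, mirroring the CVQ use later in the series: out
entries carry data that SVQ copies into its own page-aligned buffer,
and an in entry only needs a length (iov_base may be NULL; SVQ
allocates and maps the buffer itself):

    const struct virtio_net_ctrl_hdr hdr = {
        .class = VIRTIO_NET_CTRL_MAC,
        .cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET,
    };
    const struct iovec iov[] = {
        { .iov_base = (void *)&hdr, .iov_len = sizeof(hdr) },   /* out */
        { .iov_base = NULL,
          .iov_len = sizeof(virtio_net_ctrl_ack) },             /* in */
    };
    bool ok = vhost_svq_inject(svq, iov, 1, 1);
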
 hw/virtio/vhost-shadow-virtqueue.h |   5 +
 hw/virtio/vhost-shadow-virtqueue.c | 179 +++++++++++++++++++++++++----
 2 files changed, 160 insertions(+), 24 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index e06ac52158..2a5229e77f 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -17,6 +17,9 @@
 
 typedef struct SVQElement {
     VirtQueueElement elem;
+    hwaddr in_iova;
+    hwaddr out_iova;
+    bool not_from_guest;
 } SVQElement;
 
 typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
@@ -106,6 +109,8 @@ typedef struct VhostShadowVirtqueue {
 
 bool vhost_svq_valid_features(uint64_t features, Error **errp);
 
+bool vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
+                      size_t out_num, size_t in_num);
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
 void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 87980e2a9c..f3600df133 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -16,6 +16,7 @@
 #include "qemu/log.h"
 #include "qemu/memalign.h"
 #include "linux-headers/linux/vhost.h"
+#include "qemu/iov.h"
 
 /**
  * Validate the transport device features that both guests can use with the SVQ
@@ -122,7 +123,8 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
     return true;
 }
 
-static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
+static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq,
+                                        SVQElement *svq_elem, hwaddr *sg,
                                         const struct iovec *iovec, size_t num,
                                         bool more_descs, bool write)
 {
@@ -130,15 +132,39 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
     unsigned n;
     uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
     vring_desc_t *descs = svq->vring.desc;
-    bool ok;
 
     if (num == 0) {
         return true;
     }
 
-    ok = vhost_svq_translate_addr(svq, sg, iovec, num);
-    if (unlikely(!ok)) {
-        return false;
+    if (svq_elem->not_from_guest) {
+        DMAMap map = {
+            .translated_addr = (hwaddr)(uintptr_t)iovec->iov_base,
+            .size = ROUND_UP(iovec->iov_len, 4096) - 1,
+            .perm = write ? IOMMU_RW : IOMMU_RO,
+        };
+        int r;
+
+        if (unlikely(num != 1)) {
+            error_report("Unexpected chain of element injected");
+            return false;
+        }
+        r = vhost_iova_tree_map_alloc(svq->iova_tree, &map);
+        if (unlikely(r != IOVA_OK)) {
+            error_report("Cannot map injected element");
+            return false;
+        }
+
+        r = svq->map_ops->map(map.iova, map.size + 1,
+                              (void *)map.translated_addr, !write,
+                              svq->map_ops_opaque);
+        assert(r == 0);
+        sg[0] = map.iova;
+    } else {
+        bool ok = vhost_svq_translate_addr(svq, sg, iovec, num);
+        if (unlikely(!ok)) {
+            return false;
+        }
     }
 
     for (n = 0; n < num; n++) {
@@ -166,7 +192,8 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement *svq_elem,
     unsigned avail_idx;
     vring_avail_t *avail = svq->vring.avail;
     bool ok;
-    g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
+    g_autofree hwaddr *sgs = NULL;
+    hwaddr *in_sgs, *out_sgs;
 
     *head = svq->free_head;
 
@@ -177,15 +204,23 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement *svq_elem,
         return false;
     }
 
-    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
-                                     elem->in_num > 0, false);
+    if (!svq_elem->not_from_guest) {
+        sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
+        in_sgs = out_sgs = sgs;
+    } else {
+        in_sgs = &svq_elem->in_iova;
+        out_sgs = &svq_elem->out_iova;
+    }
+    ok = vhost_svq_vring_write_descs(svq, svq_elem, out_sgs, elem->out_sg,
+                                     elem->out_num, elem->in_num > 0, false);
     if (unlikely(!ok)) {
         return false;
     }
 
-    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
-                                     true);
+    ok = vhost_svq_vring_write_descs(svq, svq_elem, in_sgs, elem->in_sg,
+                                     elem->in_num, false, true);
     if (unlikely(!ok)) {
+        /* TODO unwind out_sg */
         return false;
     }
 
@@ -230,6 +265,43 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
     event_notifier_set(&svq->hdev_kick);
 }
 
+bool vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
+                      size_t out_num, size_t in_num)
+{
+    size_t out_size = iov_size(iov, out_num);
+    size_t out_buf_size = ROUND_UP(out_size, 4096);
+    size_t in_size = iov_size(iov + out_num, in_num);
+    size_t in_buf_size = ROUND_UP(in_size, 4096);
+    SVQElement *svq_elem;
+    uint16_t num_slots = (in_num ? 1 : 0) + (out_num ? 1 : 0);
+
+    if (unlikely(num_slots == 0 || svq->next_guest_avail_elem ||
+                 vhost_svq_available_slots(svq) < num_slots)) {
+        return false;
+    }
+
+    svq_elem = virtqueue_alloc_element(sizeof(SVQElement), 1, 1);
+    if (out_num) {
+        void *out = qemu_memalign(4096, out_buf_size);
+        svq_elem->elem.out_sg[0].iov_base = out;
+        svq_elem->elem.out_sg[0].iov_len = out_size;
+        iov_to_buf(iov, out_num, 0, out, out_size);
+        memset(out + out_size, 0, out_buf_size - out_size);
+    }
+    if (in_num) {
+        void *in = qemu_memalign(4096, in_buf_size);
+        svq_elem->elem.in_sg[0].iov_base = in;
+        svq_elem->elem.in_sg[0].iov_len = in_size;
+        memset(in, 0, in_buf_size);
+    }
+
+    svq_elem->not_from_guest = true;
+    vhost_svq_add(svq, svq_elem);
+    vhost_svq_kick(svq);
+
+    return true;
+}
+
 /**
  * Forward available buffers.
  *
@@ -267,6 +339,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
                 break;
             }
 
+            svq_elem->not_from_guest = false;
             elem = &svq_elem->elem;
             if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
                 /*
@@ -391,6 +464,31 @@ static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
     return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
 }
 
+static int vhost_svq_unmap(VhostShadowVirtqueue *svq, hwaddr iova, size_t size)
+{
+    DMAMap needle = {
+        .iova = iova,
+        .size = size,
+    };
+    const DMAMap *overlap;
+
+    while ((overlap = vhost_iova_tree_find(svq->iova_tree, &needle))) {
+        DMAMap needle = *overlap;
+
+        if (svq->map_ops->unmap) {
+            int r = svq->map_ops->unmap(overlap->iova, overlap->size + 1,
+                                        svq->map_ops_opaque);
+            if (unlikely(r != 0)) {
+                return r;
+            }
+        }
+        qemu_vfree((void *)(uintptr_t)overlap->translated_addr);
+        vhost_iova_tree_remove(svq->iova_tree, &needle);
+    }
+
+    return 0;
+}
+
 static void vhost_svq_flush(VhostShadowVirtqueue *svq,
                             bool check_for_avail_queue)
 {
@@ -410,23 +508,56 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
             }
 
             elem = &svq_elem->elem;
-            if (unlikely(i >= svq->vring.num)) {
-                qemu_log_mask(LOG_GUEST_ERROR,
-                         "More than %u used buffers obtained in a %u size SVQ",
-                         i, svq->vring.num);
-                virtqueue_fill(vq, elem, len, i);
-                virtqueue_flush(vq, i);
-                return;
-            }
-            virtqueue_fill(vq, elem, len, i++);
-
             if (svq->ops && svq->ops->used_elem_handler) {
                 svq->ops->used_elem_handler(svq->vdev, elem);
             }
+
+            if (svq_elem->not_from_guest) {
+                if (unlikely(elem->out_num > 1)) {
+                    error_report("Unexpected out_num > 1");
+                    return;
+                }
+
+                if (elem->out_num) {
+                    int r = vhost_svq_unmap(svq, svq_elem->out_iova,
+                                            elem->out_sg[0].iov_len);
+                    if (unlikely(r != 0)) {
+                        error_report("Cannot unmap out buffer");
+                        return;
+                    }
+                }
+
+                if (unlikely(elem->in_num > 1)) {
+                    error_report("Unexpected in_num > 1");
+                    return;
+                }
+
+                if (elem->in_num) {
+                    int r = vhost_svq_unmap(svq, svq_elem->in_iova,
+                                            elem->in_sg[0].iov_len);
+                    if (unlikely(r != 0)) {
+                        error_report("Cannot unmap out buffer");
+                        return;
+                    }
+                }
+            } else {
+                if (unlikely(i >= svq->vring.num)) {
+                    qemu_log_mask(
+                        LOG_GUEST_ERROR,
+                        "More than %u used buffers obtained in a %u size SVQ",
+                        i, svq->vring.num);
+                    virtqueue_fill(vq, elem, len, i);
+                    virtqueue_flush(vq, i);
+                    return;
+                }
+                virtqueue_fill(vq, elem, len, i++);
+            }
         }
 
-        virtqueue_flush(vq, i);
-        event_notifier_set(&svq->svq_call);
+        if (i > 0) {
+            virtqueue_flush(vq, i);
+            event_notifier_set(&svq->svq_call);
+        }
 
         if (check_for_avail_queue && svq->next_guest_avail_elem) {
             /*
@@ -590,13 +721,13 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
     for (unsigned i = 0; i < svq->vring.num; ++i) {
         g_autofree SVQElement *svq_elem = NULL;
         svq_elem = g_steal_pointer(&svq->ring_id_maps[i]);
-        if (svq_elem) {
+        if (svq_elem && !svq_elem->not_from_guest) {
             virtqueue_detach_element(svq->vq, &svq_elem->elem, 0);
         }
     }
 
     next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
-    if (next_avail_elem) {
+    if (next_avail_elem && !next_avail_elem->not_from_guest) {
         virtqueue_detach_element(svq->vq, &next_avail_elem->elem, 0);
     }
     svq->vq = NULL;
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v7 20/25] vdpa: add NetClientState->start() callback
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (18 preceding siblings ...)
  2022-04-13 16:32 ` [RFC PATCH v7 19/25] vhost: Add vhost_svq_inject Eugenio Pérez
@ 2022-04-13 16:32 ` Eugenio Pérez
  2022-04-14  9:14   ` Jason Wang
  2022-04-13 16:32 ` [RFC PATCH v7 21/25] vdpa: Add vhost_vdpa_start_control_svq Eugenio Pérez
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

It allows injecting custom code on successful device start, right
before the lock is released.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
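A sketch of a client using the hook; it runs at the end of
vhost_net_start_one(), once per vhost device, when the device is fully
started (names here are hypothetical):

    static void my_start(NetClientState *nc)
    {
        /* the device is live: safe to add mappings or inject buffers */
    }

    static NetClientInfo my_info = {
        .type = NET_CLIENT_DRIVER_VHOST_VDPA,
        .size = sizeof(VhostVDPAState),
        .start = my_start,
        /* other callbacks elided */
    };
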
 include/net/net.h  | 2 ++
 hw/net/vhost_net.c | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 523136c7ac..2fc3002ab4 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -44,6 +44,7 @@ typedef struct NICConf {
 
 typedef void (NetPoll)(NetClientState *, bool enable);
 typedef bool (NetCanReceive)(NetClientState *);
+typedef void (NetStart)(NetClientState *);
 typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
 typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
 typedef void (NetCleanup) (NetClientState *);
@@ -71,6 +72,7 @@ typedef struct NetClientInfo {
     NetReceive *receive_raw;
     NetReceiveIOV *receive_iov;
     NetCanReceive *can_receive;
+    NetStart *start;
     NetCleanup *cleanup;
     LinkStatusChanged *link_status_changed;
     QueryRxFilter *query_rx_filter;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 30379d2ca4..44a105ec29 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -274,6 +274,10 @@ static int vhost_net_start_one(struct vhost_net *net,
             }
         }
     }
+
+    if (net->nc->info->start) {
+        net->nc->info->start(net->nc);
+    }
     return 0;
 fail:
     file.fd = -1;
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v7 21/25] vdpa: Add vhost_vdpa_start_control_svq
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (19 preceding siblings ...)
  2022-04-13 16:32 ` [RFC PATCH v7 20/25] vdpa: add NetClientState->start() callback Eugenio Pérez
@ 2022-04-13 16:32 ` Eugenio Pérez
  2022-04-13 16:32 ` [RFC PATCH v7 22/25] vhost: Update kernel headers Eugenio Pérez
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

This will send CVQ commands in the destination machine, setting up
everything so there is no guest-visible change.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
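The MQ counterpart would follow the same shape; a hypothetical,
untested sketch of an extra branch in vhost_vdpa_start_control_svq():

    if (features & BIT_ULL(VIRTIO_NET_F_MQ)) {
        const struct virtio_net_ctrl_hdr ctrl = {
            .class = VIRTIO_NET_CTRL_MQ,
            .cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET,
        };
        const struct virtio_net_ctrl_mq mq = {
            .virtqueue_pairs = cpu_to_le16(n->curr_queue_pairs),
        };
        const struct iovec data[] = {
            { .iov_base = (void *)&ctrl, .iov_len = sizeof(ctrl) },
            { .iov_base = (void *)&mq, .iov_len = sizeof(mq) },
            { .iov_base = NULL, .iov_len = sizeof(virtio_net_ctrl_ack) },
        };
        if (!vhost_svq_inject(svq, data, 2, 1)) {
            return false;
        }
    }
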
 net/vhost-vdpa.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 38e6912255..15c3e4f703 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -203,10 +203,73 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, const uint8_t *buf,
     return 0;
 }
 
+static bool vhost_vdpa_start_control_svq(VhostShadowVirtqueue *svq,
+                                         VirtIODevice *vdev)
+{
+    VirtIONet *n = VIRTIO_NET(vdev);
+    uint64_t features = vdev->host_features;
+
+    if (features & BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR)) {
+        const struct virtio_net_ctrl_hdr ctrl = {
+            .class = VIRTIO_NET_CTRL_MAC,
+            .cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET,
+        };
+        uint8_t mac[6];
+        const struct iovec data[] = {
+            {
+                .iov_base = (void *)&ctrl,
+                .iov_len = sizeof(ctrl),
+            },{
+                .iov_base = mac,
+                .iov_len = sizeof(mac),
+            },{
+                .iov_base = NULL,
+                .iov_len = sizeof(virtio_net_ctrl_ack),
+            }
+        };
+        bool ret;
+
+        /* TODO: Only best effort? */
+        memcpy(mac, n->mac, sizeof(mac));
+        ret = vhost_svq_inject(svq, data, 2, 1);
+        if (!ret) {
+            return false;
+        }
+    }
+
+    return true;
+}
+
+static void vhost_vdpa_start(NetClientState *nc)
+{
+    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    struct vhost_vdpa *v = &s->vhost_vdpa;
+    struct vhost_dev *dev = &s->vhost_net->dev;
+    VhostShadowVirtqueue *svq;
+
+    if (nc->is_datapath) {
+        /* This is not the cvq dev */
+        return;
+    }
+
+    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
+        return;
+    }
+
+    if (!v->shadow_vqs_enabled) {
+        return;
+    }
+
+    svq = g_ptr_array_index(v->shadow_vqs, 0);
+    vhost_vdpa_start_control_svq(svq, dev->vdev);
+}
+
 static NetClientInfo net_vhost_vdpa_info = {
         .type = NET_CLIENT_DRIVER_VHOST_VDPA,
         .size = sizeof(VhostVDPAState),
         .receive = vhost_vdpa_receive,
+        .start = vhost_vdpa_start,
         .cleanup = vhost_vdpa_cleanup,
         .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
         .has_ufo = vhost_vdpa_has_ufo,
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v7 22/25] vhost: Update kernel headers
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (20 preceding siblings ...)
  2022-04-13 16:32 ` [RFC PATCH v7 21/25] vdpa: Add vhost_vdpa_start_control_svq Eugenio Pérez
@ 2022-04-13 16:32 ` Eugenio Pérez
  2022-04-13 16:32 ` [RFC PATCH v7 23/25] vhost: Make possible to check for device exclusive vq group Eugenio Pérez
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
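A sketch of how user space is expected to drive the two new group
ioctls, per the comments below:

    struct vhost_vring_state state = { .index = vq_index };
    ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP, &state);
    /* state.num now holds the group of vq_index */

    state.index = state.num;   /* group index */
    state.num = asid;          /* address space to attach the group to */
    ioctl(device_fd, VHOST_VDPA_SET_GROUP_ASID, &state);
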
 include/standard-headers/linux/vhost_types.h | 11 ++++++++-
 linux-headers/linux/vhost.h                  | 25 ++++++++++++++++----
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/include/standard-headers/linux/vhost_types.h b/include/standard-headers/linux/vhost_types.h
index 0bd2684a2a..ce78551b0f 100644
--- a/include/standard-headers/linux/vhost_types.h
+++ b/include/standard-headers/linux/vhost_types.h
@@ -87,7 +87,7 @@ struct vhost_msg {
 
 struct vhost_msg_v2 {
 	uint32_t type;
-	uint32_t reserved;
+	uint32_t asid;
 	union {
 		struct vhost_iotlb_msg iotlb;
 		uint8_t padding[64];
@@ -153,4 +153,13 @@ struct vhost_vdpa_iova_range {
 /* vhost-net should add virtio_net_hdr for RX, and strip for TX packets. */
 #define VHOST_NET_F_VIRTIO_NET_HDR 27
 
+/* Use message type V2 */
+#define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1
+/* IOTLB can accept batching hints */
+#define VHOST_BACKEND_F_IOTLB_BATCH  0x2
+/* IOTLB can accept address space identifier through V2 type of IOTLB
+ * message
+ */
+#define VHOST_BACKEND_F_IOTLB_ASID  0x3
+
 #endif
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index c998860d7b..5e083490f1 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -89,11 +89,6 @@
 
 /* Set or get vhost backend capability */
 
-/* Use message type V2 */
-#define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1
-/* IOTLB can accept batching hints */
-#define VHOST_BACKEND_F_IOTLB_BATCH  0x2
-
 #define VHOST_SET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x25, __u64)
 #define VHOST_GET_BACKEND_FEATURES _IOR(VHOST_VIRTIO, 0x26, __u64)
 
@@ -150,4 +145,24 @@
 /* Get the valid iova range */
 #define VHOST_VDPA_GET_IOVA_RANGE	_IOR(VHOST_VIRTIO, 0x78, \
 					     struct vhost_vdpa_iova_range)
+/* Get the number of virtqueue groups. */
+#define VHOST_VDPA_GET_GROUP_NUM	_IOR(VHOST_VIRTIO, 0x79, unsigned int)
+
+/* Get the number of address spaces. */
+#define VHOST_VDPA_GET_AS_NUM		_IOR(VHOST_VIRTIO, 0x7A, unsigned int)
+
+/* Get the group for a virtqueue: read index, write group in num.
+ * The virtqueue index is stored in the index field of
+ * vhost_vring_state. The group for this specific virtqueue is
+ * returned via num field of vhost_vring_state.
+ */
+#define VHOST_VDPA_GET_VRING_GROUP	_IOWR(VHOST_VIRTIO, 0x7B,	\
+					      struct vhost_vring_state)
+/* Set the ASID for a virtqueue group. The group index is stored in
+ * the index field of vhost_vring_state, the ASID associated with this
+ * group is stored at num field of vhost_vring_state.
+ */
+#define VHOST_VDPA_SET_GROUP_ASID	_IOW(VHOST_VIRTIO, 0x7C, \
+					     struct vhost_vring_state)
+
 #endif
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v7 23/25] vhost: Make possible to check for device exclusive vq group
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (21 preceding siblings ...)
  2022-04-13 16:32 ` [RFC PATCH v7 22/25] vhost: Update kernel headers Eugenio Pérez
@ 2022-04-13 16:32 ` Eugenio Pérez
  2022-04-13 16:32 ` [RFC PATCH v7 24/25] vdpa: Add asid attribute to vdpa device Eugenio Pérez
  2022-04-13 16:32 ` [RFC PATCH v7 25/25] vdpa: Add x-cvq-svq Eugenio Pérez
  24 siblings, 0 replies; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

CVQ needs to be in its own group, not shared with any data vq. Enable
that check here, before introducing address space id concepts.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
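A worked example of the constraint, assuming two data queue pairs plus
CVQ (vq indexes 0..4) and VHOST_VDPA_GET_VRING_GROUP reporting:

    vq index:  0  1  2  3  4 (cvq)
    groups  :  0  0  0  0  1   -> cvq group is independent, start proceeds
    groups  :  0  0  0  0  0   -> cvq shares the data vqs' group, start fails

The check only runs for the dev marked independent_vq_group, i.e. the
CVQ one.
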
 include/hw/virtio/vhost.h |  2 +
 hw/net/vhost_net.c        |  4 +-
 hw/virtio/vhost-vdpa.c    | 79 ++++++++++++++++++++++++++++++++++++++-
 hw/virtio/trace-events    |  1 +
 4 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 58a73e7b7a..034868fa9e 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -78,6 +78,8 @@ struct vhost_dev {
     int vq_index_end;
     /* if non-zero, minimum required value for max_queues */
     int num_queues;
+    /* Must be a vq group different than any other vhost dev */
+    bool independent_vq_group;
     uint64_t features;
     uint64_t acked_features;
     uint64_t backend_features;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 44a105ec29..10480e19e5 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -343,14 +343,16 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
     }
 
     for (i = 0; i < nvhosts; i++) {
+        bool cvq_idx = i >= data_queue_pairs;
 
-        if (i < data_queue_pairs) {
+        if (!cvq_idx) {
             peer = qemu_get_peer(ncs, i);
         } else { /* Control Virtqueue */
             peer = qemu_get_peer(ncs, n->max_queue_pairs);
         }
 
         net = get_vhost_net(peer);
+        net->dev.independent_vq_group = !!cvq_idx;
         vhost_net_set_vq_index(net, i * 2, index_end);
 
         /* Suppress the masking guest notifiers on vhost user
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 1948c5ca7d..4096555242 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -678,7 +678,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
 {
     uint64_t features;
     uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
-        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
+        0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
+        0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
     int r;
 
     if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
@@ -1098,6 +1099,78 @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
     return true;
 }
 
+static int vhost_vdpa_get_vring_group(struct vhost_dev *dev,
+                                      struct vhost_vring_state *state)
+{
+    int ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_VRING_GROUP, state);
+    trace_vhost_vdpa_get_vring_group(dev, state->index, state->num);
+    return ret;
+}
+
+static bool vhost_dev_is_independent_group(struct vhost_dev *dev)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    struct vhost_vring_state this_vq_group = {
+        .index = dev->vq_index,
+    };
+    int ret;
+
+    if (!(dev->backend_cap & VHOST_BACKEND_F_IOTLB_ASID)) {
+        return true;
+    }
+
+    if (!v->shadow_vqs_enabled) {
+        return true;
+    }
+
+    ret = vhost_vdpa_get_vring_group(dev, &this_vq_group);
+    if (unlikely(ret)) {
+        goto call_err;
+    }
+
+    for (int i = 1; i < dev->nvqs; ++i) {
+        struct vhost_vring_state vq_group = {
+            .index = dev->vq_index + i,
+        };
+
+        ret = vhost_vdpa_get_vring_group(dev, &vq_group);
+        if (unlikely(ret)) {
+            goto call_err;
+        }
+        if (unlikely(vq_group.num != this_vq_group.num)) {
+            error_report("VQ %d group is different than VQ %d one",
+                         this_vq_group.index, vq_group.index);
+            return false;
+        }
+    }
+
+    for (int i = 0; i < dev->vq_index_end; ++i) {
+        struct vhost_vring_state vq_group = {
+            .index = i,
+        };
+
+        if (dev->vq_index <= i && i < dev->vq_index + dev->nvqs) {
+            continue;
+        }
+
+        ret = vhost_vdpa_get_vring_group(dev, &vq_group);
+        if (unlikely(ret)) {
+            goto call_err;
+        }
+        if (unlikely(vq_group.num == this_vq_group.num)) {
+            error_report("VQ %d group is the same as VQ %d one",
+                         this_vq_group.index, vq_group.index);
+            return false;
+        }
+    }
+
+    return true;
+
+call_err:
+    error_report("Can't read vq group, errno=%d (%s)", ret, g_strerror(-ret));
+    return false;
+}
+
 static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 {
     struct vhost_vdpa *v = dev->opaque;
@@ -1106,6 +1179,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 
     if (started) {
         vhost_vdpa_host_notifiers_init(dev);
+        if (dev->independent_vq_group &&
+            !vhost_dev_is_independent_group(dev)) {
+            return -1;
+        }
         ok = vhost_vdpa_svqs_start(dev);
         if (unlikely(!ok)) {
             return -1;
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 333348d9d5..e6fdc03514 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -43,6 +43,7 @@ vhost_vdpa_set_vring_ready(void *dev) "dev: %p"
 vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
 vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
 vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
+vhost_vdpa_get_vring_group(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
 vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
 vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
 vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v7 24/25] vdpa: Add asid attribute to vdpa device
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (22 preceding siblings ...)
  2022-04-13 16:32 ` [RFC PATCH v7 23/25] vhost: Make possible to check for device exclusive vq group Eugenio Pérez
@ 2022-04-13 16:32 ` Eugenio Pérez
  2022-04-14  9:10   ` Jason Wang
  2022-04-13 16:32 ` [RFC PATCH v7 25/25] vdpa: Add x-cvq-svq Eugenio Pérez
  24 siblings, 1 reply; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

We can configure an ASID per group, but we still use ASID 0 for every
vdpa device. Multiple ASID support for CVQ will be introduced in the
next patches.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
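A sketch of the resulting IOTLB message when the backend acked
VHOST_BACKEND_F_IOTLB_ASID; the asid word reuses the old reserved
field, so leaving it 0 keeps the previous behavior:

    struct vhost_msg_v2 msg = {
        .type = VHOST_IOTLB_MSG_V2,
        .asid = dev->address_space_id,
    };
    msg.iotlb.iova = iova;
    msg.iotlb.size = size;
    msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr;
    msg.iotlb.perm = VHOST_ACCESS_RW;
    msg.iotlb.type = VHOST_IOTLB_UPDATE;
    write(v->device_fd, &msg, sizeof(msg));
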
 include/hw/virtio/vhost.h |  4 ++
 hw/net/vhost_net.c        |  5 +++
 hw/virtio/vhost-vdpa.c    | 95 ++++++++++++++++++++++++++++++++-------
 net/vhost-vdpa.c          |  4 +-
 hw/virtio/trace-events    |  9 ++--
 5 files changed, 94 insertions(+), 23 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 034868fa9e..640cf82168 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -76,8 +76,12 @@ struct vhost_dev {
     int vq_index;
     /* one past the last vq index for the virtio device (not vhost) */
     int vq_index_end;
+    /* one past the last vq index of this virtqueue group */
+    int vq_group_index_end;
     /* if non-zero, minimum required value for max_queues */
     int num_queues;
+    /* address space id */
+    uint32_t address_space_id;
     /* Must be a vq group different than any other vhost dev */
     bool independent_vq_group;
     uint64_t features;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 10480e19e5..a34df739a7 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -344,15 +344,20 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 
     for (i = 0; i < nvhosts; i++) {
         bool cvq_idx = i >= data_queue_pairs;
+        uint32_t vq_group_end;
 
         if (!cvq_idx) {
             peer = qemu_get_peer(ncs, i);
+            vq_group_end = 2 * data_queue_pairs;
         } else { /* Control Virtqueue */
             peer = qemu_get_peer(ncs, n->max_queue_pairs);
+            vq_group_end = 2 * data_queue_pairs + 1;
         }
 
         net = get_vhost_net(peer);
+        net->dev.address_space_id = !!cvq_idx;
         net->dev.independent_vq_group = !!cvq_idx;
+        net->dev.vq_group_index_end = vq_group_end;
         vhost_net_set_vq_index(net, i * 2, index_end);
 
         /* Suppress the masking guest notifiers on vhost user
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 4096555242..5ed211287c 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -79,6 +79,9 @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
     int ret = 0;
 
     msg.type = v->msg_type;
+    if (v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) {
+        msg.asid = v->dev->address_space_id;
+    }
     msg.iotlb.iova = iova;
     msg.iotlb.size = size;
     msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr;
@@ -90,8 +93,9 @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
         return 0;
     }
 
-   trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.iotlb.iova, msg.iotlb.size,
-                            msg.iotlb.uaddr, msg.iotlb.perm, msg.iotlb.type);
+    trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.asid, msg.iotlb.iova,
+                             msg.iotlb.size, msg.iotlb.uaddr, msg.iotlb.perm,
+                             msg.iotlb.type);
 
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)",
@@ -109,6 +113,9 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
     int fd = v->device_fd;
     int ret = 0;
 
+    if (v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) {
+        msg.asid = v->dev->address_space_id;
+    }
     msg.type = v->msg_type;
     msg.iotlb.iova = iova;
     msg.iotlb.size = size;
@@ -119,7 +126,7 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
         return 0;
     }
 
-    trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.iotlb.iova,
+    trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.asid, msg.iotlb.iova,
                                msg.iotlb.size, msg.iotlb.type);
 
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
@@ -134,6 +141,7 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
 static void vhost_vdpa_listener_commit(MemoryListener *listener)
 {
     struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
+    struct vhost_dev *dev = v->dev;
     struct vhost_msg_v2 msg = {};
     int fd = v->device_fd;
     size_t num = v->iotlb_updates->len;
@@ -142,9 +150,14 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
         return;
     }
 
+    if (dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) {
+        msg.asid = dev->address_space_id;
+    }
+
     msg.type = v->msg_type;
     msg.iotlb.type = VHOST_IOTLB_BATCH_BEGIN;
-    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.iotlb.type);
+    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.asid,
+                                          msg.iotlb.type);
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write BEGIN_BATCH, fd=%d, errno=%d (%s)",
                      fd, errno, strerror(errno));
@@ -162,7 +175,8 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
     }
 
     msg.iotlb.type = VHOST_IOTLB_BATCH_END;
-    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
+    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.asid,
+                                     msg.iotlb.type);
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)",
                      fd, errno, strerror(errno));
@@ -1171,10 +1185,48 @@ call_err:
     return false;
 }
 
+static int vhost_vdpa_set_vq_group_address_space_id(struct vhost_dev *dev,
+                                                struct vhost_vring_state *asid)
+{
+    trace_vhost_vdpa_set_vq_group_address_space_id(dev, asid->index, asid->num);
+    return vhost_vdpa_call(dev, VHOST_VDPA_SET_GROUP_ASID, asid);
+}
+
+static int vhost_vdpa_set_address_space_id(struct vhost_dev *dev)
+{
+    struct vhost_vring_state vq_group = {
+        .index = dev->vq_index,
+    };
+    struct vhost_vring_state asid;
+    int ret;
+
+    if (!dev->address_space_id) {
+        return 0;
+    }
+
+    ret = vhost_vdpa_get_vring_group(dev, &vq_group);
+    if (unlikely(ret)) {
+        error_report("Can't read vq group, errno=%d (%s)", ret,
+                     g_strerror(-ret));
+        return ret;
+    }
+
+    asid.index = vq_group.num;
+    asid.num = dev->address_space_id;
+    ret = vhost_vdpa_set_vq_group_address_space_id(dev, &asid);
+    if (unlikely(ret)) {
+        error_report("Can't set vq group %u asid %u, errno=%d (%s)",
+            asid.index, asid.num, ret, g_strerror(-ret));
+    }
+    return ret;
+}
+
 static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 {
     struct vhost_vdpa *v = dev->opaque;
-    bool ok;
+    bool vq_group_end, ok;
+    int r = 0;
+
     trace_vhost_vdpa_dev_start(dev, started);
 
     if (started) {
@@ -1183,6 +1235,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
             !vhost_dev_is_independent_group(dev)) {
             return -1;
         }
+        r = vhost_vdpa_set_address_space_id(dev);
+        if (unlikely(r)) {
+            return r;
+        }
         ok = vhost_vdpa_svqs_start(dev);
         if (unlikely(!ok)) {
             return -1;
@@ -1196,21 +1252,26 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
         vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
     }
 
-    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
-        return 0;
+    vq_group_end = dev->vq_index + dev->nvqs == dev->vq_group_index_end;
+    if (vq_group_end && started) {
+        memory_listener_register(&v->listener, &address_space_memory);
     }
 
-    if (started) {
-        memory_listener_register(&v->listener, &address_space_memory);
-        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
-    } else {
-        vhost_vdpa_reset_device(dev);
-        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
-                                   VIRTIO_CONFIG_S_DRIVER);
-        memory_listener_unregister(&v->listener);
+    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
+        if (started) {
+            r = vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+        } else {
+            vhost_vdpa_reset_device(dev);
+            vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
+                                       VIRTIO_CONFIG_S_DRIVER);
+        }
+    }
 
-        return 0;
+    if (vq_group_end && !started) {
+        memory_listener_unregister(&v->listener);
     }
+
+    return r;
 }
 
 static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 15c3e4f703..a6f803ea4e 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -473,8 +473,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 
     if (has_cvq) {
         nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                 vdpa_device_fd, i, 1, false, opts->x_svq,
-                                 iova_tree);
+                                 vdpa_device_fd, i, 1,
+                                 false, opts->x_svq, iova_tree);
         if (!nc)
             goto err;
     }
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index e6fdc03514..2858deac60 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -23,10 +23,10 @@ vhost_user_postcopy_waker_found(uint64_t client_addr) "0x%"PRIx64
 vhost_user_postcopy_waker_nomatch(const char *rb, uint64_t rb_offset) "%s + 0x%"PRIx64
 
 # vhost-vdpa.c
-vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
-vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
-vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
-vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
+vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
+vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
+vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint32_t asid, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" type: %"PRIu8
+vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint32_t asid, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" type: %"PRIu8
 vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
 vhost_vdpa_listener_region_del(void *vdpa, uint64_t iova, uint64_t llend) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64
 vhost_vdpa_add_status(void *dev, uint8_t status) "dev: %p status: 0x%"PRIx8
@@ -44,6 +44,7 @@ vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
 vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
 vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
 vhost_vdpa_get_vring_group(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
+vhost_vdpa_set_vq_group_address_space_id(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
 vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
 vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
 vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v7 25/25] vdpa: Add x-cvq-svq
  2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
                   ` (23 preceding siblings ...)
  2022-04-13 16:32 ` [RFC PATCH v7 24/25] vdpa: Add asid attribute to vdpa device Eugenio Pérez
@ 2022-04-13 16:32 ` Eugenio Pérez
  2022-04-14  9:09   ` Jason Wang
  24 siblings, 1 reply; 50+ messages in thread
From: Eugenio Pérez @ 2022-04-13 16:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Jason Wang, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

This isolates the shadow CVQ in its own virtqueue group.
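
In short, the new checks and setup below amount to something like this
sketch (group 1 is assumed to be the one containing the CVQ, as in this
series; error handling omitted):

    uint64_t features;
    unsigned num_as;
    struct vhost_vring_state asid = { .index = 1, .num = 1 };

    ioctl(fd, VHOST_GET_BACKEND_FEATURES, &features);
    assert(features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID));

    ioctl(fd, VHOST_VDPA_GET_AS_NUM, &num_as);
    assert(num_as >= 2);                 /* asid 0 (data) + asid 1 (cvq) */

    ioctl(fd, VHOST_VDPA_SET_GROUP_ASID, &asid);   /* group 1 -> asid 1 */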

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 qapi/net.json    |  8 +++-
 net/vhost-vdpa.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 100 insertions(+), 6 deletions(-)

diff --git a/qapi/net.json b/qapi/net.json
index 92848e4362..39c245e6cd 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -447,9 +447,12 @@
 #
 # @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
 #         (default: false)
+# @x-cvq-svq: Start device with (experimental) shadow virtqueue in its own
+#             virtqueue group. (Since 7.1)
+#             (default: false)
 #
 # Features:
-# @unstable: Member @x-svq is experimental.
+# @unstable: Members @x-svq and @x-cvq-svq are experimental.
 #
 # Since: 5.1
 ##
@@ -457,7 +460,8 @@
   'data': {
     '*vhostdev':     'str',
     '*queues':       'int',
-    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] } } }
+    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] },
+    '*x-cvq-svq':    {'type': 'bool', 'features' : [ 'unstable'] } } }
 
 ##
 # @NetClientDriver:
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index a6f803ea4e..851dacb902 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -377,6 +377,17 @@ static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
     return ret;
 }
 
+static int vhost_vdpa_get_backend_features(int fd, uint64_t *features,
+                                           Error **errp)
+{
+    int ret = ioctl(fd, VHOST_GET_BACKEND_FEATURES, features);
+    if (ret) {
+        error_setg_errno(errp, errno,
+            "Failed to query backend features from vhost-vDPA device");
+    }
+    return ret;
+}
+
 static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
                                           int *has_cvq, Error **errp)
 {
@@ -410,16 +421,56 @@ static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
     return 1;
 }
 
+/**
+ * Check that the vdpa device supports asid 1 for the CVQ group
+ *
+ * @vdpa_device_fd: Vdpa device fd
+ * @queue_pairs: Queue pairs
+ * @errp: Error
+ */
+static int vhost_vdpa_check_cvq_svq(int vdpa_device_fd, int queue_pairs,
+                                    Error **errp)
+{
+    uint64_t backend_features;
+    unsigned num_as;
+    int r;
+
+    r = vhost_vdpa_get_backend_features(vdpa_device_fd, &backend_features,
+                                        errp);
+    if (unlikely(r)) {
+        return -1;
+    }
+
+    if (unlikely(!(backend_features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)))) {
+        error_setg(errp, "Device without IOTLB_ASID feature");
+        return -1;
+    }
+
+    r = ioctl(vdpa_device_fd, VHOST_VDPA_GET_AS_NUM, &num_as);
+    if (unlikely(r)) {
+        error_setg_errno(errp, errno,
+                         "Cannot retrieve number of supported ASs");
+        return -1;
+    }
+    if (unlikely(num_as < 2)) {
+        error_setg(errp, "Insufficient number of ASs (%u, min: 2)", num_as);
+        return -1;
+    }
+
+    return 0;
+}
+
 int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
                         NetClientState *peer, Error **errp)
 {
     const NetdevVhostVDPAOptions *opts;
+    struct vhost_vdpa_iova_range iova_range;
     uint64_t features;
     int vdpa_device_fd;
     g_autofree NetClientState **ncs = NULL;
     NetClientState *nc;
     int queue_pairs, r, i, has_cvq = 0;
     g_autoptr(VhostIOVATree) iova_tree = NULL;
+    ERRP_GUARD();
 
     assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
     opts = &netdev->u.vhost_vdpa;
@@ -444,8 +495,9 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
         qemu_close(vdpa_device_fd);
         return queue_pairs;
     }
-    if (opts->x_svq) {
-        struct vhost_vdpa_iova_range iova_range;
+    if (opts->x_cvq_svq || opts->x_svq) {
+        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
+
         uint64_t invalid_dev_features =
             features & ~vdpa_svq_device_features &
             /* Transport are all accepted at this point */
@@ -457,7 +509,21 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
                        invalid_dev_features);
             goto err_svq;
         }
-        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
+    }
+
+    if (opts->x_cvq_svq) {
+        if (!has_cvq) {
+            error_setg(errp, "Cannot use x-cvq-svq with a device without cvq");
+            goto err_svq;
+        }
+
+        r = vhost_vdpa_check_cvq_svq(vdpa_device_fd, queue_pairs, errp);
+        if (unlikely(r)) {
+            error_prepend(errp, "Cannot configure CVQ SVQ: ");
+            goto err_svq;
+        }
+    }
+    if (opts->x_svq) {
         iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
     }
 
@@ -472,11 +538,35 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     }
 
     if (has_cvq) {
+        g_autoptr(VhostIOVATree) cvq_iova_tree = NULL;
+
+        if (opts->x_cvq_svq) {
+            cvq_iova_tree = vhost_iova_tree_new(iova_range.first,
+                                                iova_range.last);
+        } else if (opts->x_svq) {
+            cvq_iova_tree = vhost_iova_tree_acquire(iova_tree);
+        }
+
         nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
                                  vdpa_device_fd, i, 1,
-                                 false, opts->x_svq, iova_tree);
+                                 false, opts->x_cvq_svq || opts->x_svq,
+                                 cvq_iova_tree);
         if (!nc)
             goto err;
+
+        if (opts->x_cvq_svq) {
+            struct vhost_vring_state asid = {
+                .index = 1,
+                .num = 1,
+            };
+
+            r = ioctl(vdpa_device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
+            if (unlikely(r)) {
+                error_setg_errno(errp, errno,
+                                 "Cannot set cvq group independent asid");
+                goto err;
+            }
+        }
     }
 
     return 0;
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 01/25] vhost: Track descriptor chain in private at SVQ
  2022-04-13 16:31 ` [RFC PATCH v7 01/25] vhost: Track descriptor chain in private at SVQ Eugenio Pérez
@ 2022-04-14  3:48   ` Jason Wang
  2022-04-22 14:16     ` Eugenio Perez Martin
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-04-14  3:48 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan


On 2022/4/14 00:31, Eugenio Pérez wrote:
> Only the first one of them was properly enqueued back.


I wonder if it's better to use two patches:

1) using private chain

2) fix the chain issue

Patch looks good itself.

Thanks


>
> While we're at it, harden SVQ: the device could have access to modify
> them, and it definitely has access when we implement packed vq. Harden
> SVQ by maintaining a private copy of the descriptor chain. Other fields,
> like buffer addresses, are already maintained separately.
>
> Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding")
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  6 ++++++
>   hw/virtio/vhost-shadow-virtqueue.c | 27 +++++++++++++++++++++------
>   2 files changed, 27 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index e5e24c536d..c132c994e9 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -53,6 +53,12 @@ typedef struct VhostShadowVirtqueue {
>       /* Next VirtQueue element that guest made available */
>       VirtQueueElement *next_guest_avail_elem;
>   
> +    /*
> +     * Backup next field for each descriptor so we can recover securely, not
> +     * needing to trust the device access.
> +     */
> +    uint16_t *desc_next;
> +
>       /* Next head to expose to the device */
>       uint16_t shadow_avail_idx;
>   
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index b232803d1b..a2531d5874 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -138,6 +138,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>       for (n = 0; n < num; n++) {
>           if (more_descs || (n + 1 < num)) {
>               descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> +            descs[i].next = cpu_to_le16(svq->desc_next[i]);
>           } else {
>               descs[i].flags = flags;
>           }
> @@ -145,10 +146,10 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>           descs[i].len = cpu_to_le32(iovec[n].iov_len);
>   
>           last = i;
> -        i = cpu_to_le16(descs[i].next);
> +        i = cpu_to_le16(svq->desc_next[i]);
>       }
>   
> -    svq->free_head = le16_to_cpu(descs[last].next);
> +    svq->free_head = le16_to_cpu(svq->desc_next[last]);
>   }
>   
>   static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
> @@ -333,13 +334,22 @@ static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
>       svq->vring.avail->flags |= cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
>   }
>   
> +static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
> +                                             uint16_t num, uint16_t i)
> +{
> +    for (uint16_t j = 0; j < num; ++j) {
> +        i = le16_to_cpu(svq->desc_next[i]);
> +    }
> +
> +    return i;
> +}
> +
>   static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
>                                              uint32_t *len)
>   {
> -    vring_desc_t *descs = svq->vring.desc;
>       const vring_used_t *used = svq->vring.used;
>       vring_used_elem_t used_elem;
> -    uint16_t last_used;
> +    uint16_t last_used, last_used_chain, num;
>   
>       if (!vhost_svq_more_used(svq)) {
>           return NULL;
> @@ -365,7 +375,10 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
>           return NULL;
>       }
>   
> -    descs[used_elem.id].next = svq->free_head;
> +    num = svq->ring_id_maps[used_elem.id]->in_num +
> +          svq->ring_id_maps[used_elem.id]->out_num;
> +    last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
> +    svq->desc_next[last_used_chain] = svq->free_head;
>       svq->free_head = used_elem.id;
>   
>       *len = used_elem.len;
> @@ -540,8 +553,9 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
>       svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
>       memset(svq->vring.used, 0, device_size);
>       svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num);
> +    svq->desc_next = g_new0(uint16_t, svq->vring.num);
>       for (unsigned i = 0; i < svq->vring.num - 1; i++) {
> -        svq->vring.desc[i].next = cpu_to_le16(i + 1);
> +        svq->desc_next[i] = cpu_to_le16(i + 1);
>       }
>   }
>   
> @@ -574,6 +588,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>           virtqueue_detach_element(svq->vq, next_avail_elem, 0);
>       }
>       svq->vq = NULL;
> +    g_free(svq->desc_next);
>       g_free(svq->ring_id_maps);
>       qemu_vfree(svq->vring.desc);
>       qemu_vfree(svq->vring.used);



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 02/25] vdpa: Add missing tracing to batch mapping functions
  2022-04-13 16:31 ` [RFC PATCH v7 02/25] vdpa: Add missing tracing to batch mapping functions Eugenio Pérez
@ 2022-04-14  3:49   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-04-14  3:49 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan


On 2022/4/14 00:31, Eugenio Pérez wrote:
> These functions were not traced properly.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---


Acked-by: Jason Wang <jasowang@redhat.com>


>   hw/virtio/vhost-vdpa.c | 2 ++
>   hw/virtio/trace-events | 2 ++
>   2 files changed, 4 insertions(+)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 8adf7c0b92..9e5fe15d03 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -129,6 +129,7 @@ static void vhost_vdpa_listener_begin_batch(struct vhost_vdpa *v)
>           .iotlb.type = VHOST_IOTLB_BATCH_BEGIN,
>       };
>   
> +    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.iotlb.type);
>       if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
>           error_report("failed to write, fd=%d, errno=%d (%s)",
>                        fd, errno, strerror(errno));
> @@ -163,6 +164,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
>       msg.type = v->msg_type;
>       msg.iotlb.type = VHOST_IOTLB_BATCH_END;
>   
> +    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
>       if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
>           error_report("failed to write, fd=%d, errno=%d (%s)",
>                        fd, errno, strerror(errno));
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index a5102eac9e..333348d9d5 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -25,6 +25,8 @@ vhost_user_postcopy_waker_nomatch(const char *rb, uint64_t rb_offset) "%s + 0x%"
>   # vhost-vdpa.c
>   vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
>   vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> +vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> +vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
>   vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
>   vhost_vdpa_listener_region_del(void *vdpa, uint64_t iova, uint64_t llend) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64
>   vhost_vdpa_add_status(void *dev, uint8_t status) "dev: %p status: 0x%"PRIx8



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 03/25] vdpa: Fix bad index calculus at vhost_vdpa_get_vring_base
  2022-04-13 16:31 ` [RFC PATCH v7 03/25] vdpa: Fix bad index calculus at vhost_vdpa_get_vring_base Eugenio Pérez
@ 2022-04-14  3:50   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-04-14  3:50 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan


On 2022/4/14 00:31, Eugenio Pérez wrote:
> Fixes: 6d0b222666 ("vdpa: Adapt vhost_vdpa_get_vring_base to SVQ")
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>


Acked-by: Jason Wang <jasowang@redhat.com>


> ---
>   hw/virtio/vhost-vdpa.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 9e5fe15d03..1f229ff4cb 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1172,11 +1172,11 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>                                          struct vhost_vring_state *ring)
>   {
>       struct vhost_vdpa *v = dev->opaque;
> +    int vdpa_idx = ring->index - dev->vq_index;
>       int ret;
>   
>       if (v->shadow_vqs_enabled) {
> -        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
> -                                                      ring->index);
> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
>   
>           /*
>            * Setting base as last used idx, so destination will see as available



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 04/25] util: Return void on iova_tree_remove
  2022-04-13 16:31 ` [RFC PATCH v7 04/25] util: Return void on iova_tree_remove Eugenio Pérez
@ 2022-04-14  3:50   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-04-14  3:50 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan


On 2022/4/14 00:31, Eugenio Pérez wrote:
> It always returns IOVA_OK, so nobody uses the return value.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---


Acked-by: Jason Wang <jasowang@redhat.com>


>   include/qemu/iova-tree.h | 4 +---
>   util/iova-tree.c         | 4 +---
>   2 files changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
> index c938fb0793..16bbfdf5f8 100644
> --- a/include/qemu/iova-tree.h
> +++ b/include/qemu/iova-tree.h
> @@ -72,10 +72,8 @@ int iova_tree_insert(IOVATree *tree, const DMAMap *map);
>    * provided.  The range does not need to be exactly what has inserted,
>    * all the mappings that are included in the provided range will be
>    * removed from the tree.  Here map->translated_addr is meaningless.
> - *
> - * Return: 0 if succeeded, or <0 if error.
>    */
> -int iova_tree_remove(IOVATree *tree, const DMAMap *map);
> +void iova_tree_remove(IOVATree *tree, const DMAMap *map);
>   
>   /**
>    * iova_tree_find:
> diff --git a/util/iova-tree.c b/util/iova-tree.c
> index 6dff29c1f6..fee530a579 100644
> --- a/util/iova-tree.c
> +++ b/util/iova-tree.c
> @@ -164,15 +164,13 @@ void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator)
>       g_tree_foreach(tree->tree, iova_tree_traverse, iterator);
>   }
>   
> -int iova_tree_remove(IOVATree *tree, const DMAMap *map)
> +void iova_tree_remove(IOVATree *tree, const DMAMap *map)
>   {
>       const DMAMap *overlap;
>   
>       while ((overlap = iova_tree_find(tree, map))) {
>           g_tree_remove(tree->tree, overlap);
>       }
> -
> -    return IOVA_OK;
>   }
>   
>   /**



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 05/25] hw/virtio: Replace g_memdup() by g_memdup2()
  2022-04-13 16:31 ` [RFC PATCH v7 05/25] hw/virtio: Replace g_memdup() by g_memdup2() Eugenio Pérez
@ 2022-04-14  3:51   ` Jason Wang
  2022-04-14  4:01   ` Jason Wang
  1 sibling, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-04-14  3:51 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan


On 2022/4/14 00:31, Eugenio Pérez wrote:
> From: Philippe Mathieu-Daudé <philmd@redhat.com>
>
> Per https://discourse.gnome.org/t/port-your-module-from-g-memdup-to-g-memdup2-now/5538
>
>    The old API took the size of the memory to duplicate as a guint,
>    whereas most memory functions take memory sizes as a gsize. This
>    made it easy to accidentally pass a gsize to g_memdup(). For large
>    values, that would lead to a silent truncation of the size from 64
>    to 32 bits, and result in a heap area being returned which is
>    significantly smaller than what the caller expects. This can likely
>    be exploited in various modules to cause a heap buffer overflow.
>
> Replace g_memdup() by the safer g_memdup2() wrapper.
>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---


Acked-by: Jason Wang <jasowang@redhat.com>


>   hw/net/virtio-net.c       | 3 ++-
>   hw/virtio/virtio-crypto.c | 6 +++---
>   2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 1067e72b39..e4748a7e6c 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1443,7 +1443,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>           }
>   
>           iov_cnt = elem->out_num;
> -        iov2 = iov = g_memdup(elem->out_sg, sizeof(struct iovec) * elem->out_num);
> +        iov2 = iov = g_memdup2(elem->out_sg,
> +                               sizeof(struct iovec) * elem->out_num);
>           s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl));
>           iov_discard_front(&iov, &iov_cnt, sizeof(ctrl));
>           if (s != sizeof(ctrl)) {
> diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
> index dcd80b904d..0e31e3cc04 100644
> --- a/hw/virtio/virtio-crypto.c
> +++ b/hw/virtio/virtio-crypto.c
> @@ -242,7 +242,7 @@ static void virtio_crypto_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>           }
>   
>           out_num = elem->out_num;
> -        out_iov_copy = g_memdup(elem->out_sg, sizeof(out_iov[0]) * out_num);
> +        out_iov_copy = g_memdup2(elem->out_sg, sizeof(out_iov[0]) * out_num);
>           out_iov = out_iov_copy;
>   
>           in_num = elem->in_num;
> @@ -605,11 +605,11 @@ virtio_crypto_handle_request(VirtIOCryptoReq *request)
>       }
>   
>       out_num = elem->out_num;
> -    out_iov_copy = g_memdup(elem->out_sg, sizeof(out_iov[0]) * out_num);
> +    out_iov_copy = g_memdup2(elem->out_sg, sizeof(out_iov[0]) * out_num);
>       out_iov = out_iov_copy;
>   
>       in_num = elem->in_num;
> -    in_iov_copy = g_memdup(elem->in_sg, sizeof(in_iov[0]) * in_num);
> +    in_iov_copy = g_memdup2(elem->in_sg, sizeof(in_iov[0]) * in_num);
>       in_iov = in_iov_copy;
>   
>       if (unlikely(iov_to_buf(out_iov, out_num, 0, &req, sizeof(req))



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 05/25] hw/virtio: Replace g_memdup() by g_memdup2()
  2022-04-13 16:31 ` [RFC PATCH v7 05/25] hw/virtio: Replace g_memdup() by g_memdup2() Eugenio Pérez
  2022-04-14  3:51   ` Jason Wang
@ 2022-04-14  4:01   ` Jason Wang
  1 sibling, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-04-14  4:01 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan


On 2022/4/14 00:31, Eugenio Pérez wrote:
> From: Philippe Mathieu-Daudé <philmd@redhat.com>
>
> Per https://discourse.gnome.org/t/port-your-module-from-g-memdup-to-g-memdup2-now/5538
>
>    The old API took the size of the memory to duplicate as a guint,
>    whereas most memory functions take memory sizes as a gsize. This
>    made it easy to accidentally pass a gsize to g_memdup(). For large
>    values, that would lead to a silent truncation of the size from 64
>    to 32 bits, and result in a heap area being returned which is
>    significantly smaller than what the caller expects. This can likely
>    be exploited in various modules to cause a heap buffer overflow.
>
> Replace g_memdup() by the safer g_memdup2() wrapper.
>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---


Acked-by: Jason Wang <jasowang@redhat.com>


>   hw/net/virtio-net.c       | 3 ++-
>   hw/virtio/virtio-crypto.c | 6 +++---
>   2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 1067e72b39..e4748a7e6c 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1443,7 +1443,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>           }
>   
>           iov_cnt = elem->out_num;
> -        iov2 = iov = g_memdup(elem->out_sg, sizeof(struct iovec) * elem->out_num);
> +        iov2 = iov = g_memdup2(elem->out_sg,
> +                               sizeof(struct iovec) * elem->out_num);
>           s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl));
>           iov_discard_front(&iov, &iov_cnt, sizeof(ctrl));
>           if (s != sizeof(ctrl)) {
> diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
> index dcd80b904d..0e31e3cc04 100644
> --- a/hw/virtio/virtio-crypto.c
> +++ b/hw/virtio/virtio-crypto.c
> @@ -242,7 +242,7 @@ static void virtio_crypto_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>           }
>   
>           out_num = elem->out_num;
> -        out_iov_copy = g_memdup(elem->out_sg, sizeof(out_iov[0]) * out_num);
> +        out_iov_copy = g_memdup2(elem->out_sg, sizeof(out_iov[0]) * out_num);
>           out_iov = out_iov_copy;
>   
>           in_num = elem->in_num;
> @@ -605,11 +605,11 @@ virtio_crypto_handle_request(VirtIOCryptoReq *request)
>       }
>   
>       out_num = elem->out_num;
> -    out_iov_copy = g_memdup(elem->out_sg, sizeof(out_iov[0]) * out_num);
> +    out_iov_copy = g_memdup2(elem->out_sg, sizeof(out_iov[0]) * out_num);
>       out_iov = out_iov_copy;
>   
>       in_num = elem->in_num;
> -    in_iov_copy = g_memdup(elem->in_sg, sizeof(in_iov[0]) * in_num);
> +    in_iov_copy = g_memdup2(elem->in_sg, sizeof(in_iov[0]) * in_num);
>       in_iov = in_iov_copy;
>   
>       if (unlikely(iov_to_buf(out_iov, out_num, 0, &req, sizeof(req))



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 06/25] vdpa: Send all updates in memory listener commit
  2022-04-13 16:31 ` [RFC PATCH v7 06/25] vdpa: Send all updates in memory listener commit Eugenio Pérez
@ 2022-04-14  4:11   ` Jason Wang
  2022-04-22  9:17     ` Eugenio Perez Martin
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-04-14  4:11 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan


On 2022/4/14 00:31, Eugenio Pérez wrote:
> With the introduction of multiple ASIDs it can happen that many changes
> on different listeners come before the commit call.


I think we have at most one listener even for the case of MQ/CVQ?


>   Since kernel vhost-vdpa
> still does not support it, send it all in one shot.
>
> This also has one extra advantage: if there is no update to notify, we
> save the iotlb_{begin,end} calls.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   include/hw/virtio/vhost-vdpa.h |  2 +-
>   hw/virtio/vhost-vdpa.c         | 69 +++++++++++++++++-----------------
>   2 files changed, 36 insertions(+), 35 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index a29dbb3f53..4961acea8b 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -27,7 +27,7 @@ typedef struct vhost_vdpa {
>       int device_fd;
>       int index;
>       uint32_t msg_type;
> -    bool iotlb_batch_begin_sent;
> +    GArray *iotlb_updates;
>       MemoryListener listener;
>       struct vhost_vdpa_iova_range iova_range;
>       uint64_t acked_features;
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 1f229ff4cb..27ee678dc9 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -85,6 +85,11 @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
>       msg.iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW;
>       msg.iotlb.type = VHOST_IOTLB_UPDATE;
>   
> +    if (v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_BATCH)) {
> +        g_array_append_val(v->iotlb_updates, msg);
> +        return 0;
> +    }


I think it's better to use a consistent way for !batch and batch (e.g. we
can do this even for a backend that doesn't support batching?)

Otherwise the code is hard to maintain.
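
Something along these lines, just to sketch the idea (the flush helper
is hypothetical; it would write the queued messages without the
BEGIN/END markers):

     /* Always queue the update... */
     g_array_append_val(v->iotlb_updates, msg);

     /* ...and flush right away when batching was not negotiated */
     if (!(v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_BATCH))) {
         vhost_vdpa_iotlb_flush(v);
     }
     return 0;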


> +
>      trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.iotlb.iova, msg.iotlb.size,
>                               msg.iotlb.uaddr, msg.iotlb.perm, msg.iotlb.type);
>   
> @@ -109,6 +114,11 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
>       msg.iotlb.size = size;
>       msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
>   
> +    if (v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_BATCH)) {
> +        g_array_append_val(v->iotlb_updates, msg);
> +        return 0;
> +    }
> +
>       trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.iotlb.iova,
>                                  msg.iotlb.size, msg.iotlb.type);
>   
> @@ -121,56 +131,47 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
>       return ret;
>   }
>   
> -static void vhost_vdpa_listener_begin_batch(struct vhost_vdpa *v)
> -{
> -    int fd = v->device_fd;
> -    struct vhost_msg_v2 msg = {
> -        .type = v->msg_type,
> -        .iotlb.type = VHOST_IOTLB_BATCH_BEGIN,
> -    };
> -
> -    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.iotlb.type);
> -    if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> -        error_report("failed to write, fd=%d, errno=%d (%s)",
> -                     fd, errno, strerror(errno));
> -    }
> -}
> -
> -static void vhost_vdpa_iotlb_batch_begin_once(struct vhost_vdpa *v)
> -{
> -    if (v->dev->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH) &&
> -        !v->iotlb_batch_begin_sent) {
> -        vhost_vdpa_listener_begin_batch(v);
> -    }
> -
> -    v->iotlb_batch_begin_sent = true;
> -}
> -
>   static void vhost_vdpa_listener_commit(MemoryListener *listener)
>   {
>       struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> -    struct vhost_dev *dev = v->dev;
>       struct vhost_msg_v2 msg = {};
>       int fd = v->device_fd;
> +    size_t num = v->iotlb_updates->len;
>   
> -    if (!(dev->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
> +    if (!num) {
>           return;
>       }
>   
> -    if (!v->iotlb_batch_begin_sent) {
> -        return;
> +    msg.type = v->msg_type;
> +    msg.iotlb.type = VHOST_IOTLB_BATCH_BEGIN;
> +    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.iotlb.type);
> +    if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {


We need to check whether the vhost-vDPA backend supports batching first?


> +        error_report("failed to write BEGIN_BATCH, fd=%d, errno=%d (%s)",
> +                     fd, errno, strerror(errno));
> +        goto done;
>       }
>   
> -    msg.type = v->msg_type;
> -    msg.iotlb.type = VHOST_IOTLB_BATCH_END;
> +    for (size_t i = 0; i < num; ++i) {
> +        struct vhost_msg_v2 *update = &g_array_index(v->iotlb_updates,
> +                                                     struct vhost_msg_v2, i);
> +        if (write(fd, update, sizeof(*update)) != sizeof(*update)) {
> +            error_report("failed to write dma update, fd=%d, errno=%d (%s)",
> +                         fd, errno, strerror(errno));
> +            goto done;


Maybe it's time to introduce v3 to allow a batch of messages to be
passed to vhost-vDPA in a single system call.
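
Purely as an illustration (this assumes a hypothetical
VHOST_IOTLB_MSG_V3 where the kernel consumes as many updates as the
write() provides; no such uAPI exists today):

    size_t len = v->iotlb_updates->len * sizeof(struct vhost_msg_v2);

    if (write(fd, v->iotlb_updates->data, len) != (ssize_t)len) {
        error_report("failed to write %u updates, errno=%d (%s)",
                     v->iotlb_updates->len, errno, strerror(errno));
    }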

Thanks


> +        }
> +    }
>   
> +    msg.iotlb.type = VHOST_IOTLB_BATCH_END;
>       trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
>       if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
>           error_report("failed to write, fd=%d, errno=%d (%s)",
>                        fd, errno, strerror(errno));
>       }
>   
> -    v->iotlb_batch_begin_sent = false;
> +done:
> +    g_array_set_size(v->iotlb_updates, 0);
> +    return;
> +
>   }
>   
>   static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> @@ -227,7 +228,6 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
>           iova = mem_region.iova;
>       }
>   
> -    vhost_vdpa_iotlb_batch_begin_once(v);
>       ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
>                                vaddr, section->readonly);
>       if (ret) {
> @@ -292,7 +292,6 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
>           iova = result->iova;
>           vhost_iova_tree_remove(v->iova_tree, &mem_region);
>       }
> -    vhost_vdpa_iotlb_batch_begin_once(v);
>       ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
>       if (ret) {
>           error_report("vhost_vdpa dma unmap error!");
> @@ -446,6 +445,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>       dev->opaque =  opaque ;
>       v->listener = vhost_vdpa_memory_listener;
>       v->msg_type = VHOST_IOTLB_MSG_V2;
> +    v->iotlb_updates = g_array_new(false, false, sizeof(struct vhost_msg_v2));
>       ret = vhost_vdpa_init_svq(dev, v, errp);
>       if (ret) {
>           goto err;
> @@ -579,6 +579,7 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>       trace_vhost_vdpa_cleanup(dev, v);
>       vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>       memory_listener_unregister(&v->listener);
> +    g_array_free(v->iotlb_updates, true);
>       vhost_vdpa_svq_cleanup(dev);
>   
>       dev->opaque = NULL;



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 07/25] vhost: Add reference counting to vhost_iova_tree
  2022-04-13 16:31 ` [RFC PATCH v7 07/25] vhost: Add reference counting to vhost_iova_tree Eugenio Pérez
@ 2022-04-14  5:30   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-04-14  5:30 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan


On 2022/4/14 00:31, Eugenio Pérez wrote:
> Now that different vqs can have different ASIDs, it's easier to track
> them using reference counters.
>
> QEMU's glib version still does not have them so we've copied g_rc_box,
> so the implementation can be converted to glib's one when the minimum
> version is raised.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---


I'm not sure if it's too early to introduce things like this since we
have at most 2 ASIDs. This is probably only needed when we want to
expose ASIDs to the guest.

Let's see how it goes for the following patch anyhow.

Thanks


>   hw/virtio/vhost-iova-tree.h |  5 +++--
>   hw/virtio/vhost-iova-tree.c | 21 +++++++++++++++++++--
>   2 files changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
> index 6a4f24e0f9..2fc825d7b1 100644
> --- a/hw/virtio/vhost-iova-tree.h
> +++ b/hw/virtio/vhost-iova-tree.h
> @@ -16,8 +16,9 @@
>   typedef struct VhostIOVATree VhostIOVATree;
>   
>   VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
> -void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
> -G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
> +VhostIOVATree *vhost_iova_tree_acquire(VhostIOVATree *iova_tree);
> +void vhost_iova_tree_release(VhostIOVATree *iova_tree);
> +G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_release);
>   
>   const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
>                                           const DMAMap *map);
> diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
> index 55fed1fefb..31445cbdfc 100644
> --- a/hw/virtio/vhost-iova-tree.c
> +++ b/hw/virtio/vhost-iova-tree.c
> @@ -28,6 +28,9 @@ struct VhostIOVATree {
>   
>       /* IOVA address to qemu memory maps. */
>       IOVATree *iova_taddr_map;
> +
> +    /* Reference count */
> +    size_t refcnt;
>   };
>   
>   /**
> @@ -44,14 +47,28 @@ VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
>       tree->iova_last = iova_last;
>   
>       tree->iova_taddr_map = iova_tree_new();
> +    tree->refcnt = 1;
>       return tree;
>   }
>   
>   /**
> - * Delete an iova tree
> + * Increases the reference count of the iova tree
> + */
> +VhostIOVATree *vhost_iova_tree_acquire(VhostIOVATree *iova_tree)
> +{
> +    ++iova_tree->refcnt;
> +    return iova_tree;
> +}
> +
> +/**
> + * Decrease reference counter of iova tree, freeing if it reaches 0
>    */
> -void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
> +void vhost_iova_tree_release(VhostIOVATree *iova_tree)
>   {
> +    if (--iova_tree->refcnt) {
> +        return;
> +    }
> +
>       iova_tree_destroy(iova_tree->iova_taddr_map);
>       g_free(iova_tree);
>   }



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 08/25] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-04-13 16:31 ` [RFC PATCH v7 08/25] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
@ 2022-04-14  5:32   ` Jason Wang
  2022-04-18 10:36     ` Eugenio Perez Martin
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-04-14  5:32 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan


On 2022/4/14 00:31, Eugenio Pérez wrote:
> Finally offering the possibility to enable SVQ from the command line.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   qapi/net.json    |  9 ++++++++-
>   net/vhost-vdpa.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
>   2 files changed, 48 insertions(+), 9 deletions(-)
>
> diff --git a/qapi/net.json b/qapi/net.json
> index b92f3f5fb4..92848e4362 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -445,12 +445,19 @@
>   # @queues: number of queues to be created for multiqueue vhost-vdpa
>   #          (default: 1)
>   #
> +# @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
> +#         (default: false)
> +#
> +# Features:
> +# @unstable: Member @x-svq is experimental.
> +#
>   # Since: 5.1
>   ##
>   { 'struct': 'NetdevVhostVDPAOptions',
>     'data': {
>       '*vhostdev':     'str',
> -    '*queues':       'int' } }
> +    '*queues':       'int',
> +    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] } } }
>   
>   ##
>   # @NetClientDriver:
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 1e9fe47c03..9261101af2 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -128,6 +128,7 @@ static void vhost_vdpa_cleanup(NetClientState *nc)
>   {
>       VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>   
> +    g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_release);
>       if (s->vhost_net) {
>           vhost_net_cleanup(s->vhost_net);
>           g_free(s->vhost_net);
> @@ -187,13 +188,23 @@ static NetClientInfo net_vhost_vdpa_info = {
>           .check_peer_type = vhost_vdpa_check_peer_type,
>   };
>   
> +static int vhost_vdpa_get_iova_range(int fd,
> +                                     struct vhost_vdpa_iova_range *iova_range)
> +{
> +    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
> +
> +    return ret < 0 ? -errno : 0;
> +}
> +
>   static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> -                                           const char *device,
> -                                           const char *name,
> -                                           int vdpa_device_fd,
> -                                           int queue_pair_index,
> -                                           int nvqs,
> -                                           bool is_datapath)
> +                                       const char *device,
> +                                       const char *name,
> +                                       int vdpa_device_fd,
> +                                       int queue_pair_index,
> +                                       int nvqs,
> +                                       bool is_datapath,


It's better not to mix style changes here (especially since the original
style looks correct).


> +                                       bool svq,
> +                                       VhostIOVATree *iova_tree)
>   {
>       NetClientState *nc = NULL;
>       VhostVDPAState *s;
> @@ -211,8 +222,14 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>   
>       s->vhost_vdpa.device_fd = vdpa_device_fd;
>       s->vhost_vdpa.index = queue_pair_index;
> +    s->vhost_vdpa.shadow_vqs_enabled = svq;
> +    s->vhost_vdpa.iova_tree = iova_tree ? vhost_iova_tree_acquire(iova_tree) :
> +                              NULL;
>       ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
>       if (ret) {
> +        if (iova_tree) {
> +            vhost_iova_tree_release(iova_tree);
> +        }
>           qemu_del_net_client(nc);
>           return NULL;
>       }
> @@ -266,6 +283,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       g_autofree NetClientState **ncs = NULL;
>       NetClientState *nc;
>       int queue_pairs, i, has_cvq = 0;
> +    g_autoptr(VhostIOVATree) iova_tree = NULL;
>   
>       assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>       opts = &netdev->u.vhost_vdpa;
> @@ -285,19 +303,31 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>           qemu_close(vdpa_device_fd);
>           return queue_pairs;
>       }
> +    if (opts->x_svq) {
> +        struct vhost_vdpa_iova_range iova_range;
> +
> +        if (has_cvq) {
> +            error_setg(errp, "vdpa svq does not work with cvq");
> +            goto err_svq;
> +        }
> +        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> +        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> +    }
>   
>       ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
>   
>       for (i = 0; i < queue_pairs; i++) {
>           ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> -                                     vdpa_device_fd, i, 2, true);
> +                                     vdpa_device_fd, i, 2, true, opts->x_svq,
> +                                     iova_tree);
>           if (!ncs[i])
>               goto err;
>       }
>   
>       if (has_cvq) {
>           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> -                                 vdpa_device_fd, i, 1, false);
> +                                 vdpa_device_fd, i, 1, false, opts->x_svq,
> +                                 iova_tree);


So we have at most one iova_tree here, and the refcnt looks unnecessary.

Thanks


>           if (!nc)
>               goto err;
>       }
> @@ -308,6 +338,8 @@ err:
>       if (i) {
>           qemu_del_net_client(ncs[0]);
>       }
> +
> +err_svq:
>       qemu_close(vdpa_device_fd);
>   
>       return -1;



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 09/25] vhost: move descriptor translation to vhost_svq_vring_write_descs
  2022-04-13 16:31 ` [RFC PATCH v7 09/25] vhost: move descriptor translation to vhost_svq_vring_write_descs Eugenio Pérez
@ 2022-04-14  5:48   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-04-14  5:48 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan


On 2022/4/14 00:31, Eugenio Pérez wrote:
> The translation is done for both in and out descriptors, so it's better
> placed here.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.c | 26 +++++++++++++++-----------
>   1 file changed, 15 insertions(+), 11 deletions(-)


Acked-by: Jason Wang <jasowang@redhat.com>


>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index a2531d5874..f874374651 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -122,17 +122,23 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
>       return true;
>   }
>   
> -static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> -                                    const struct iovec *iovec, size_t num,
> -                                    bool more_descs, bool write)
> +static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> +                                        const struct iovec *iovec, size_t num,
> +                                        bool more_descs, bool write)
>   {
>       uint16_t i = svq->free_head, last = svq->free_head;
>       unsigned n;
>       uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
>       vring_desc_t *descs = svq->vring.desc;
> +    bool ok;
>   
>       if (num == 0) {
> -        return;
> +        return true;
> +    }
> +
> +    ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> +    if (unlikely(!ok)) {
> +        return false;
>       }
>   
>       for (n = 0; n < num; n++) {
> @@ -150,6 +156,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>       }
>   
>       svq->free_head = le16_to_cpu(svq->desc_next[last]);
> +    return true;
>   }
>   
>   static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
> @@ -169,21 +176,18 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
>           return false;
>       }
>   
> -    ok = vhost_svq_translate_addr(svq, sgs, elem->out_sg, elem->out_num);
> +    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
> +                                     elem->in_num > 0, false);
>       if (unlikely(!ok)) {
>           return false;
>       }
> -    vhost_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
> -                            elem->in_num > 0, false);
>   
> -
> -    ok = vhost_svq_translate_addr(svq, sgs, elem->in_sg, elem->in_num);
> +    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
> +                                     true);
>       if (unlikely(!ok)) {
>           return false;
>       }
>   
> -    vhost_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, true);
> -
>       /*
>        * Put the entry in the available array (but don't update avail->idx until
>        * they do sync).



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 10/25] vdpa: Fix index calculus at vhost_vdpa_svqs_start
  2022-04-13 16:31 ` [RFC PATCH v7 10/25] vdpa: Fix index calculus at vhost_vdpa_svqs_start Eugenio Pérez
@ 2022-04-14  5:59   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-04-14  5:59 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan


On 2022/4/14 00:31, Eugenio Pérez wrote:
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---


Acked-by: Jason Wang <jasowang@redhat.com>

(This needs some changelog anyhow.)

Thanks


>   hw/virtio/vhost-vdpa.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 27ee678dc9..6b370c918c 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1019,7 +1019,7 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
>           VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
>           VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
>           struct vhost_vring_addr addr = {
> -            .index = i,
> +            .index = dev->vq_index + i,
>           };
>           int r;
>           bool ok = vhost_vdpa_svq_setup(dev, svq, i, &err);



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 25/25] vdpa: Add x-cvq-svq
  2022-04-13 16:32 ` [RFC PATCH v7 25/25] vdpa: Add x-cvq-svq Eugenio Pérez
@ 2022-04-14  9:09   ` Jason Wang
  2022-04-18 14:16     ` Eugenio Perez Martin
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-04-14  9:09 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

On Thu, Apr 14, 2022 at 12:33 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> This isolates shadow cvq in its own group.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  qapi/net.json    |  8 +++-
>  net/vhost-vdpa.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 100 insertions(+), 6 deletions(-)
>
> diff --git a/qapi/net.json b/qapi/net.json
> index 92848e4362..39c245e6cd 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -447,9 +447,12 @@
>  #
>  # @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
>  #         (default: false)
> +# @x-cvq-svq: Start device with (experimental) shadow virtqueue in its own
> +#             virtqueue group. (Since 7.1)
> +#             (default: false)
>  #
>  # Features:
> -# @unstable: Member @x-svq is experimental.
> +# @unstable: Members @x-svq and @x-cvq-svq are experimental.
>  #
>  # Since: 5.1
>  ##
> @@ -457,7 +460,8 @@
>    'data': {
>      '*vhostdev':     'str',
>      '*queues':       'int',
> -    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] } } }
> +    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] },
> +    '*x-cvq-svq':    {'type': 'bool', 'features' : [ 'unstable'] } } }
>
>  ##
>  # @NetClientDriver:
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index a6f803ea4e..851dacb902 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -377,6 +377,17 @@ static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
>      return ret;
>  }
>
> +static int vhost_vdpa_get_backend_features(int fd, uint64_t *features,
> +                                           Error **errp)
> +{
> +    int ret = ioctl(fd, VHOST_GET_BACKEND_FEATURES, features);
> +    if (ret) {
> +        error_setg_errno(errp, errno,
> +            "Fail to query backend features from vhost-vDPA device");
> +    }
> +    return ret;
> +}
> +
>  static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
>                                            int *has_cvq, Error **errp)
>  {
> @@ -410,16 +421,56 @@ static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
>      return 1;
>  }
>
> +/**
> + * Check that the vdpa device supports ASID 1 for the CVQ group
> + *
> + * @vdpa_device_fd: Vdpa device fd
> + * @queue_pairs: Queue pairs
> + * @errp: Error
> + */
> +static int vhost_vdpa_check_cvq_svq(int vdpa_device_fd, int queue_pairs,
> +                                    Error **errp)
> +{
> +    uint64_t backend_features;
> +    unsigned num_as;
> +    int r;
> +
> +    r = vhost_vdpa_get_backend_features(vdpa_device_fd, &backend_features,
> +                                        errp);
> +    if (unlikely(r)) {
> +        return -1;
> +    }
> +
> +    if (unlikely(!(backend_features & VHOST_BACKEND_F_IOTLB_ASID))) {
> +        error_setg(errp, "Device without IOTLB_ASID feature");
> +        return -1;
> +    }
> +
> +    r = ioctl(vdpa_device_fd, VHOST_VDPA_GET_AS_NUM, &num_as);
> +    if (unlikely(r)) {
> +        error_setg_errno(errp, errno,
> +                         "Cannot retrieve number of supported ASs");
> +        return -1;
> +    }
> +    if (unlikely(num_as < 2)) {
> +        error_setg(errp, "Insufficient number of ASs (%u, min: 2)", num_as);
> +    }
> +

This is not sufficient; we still need to check that the CVQ doesn't
share a group with the other virtqueues.
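
Something along these lines could work as the extra check, reusing
VHOST_VDPA_GET_VRING_GROUP (a rough, untested sketch; the
vhost_vdpa_get_vq_group helper is invented here):

static int vhost_vdpa_get_vq_group(int fd, unsigned index, Error **errp)
{
    struct vhost_vring_state state = { .index = index };
    int r = ioctl(fd, VHOST_VDPA_GET_VRING_GROUP, &state);

    if (unlikely(r)) {
        error_setg_errno(errp, errno, "Cannot get vq %u group", index);
        return -1;
    }
    return state.num;
}

/* In vhost_vdpa_check_cvq_svq(): CVQ is the last vq, index 2 * queue_pairs */
int cvq_group = vhost_vdpa_get_vq_group(vdpa_device_fd, 2 * queue_pairs, errp);
if (unlikely(cvq_group < 0)) {
    return -1;
}
for (int i = 0; i < 2 * queue_pairs; ++i) {
    if (vhost_vdpa_get_vq_group(vdpa_device_fd, i, errp) == cvq_group) {
        error_setg(errp, "Data vq %d shares its group with CVQ", i);
        return -1;
    }
}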

Thanks

> +    return 0;
> +}
> +
>  int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>                          NetClientState *peer, Error **errp)
>  {
>      const NetdevVhostVDPAOptions *opts;
> +    struct vhost_vdpa_iova_range iova_range;
>      uint64_t features;
>      int vdpa_device_fd;
>      g_autofree NetClientState **ncs = NULL;
>      NetClientState *nc;
>      int queue_pairs, r, i, has_cvq = 0;
>      g_autoptr(VhostIOVATree) iova_tree = NULL;
> +    ERRP_GUARD();
>
>      assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>      opts = &netdev->u.vhost_vdpa;
> @@ -444,8 +495,9 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>          qemu_close(vdpa_device_fd);
>          return queue_pairs;
>      }
> -    if (opts->x_svq) {
> -        struct vhost_vdpa_iova_range iova_range;
> +    if (opts->x_cvq_svq || opts->x_svq) {
> +        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> +
>          uint64_t invalid_dev_features =
>              features & ~vdpa_svq_device_features &
>              /* Transport are all accepted at this point */
> @@ -457,7 +509,21 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>                         invalid_dev_features);
>              goto err_svq;
>          }
> -        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> +    }
> +
> +    if (opts->x_cvq_svq) {
> +        if (!has_cvq) {
> +            error_setg(errp, "Cannot use x-cvq-svq with a device without cvq");
> +            goto err_svq;
> +        }
> +
> +        r = vhost_vdpa_check_cvq_svq(vdpa_device_fd, queue_pairs, errp);
> +        if (unlikely(r)) {
> +            error_prepend(errp, "Cannot configure CVQ SVQ: ");
> +            goto err_svq;
> +        }
> +    }
> +    if (opts->x_svq) {
>          iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
>      }
>
> @@ -472,11 +538,35 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>      }
>
>      if (has_cvq) {
> +        g_autoptr(VhostIOVATree) cvq_iova_tree = NULL;
> +
> +        if (opts->x_cvq_svq) {
> +            cvq_iova_tree = vhost_iova_tree_new(iova_range.first,
> +                                                iova_range.last);
> +        } else if (opts->x_svq) {
> +            cvq_iova_tree = vhost_iova_tree_acquire(iova_tree);
> +        }
> +
>          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>                                   vdpa_device_fd, i, 1,
> -                                 false, opts->x_svq, iova_tree);
> +                                 false, opts->x_cvq_svq || opts->x_svq,
> +                                 cvq_iova_tree);
>          if (!nc)
>              goto err;
> +
> +        if (opts->x_cvq_svq) {
> +            struct vhost_vring_state asid = {
> +                .index = 1,
> +                .num = 1,
> +            };
> +
> +            r = ioctl(vdpa_device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
> +            if (unlikely(r)) {
> +                error_setg_errno(errp, errno,
> +                                 "Cannot set cvq group independent asid");
> +                goto err;
> +            }
> +        }
>      }
>
>      return 0;
> --
> 2.27.0
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 19/25] vhost: Add vhost_svq_inject
  2022-04-13 16:32 ` [RFC PATCH v7 19/25] vhost: Add vhost_svq_inject Eugenio Pérez
@ 2022-04-14  9:09   ` Jason Wang
  2022-04-18 13:58     ` Eugenio Perez Martin
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-04-14  9:09 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

On Thu, Apr 14, 2022 at 12:32 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> This allows qemu to inject packets to the device without the guest's notice.

Does it mean it can support guests without _F_ANNOUNCE?

>
> This will be used to inject net CVQ messages to restore the status in the destination.

I guess that for restoring, we should set cvq.ready = true but all the
others (TX/RX) to false before we complete the restore? If yes, I don't
see code to do that.

>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  hw/virtio/vhost-shadow-virtqueue.h |   5 +
>  hw/virtio/vhost-shadow-virtqueue.c | 179 +++++++++++++++++++++++++----
>  2 files changed, 160 insertions(+), 24 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index e06ac52158..2a5229e77f 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -17,6 +17,9 @@
>
>  typedef struct SVQElement {
>      VirtQueueElement elem;
> +    hwaddr in_iova;
> +    hwaddr out_iova;
> +    bool not_from_guest;

Let's add a comment for those fields.

>  } SVQElement;
>
>  typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
> @@ -106,6 +109,8 @@ typedef struct VhostShadowVirtqueue {
>
>  bool vhost_svq_valid_features(uint64_t features, Error **errp);
>
> +bool vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
> +                      size_t out_num, size_t in_num);
>  void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>  void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
>  void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 87980e2a9c..f3600df133 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -16,6 +16,7 @@
>  #include "qemu/log.h"
>  #include "qemu/memalign.h"
>  #include "linux-headers/linux/vhost.h"
> +#include "qemu/iov.h"
>
>  /**
>   * Validate the transport device features that both guests can use with the SVQ
> @@ -122,7 +123,8 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
>      return true;
>  }
>
> -static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> +static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq,
> +                                        SVQElement *svq_elem, hwaddr *sg,
>                                          const struct iovec *iovec, size_t num,
>                                          bool more_descs, bool write)
>  {
> @@ -130,15 +132,39 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>      unsigned n;
>      uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
>      vring_desc_t *descs = svq->vring.desc;
> -    bool ok;
>
>      if (num == 0) {
>          return true;
>      }
>
> -    ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> -    if (unlikely(!ok)) {
> -        return false;
> +    if (svq_elem->not_from_guest) {
> +        DMAMap map = {
> +            .translated_addr = (hwaddr)iovec->iov_base,
> +            .size = ROUND_UP(iovec->iov_len, 4096) - 1,
> +            .perm = write ? IOMMU_RW : IOMMU_RO,
> +        };
> +        int r;
> +
> +        if (unlikely(num != 1)) {
> +            error_report("Unexpected chain of element injected");
> +            return false;
> +        }
> +        r = vhost_iova_tree_map_alloc(svq->iova_tree, &map);
> +        if (unlikely(r != IOVA_OK)) {
> +            error_report("Cannot map injected element");
> +            return false;
> +        }
> +
> +        r = svq->map_ops->map(map.iova, map.size + 1,
> +                              (void *)map.translated_addr, !write,
> +                              svq->map_ops_opaque);
> +        assert(r == 0);
> +        sg[0] = map.iova;
> +    } else {
> +        bool ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> +        if (unlikely(!ok)) {
> +            return false;
> +        }
>      }
>
>      for (n = 0; n < num; n++) {
> @@ -166,7 +192,8 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement *svq_elem,
>      unsigned avail_idx;
>      vring_avail_t *avail = svq->vring.avail;
>      bool ok;
> -    g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
> +    g_autofree hwaddr *sgs = NULL;
> +    hwaddr *in_sgs, *out_sgs;
>
>      *head = svq->free_head;
>
> @@ -177,15 +204,23 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement *svq_elem,
>          return false;
>      }
>
> -    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
> -                                     elem->in_num > 0, false);
> +    if (!svq_elem->not_from_guest) {
> +        sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
> +        in_sgs = out_sgs = sgs;
> +    } else {
> +        in_sgs = &svq_elem->in_iova;
> +        out_sgs = &svq_elem->out_iova;
> +    }
> +    ok = vhost_svq_vring_write_descs(svq, svq_elem, out_sgs, elem->out_sg,
> +                                     elem->out_num, elem->in_num > 0, false);
>      if (unlikely(!ok)) {
>          return false;
>      }
>
> -    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
> -                                     true);
> +    ok = vhost_svq_vring_write_descs(svq, svq_elem, in_sgs, elem->in_sg,
> +                                     elem->in_num, false, true);
>      if (unlikely(!ok)) {
> +        /* TODO unwind out_sg */
>          return false;
>      }
>
> @@ -230,6 +265,43 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
>      event_notifier_set(&svq->hdev_kick);
>  }
>
> +bool vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
> +                      size_t out_num, size_t in_num)
> +{
> +    size_t out_size = iov_size(iov, out_num);
> +    size_t out_buf_size = ROUND_UP(out_size, 4096);
> +    size_t in_size = iov_size(iov + out_num, in_num);
> +    size_t in_buf_size = ROUND_UP(in_size, 4096);
> +    SVQElement *svq_elem;
> +    uint16_t num_slots = (in_num ? 1 : 0) + (out_num ? 1 : 0);
> +
> +    if (unlikely(num_slots == 0 || svq->next_guest_avail_elem ||
> +                 vhost_svq_available_slots(svq) < num_slots)) {
> +        return false;
> +    }
> +
> +    svq_elem = virtqueue_alloc_element(sizeof(SVQElement), 1, 1);
> +    if (out_num) {
> +        void *out = qemu_memalign(4096, out_buf_size);
> +        svq_elem->elem.out_sg[0].iov_base = out;
> +        svq_elem->elem.out_sg[0].iov_len = out_size;
> +        iov_to_buf(iov, out_num, 0, out, out_size);
> +        memset(out + out_size, 0, out_buf_size - out_size);
> +    }
> +    if (in_num) {
> +        void *in = qemu_memalign(4096, in_buf_size);
> +        svq_elem->elem.in_sg[0].iov_base = in;
> +        svq_elem->elem.in_sg[0].iov_len = in_size;
> +        memset(in, 0, in_buf_size);
> +    }
> +
> +    svq_elem->not_from_guest = true;
> +    vhost_svq_add(svq, svq_elem);
> +    vhost_svq_kick(svq);
> +

Should we wait for the completion before moving forward? Otherwise we
will have a race.

And if we wait for the completion (e.g. by doing busy polling), I think
we can avoid the auxiliary structures like
in_iova/out_iova/not_from_guest by doing the mapping before
vhost_svq_add() to keep it clean.
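
For instance, a minimal sketch of the polling idea (the
vhost_svq_poll_used name and the last_used_idx shadow copy are
assumptions here, not existing code):

/*
 * Sketch: spin until the device publishes a new used entry, then run
 * the regular used-buffer processing so the buffer gets unmapped.
 */
static void vhost_svq_poll_used(VhostShadowVirtqueue *svq)
{
    while (le16_to_cpu(svq->vring.used->idx) == svq->last_used_idx) {
        /* Busy wait; the device bumps used->idx when it is done */
    }
    smp_rmb(); /* read the used element only after seeing the new index */
    vhost_svq_flush(svq, false);
}

With that, vhost_svq_inject() could map, add, kick and then just call
vhost_svq_poll_used(svq) before returning.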

Thanks

> +    return true;
> +}
> +
>  /**
>   * Forward available buffers.
>   *
> @@ -267,6 +339,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
>                  break;
>              }
>
> +            svq_elem->not_from_guest = false;
>              elem = &svq_elem->elem;
>              if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
>                  /*
> @@ -391,6 +464,31 @@ static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
>      return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
>  }
>
> +static int vhost_svq_unmap(VhostShadowVirtqueue *svq, hwaddr iova, size_t size)
> +{
> +    DMAMap needle = {
> +        .iova = iova,
> +        .size = size,
> +    };
> +    const DMAMap *overlap;
> +
> +    while ((overlap = vhost_iova_tree_find(svq->iova_tree, &needle))) {
> +        DMAMap needle = *overlap;
> +
> +        if (svq->map_ops->unmap) {
> +            int r = svq->map_ops->unmap(overlap->iova, overlap->size + 1,
> +                                        svq->map_ops_opaque);
> +            if (unlikely(r != 0)) {
> +                return r;
> +            }
> +        }
> +        qemu_vfree((void *)overlap->translated_addr);
> +        vhost_iova_tree_remove(svq->iova_tree, &needle);
> +    }
> +
> +    return 0;
> +}
> +
>  static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>                              bool check_for_avail_queue)
>  {
> @@ -410,23 +508,56 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>              }
>
>              elem = &svq_elem->elem;
> -            if (unlikely(i >= svq->vring.num)) {
> -                qemu_log_mask(LOG_GUEST_ERROR,
> -                         "More than %u used buffers obtained in a %u size SVQ",
> -                         i, svq->vring.num);
> -                virtqueue_fill(vq, elem, len, i);
> -                virtqueue_flush(vq, i);
> -                return;
> -            }
> -            virtqueue_fill(vq, elem, len, i++);
> -
>              if (svq->ops && svq->ops->used_elem_handler) {
>                  svq->ops->used_elem_handler(svq->vdev, elem);
>              }
> +
> +            if (svq_elem->not_from_guest) {
> +                if (unlikely(elem->out_num > 1)) {
> +                    error_report("Unexpected out_num > 1");
> +                    return;
> +                }
> +
> +                if (elem->out_num) {
> +                    int r = vhost_svq_unmap(svq, svq_elem->out_iova,
> +                                            elem->out_sg[0].iov_len);
> +                    if (unlikely(r != 0)) {
> +                        error_report("Cannot unmap out buffer");
> +                        return;
> +                    }
> +                }
> +
> +                if (unlikely(elem->in_num > 1)) {
> +                    error_report("Unexpected in_num > 1");
> +                    return;
> +                }
> +
> +                if (elem->in_num) {
> +                    int r = vhost_svq_unmap(svq, svq_elem->in_iova,
> +                                            elem->in_sg[0].iov_len);
> +                    if (unlikely(r != 0)) {
> +                        error_report("Cannot unmap out buffer");
> +                        return;
> +                    }
> +                }
> +            } else {
> +                if (unlikely(i >= svq->vring.num)) {
> +                    qemu_log_mask(
> +                        LOG_GUEST_ERROR,
> +                        "More than %u used buffers obtained in a %u size SVQ",
> +                        i, svq->vring.num);
> +                    virtqueue_fill(vq, elem, len, i);
> +                    virtqueue_flush(vq, i);
> +                    return;
> +                }
> +                virtqueue_fill(vq, elem, len, i++);
> +            }
>          }
>
> -        virtqueue_flush(vq, i);
> -        event_notifier_set(&svq->svq_call);
> +        if (i > 0) {
> +            virtqueue_flush(vq, i);
> +            event_notifier_set(&svq->svq_call);
> +        }
>
>          if (check_for_avail_queue && svq->next_guest_avail_elem) {
>              /*
> @@ -590,13 +721,13 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>      for (unsigned i = 0; i < svq->vring.num; ++i) {
>          g_autofree SVQElement *svq_elem = NULL;
>          svq_elem = g_steal_pointer(&svq->ring_id_maps[i]);
> -        if (svq_elem) {
> +        if (svq_elem && !svq_elem->not_from_guest) {
>              virtqueue_detach_element(svq->vq, &svq_elem->elem, 0);
>          }
>      }
>
>      next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> -    if (next_avail_elem) {
> +    if (next_avail_elem && !next_avail_elem->not_from_guest) {
>          virtqueue_detach_element(svq->vq, &next_avail_elem->elem, 0);
>      }
>      svq->vq = NULL;
> --
> 2.27.0
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 16/25] vdpa: control virtqueue support on shadow virtqueue
  2022-04-13 16:31 ` [RFC PATCH v7 16/25] vdpa: control virtqueue support on shadow virtqueue Eugenio Pérez
@ 2022-04-14  9:10   ` Jason Wang
  2022-04-18 10:55     ` Eugenio Perez Martin
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-04-14  9:10 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

On Thu, Apr 14, 2022 at 12:32 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> Introduce the control virtqueue support for vDPA shadow virtqueue. This
> is needed for advanced networking features like multiqueue.
>
> To demonstrate command handling, VIRTIO_NET_F_CTRL_MACADDR and
> VIRTIO_NET_CTRL_MQ are implemented. If the vDPA device is started with
> SVQ support and the virtio-net driver changes the MAC or the number of
> queues, the virtio-net device model will be updated with the new values.
>
> Other cvq commands could be added here straightforwardly, but they have
> not been tested.

If I understand the code correctly, the cvq can still see all the
guest mappings. I wonder if it's simpler to:

1) find a way to reuse the ctrl handler in virtio-net.c
2) do not expose all the guest memory to shadow cvq.

>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  net/vhost-vdpa.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 77 insertions(+), 3 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index a8dde49198..38e6912255 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -11,6 +11,7 @@
>
>  #include "qemu/osdep.h"
>  #include "clients.h"
> +#include "hw/virtio/virtio-net.h"
>  #include "net/vhost_net.h"
>  #include "net/vhost-vdpa.h"
>  #include "hw/virtio/vhost-vdpa.h"
> @@ -69,6 +70,30 @@ const int vdpa_feature_bits[] = {
>      VHOST_INVALID_FEATURE_BIT
>  };
>
> +/** Supported device specific feature bits with SVQ */
> +static const uint64_t vdpa_svq_device_features =
> +    BIT_ULL(VIRTIO_NET_F_CSUM) |
> +    BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) |
> +    BIT_ULL(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) |
> +    BIT_ULL(VIRTIO_NET_F_MTU) |
> +    BIT_ULL(VIRTIO_NET_F_MAC) |
> +    BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) |
> +    BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) |
> +    BIT_ULL(VIRTIO_NET_F_GUEST_ECN) |
> +    BIT_ULL(VIRTIO_NET_F_GUEST_UFO) |
> +    BIT_ULL(VIRTIO_NET_F_HOST_TSO4) |
> +    BIT_ULL(VIRTIO_NET_F_HOST_TSO6) |
> +    BIT_ULL(VIRTIO_NET_F_HOST_ECN) |
> +    BIT_ULL(VIRTIO_NET_F_HOST_UFO) |
> +    BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) |
> +    BIT_ULL(VIRTIO_NET_F_STATUS) |
> +    BIT_ULL(VIRTIO_NET_F_CTRL_VQ) |
> +    BIT_ULL(VIRTIO_NET_F_MQ) |
> +    BIT_ULL(VIRTIO_F_ANY_LAYOUT) |
> +    BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) |
> +    BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
> +    BIT_ULL(VIRTIO_NET_F_STANDBY);

I wonder what's the reason for having a dedicated feature whitelist for SVQ?

> +
>  VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
>  {
>      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> @@ -196,6 +221,46 @@ static int vhost_vdpa_get_iova_range(int fd,
>      return ret < 0 ? -errno : 0;
>  }
>
> +static void vhost_vdpa_net_handle_ctrl(VirtIODevice *vdev,
> +                                       const VirtQueueElement *elem)
> +{
> +    struct virtio_net_ctrl_hdr ctrl;
> +    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
> +    size_t s;
> +    struct iovec in = {
> +        .iov_base = &status,
> +        .iov_len = sizeof(status),
> +    };
> +
> +    s = iov_to_buf(elem->out_sg, elem->out_num, 0, &ctrl, sizeof(ctrl.class));
> +    if (s != sizeof(ctrl.class)) {
> +        return;
> +    }
> +
> +    switch (ctrl.class) {
> +    case VIRTIO_NET_CTRL_MAC_ADDR_SET:
> +    case VIRTIO_NET_CTRL_MQ:
> +        break;
> +    default:
> +        return;
> +    };

Any reason that we only support those two commands?

> +
> +    s = iov_to_buf(elem->in_sg, elem->in_num, 0, &status, sizeof(status));
> +    if (s != sizeof(status) || status != VIRTIO_NET_OK) {
> +        return;
> +    }
> +
> +    status = VIRTIO_NET_ERR;
> +    virtio_net_handle_ctrl_iov(vdev, &in, 1, elem->out_sg, elem->out_num);
> +    if (status != VIRTIO_NET_OK) {

status is guaranteed to be VIRTIO_NET_ERR, so we hit the error for sure?

Thanks

> +        error_report("Bad CVQ processing in model");
> +    }
> +}
> +
> +static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
> +    .used_elem_handler = vhost_vdpa_net_handle_ctrl,
> +};
> +
>  static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>                                         const char *device,
>                                         const char *name,
> @@ -225,6 +290,9 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>      s->vhost_vdpa.shadow_vqs_enabled = svq;
>      s->vhost_vdpa.iova_tree = iova_tree ? vhost_iova_tree_acquire(iova_tree) :
>                                NULL;
> +    if (!is_datapath) {
> +        s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
> +    }
>      ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
>      if (ret) {
>          if (iova_tree) {
> @@ -315,9 +383,15 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>      }
>      if (opts->x_svq) {
>          struct vhost_vdpa_iova_range iova_range;
> -
> -        if (has_cvq) {
> -            error_setg(errp, "vdpa svq does not work with cvq");
> +        uint64_t invalid_dev_features =
> +            features & ~vdpa_svq_device_features &
> +            /* Transport are all accepted at this point */
> +            ~MAKE_64BIT_MASK(VIRTIO_TRANSPORT_F_START,
> +                             VIRTIO_TRANSPORT_F_END - VIRTIO_TRANSPORT_F_START);
> +
> +        if (invalid_dev_features) {
> +            error_setg(errp, "vdpa svq does not work with features 0x%" PRIx64,
> +                       invalid_dev_features);
>              goto err_svq;
>          }
>          vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> --
> 2.27.0
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 24/25] vdpa: Add asid attribute to vdpa device
  2022-04-13 16:32 ` [RFC PATCH v7 24/25] vdpa: Add asid attribute to vdpa device Eugenio Pérez
@ 2022-04-14  9:10   ` Jason Wang
  2022-04-18 14:03     ` Eugenio Perez Martin
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2022-04-14  9:10 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

On Thu, Apr 14, 2022 at 12:33 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> We can configure ASID per group, but we still use asid 0 for every vdpa
> device. Multiple asid support for cvq will be introduced in next
> patches
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  include/hw/virtio/vhost.h |  4 ++
>  hw/net/vhost_net.c        |  5 +++
>  hw/virtio/vhost-vdpa.c    | 95 ++++++++++++++++++++++++++++++++-------
>  net/vhost-vdpa.c          |  4 +-
>  hw/virtio/trace-events    |  9 ++--
>  5 files changed, 94 insertions(+), 23 deletions(-)
>
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 034868fa9e..640cf82168 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -76,8 +76,12 @@ struct vhost_dev {
>      int vq_index;
>      /* one past the last vq index for the virtio device (not vhost) */
>      int vq_index_end;
> +    /* one past the last vq index of this virtqueue group */
> +    int vq_group_index_end;
>      /* if non-zero, minimum required value for max_queues */
>      int num_queues;
> +    /* address space id */

Instead of taking shortcuts like this, I think we need to have an
abstraction like what the kernel did. That is to say, introduce
structures like:

struct vhost_vdpa_dev_group;
struct vhost_vdpa_as;

Then having pointers to those structures like

struct vhost_vdpa {
        ...
        struct vhost_vdpa_dev_group *group;
};

struct vhost_vdpa_dev_group {
        ...
        uint32_t id;
        struct vhost_vdpa_as *as;
};

struct vhost_vdpa_as {
        uint32_t id;
        MemoryListener listener;
};

We can read the group topology during initialization and allocate the
structure accordingly. If the CVQ has its own group:

1) We know we will have 2 AS otherwise 1 AS
2) allocate #AS and attach the group to the corresponding AS

Then we know that:

1) map/unmap and the listener are done per AS instead of per group or
per vdpa.
2) AS attach/detach is done per group.

And it would simplify the future extension when we want to advertise
the as/groups to guests.

To simplify the reviewing, we can introduce the above concepts before
the ASID uAPIs and assume a 1-group/1-AS model as a start.
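
For example, the init path could read the topology with something like
this (sketch only; vhost_vdpa_init_groups is an invented helper and the
allocation step is just a comment):

/* Sketch: build the group -> AS topology once, at initialization */
static int vhost_vdpa_init_groups(int fd, unsigned nvqs, Error **errp)
{
    for (unsigned i = 0; i < nvqs; ++i) {
        struct vhost_vring_state state = { .index = i };

        if (ioctl(fd, VHOST_VDPA_GET_VRING_GROUP, &state)) {
            error_setg_errno(errp, errno, "Cannot get vq %u group", i);
            return -1;
        }
        /*
         * Look up or allocate the vhost_vdpa_dev_group for state.num and
         * attach it to AS 0 by default; if the CVQ group turns out to be
         * independent and the device offers >= 2 ASes, it can be moved
         * to its own vhost_vdpa_as later.
         */
    }
    return 0;
}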

Thanks

> +    uint32_t address_space_id;
>      /* Must be a vq group different than any other vhost dev */
>      bool independent_vq_group;
>      uint64_t features;
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 10480e19e5..a34df739a7 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -344,15 +344,20 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>
>      for (i = 0; i < nvhosts; i++) {
>          bool cvq_idx = i >= data_queue_pairs;
> +        uint32_t vq_group_end;
>
>          if (!cvq_idx) {
>              peer = qemu_get_peer(ncs, i);
> +            vq_group_end = 2 * data_queue_pairs;
>          } else { /* Control Virtqueue */
>              peer = qemu_get_peer(ncs, n->max_queue_pairs);
> +            vq_group_end = 2 * data_queue_pairs + 1;
>          }
>
>          net = get_vhost_net(peer);
> +        net->dev.address_space_id = !!cvq_idx;
>          net->dev.independent_vq_group = !!cvq_idx;
> +        net->dev.vq_group_index_end = vq_group_end;
>          vhost_net_set_vq_index(net, i * 2, index_end);
>
>          /* Suppress the masking guest notifiers on vhost user
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 4096555242..5ed211287c 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -79,6 +79,9 @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
>      int ret = 0;
>
>      msg.type = v->msg_type;
> +    if (v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) {
> +        msg.asid = v->dev->address_space_id;
> +    }
>      msg.iotlb.iova = iova;
>      msg.iotlb.size = size;
>      msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr;
> @@ -90,8 +93,9 @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
>          return 0;
>      }
>
> -   trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.iotlb.iova, msg.iotlb.size,
> -                            msg.iotlb.uaddr, msg.iotlb.perm, msg.iotlb.type);
> +    trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.asid, msg.iotlb.iova,
> +                             msg.iotlb.size, msg.iotlb.uaddr, msg.iotlb.perm,
> +                             msg.iotlb.type);
>
>      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
>          error_report("failed to write, fd=%d, errno=%d (%s)",
> @@ -109,6 +113,9 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
>      int fd = v->device_fd;
>      int ret = 0;
>
> +    if (v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) {
> +        msg.asid = v->dev->address_space_id;
> +    }
>      msg.type = v->msg_type;
>      msg.iotlb.iova = iova;
>      msg.iotlb.size = size;
> @@ -119,7 +126,7 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
>          return 0;
>      }
>
> -    trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.iotlb.iova,
> +    trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.asid, msg.iotlb.iova,
>                                 msg.iotlb.size, msg.iotlb.type);
>
>      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> @@ -134,6 +141,7 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
>  static void vhost_vdpa_listener_commit(MemoryListener *listener)
>  {
>      struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> +    struct vhost_dev *dev = v->dev;
>      struct vhost_msg_v2 msg = {};
>      int fd = v->device_fd;
>      size_t num = v->iotlb_updates->len;
> @@ -142,9 +150,14 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
>          return;
>      }
>
> +    if (dev->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_ASID)) {
> +        msg.asid = v->dev->address_space_id;
> +    }
> +
>      msg.type = v->msg_type;
>      msg.iotlb.type = VHOST_IOTLB_BATCH_BEGIN;
> -    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.iotlb.type);
> +    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.asid,
> +                                          msg.iotlb.type);
>      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
>          error_report("failed to write BEGIN_BATCH, fd=%d, errno=%d (%s)",
>                       fd, errno, strerror(errno));
> @@ -162,7 +175,8 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
>      }
>
>      msg.iotlb.type = VHOST_IOTLB_BATCH_END;
> -    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
> +    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.asid,
> +                                     msg.iotlb.type);
>      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
>          error_report("failed to write, fd=%d, errno=%d (%s)",
>                       fd, errno, strerror(errno));
> @@ -1171,10 +1185,48 @@ call_err:
>      return false;
>  }
>
> +static int vhost_vdpa_set_vq_group_address_space_id(struct vhost_dev *dev,
> +                                                struct vhost_vring_state *asid)
> +{
> +    trace_vhost_vdpa_set_vq_group_address_space_id(dev, asid->index, asid->num);
> +    return vhost_vdpa_call(dev, VHOST_VDPA_SET_GROUP_ASID, asid);
> +}
> +
> +static int vhost_vdpa_set_address_space_id(struct vhost_dev *dev)
> +{
> +    struct vhost_vring_state vq_group = {
> +        .index = dev->vq_index,
> +    };
> +    struct vhost_vring_state asid;
> +    int ret;
> +
> +    if (!dev->address_space_id) {
> +        return 0;
> +    }
> +
> +    ret = vhost_vdpa_get_vring_group(dev, &vq_group);
> +    if (unlikely(ret)) {
> +        error_report("Can't read vq group, errno=%d (%s)", ret,
> +                     g_strerror(-ret));
> +        return ret;
> +    }
> +
> +    asid.index = vq_group.num;
> +    asid.num = dev->address_space_id;
> +    ret = vhost_vdpa_set_vq_group_address_space_id(dev, &asid);
> +    if (unlikely(ret)) {
> +        error_report("Can't set vq group %u asid %u, errno=%d (%s)",
> +            asid.index, asid.num, ret, g_strerror(-ret));
> +    }
> +    return ret;
> +}
> +
>  static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>  {
>      struct vhost_vdpa *v = dev->opaque;
> -    bool ok;
> +    bool vq_group_end, ok;
> +    int r = 0;
> +
>      trace_vhost_vdpa_dev_start(dev, started);
>
>      if (started) {
> @@ -1183,6 +1235,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>              !vhost_dev_is_independent_group(dev)) {
>              return -1;
>          }
> +        r = vhost_vdpa_set_address_space_id(dev);
> +        if (unlikely(r)) {
> +            return r;
> +        }
>          ok = vhost_vdpa_svqs_start(dev);
>          if (unlikely(!ok)) {
>              return -1;
> @@ -1196,21 +1252,26 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>          vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>      }
>
> -    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> -        return 0;
> +    vq_group_end = dev->vq_index + dev->nvqs == dev->vq_group_index_end;
> +    if (vq_group_end && started) {
> +        memory_listener_register(&v->listener, &address_space_memory);
>      }
>
> -    if (started) {
> -        memory_listener_register(&v->listener, &address_space_memory);
> -        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> -    } else {
> -        vhost_vdpa_reset_device(dev);
> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> -                                   VIRTIO_CONFIG_S_DRIVER);
> -        memory_listener_unregister(&v->listener);
> +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> +        if (started) {
> +            r = vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +        } else {
> +            vhost_vdpa_reset_device(dev);
> +            vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> +                                       VIRTIO_CONFIG_S_DRIVER);
> +        }
> +    }
>
> -        return 0;
> +    if (vq_group_end && !started) {
> +        memory_listener_unregister(&v->listener);
>      }
> +
> +    return r;
>  }
>
>  static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 15c3e4f703..a6f803ea4e 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -473,8 +473,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>
>      if (has_cvq) {
>          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> -                                 vdpa_device_fd, i, 1, false, opts->x_svq,
> -                                 iova_tree);
> +                                 vdpa_device_fd, i, 1,
> +                                 false, opts->x_svq, iova_tree);
>          if (!nc)
>              goto err;
>      }
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index e6fdc03514..2858deac60 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -23,10 +23,10 @@ vhost_user_postcopy_waker_found(uint64_t client_addr) "0x%"PRIx64
>  vhost_user_postcopy_waker_nomatch(const char *rb, uint64_t rb_offset) "%s + 0x%"PRIx64
>
>  # vhost-vdpa.c
> -vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
> -vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> -vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> -vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> +vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
> +vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> +vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint32_t asid, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" type: %"PRIu8
> +vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint32_t asid, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" type: %"PRIu8
>  vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
>  vhost_vdpa_listener_region_del(void *vdpa, uint64_t iova, uint64_t llend) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64
>  vhost_vdpa_add_status(void *dev, uint8_t status) "dev: %p status: 0x%"PRIx8
> @@ -44,6 +44,7 @@ vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
>  vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
>  vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
>  vhost_vdpa_get_vring_group(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
> +vhost_vdpa_set_vq_group_address_space_id(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
>  vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
>  vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
>  vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
> --
> 2.27.0
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 18/25] vdpa: Add map/unmap operation callback to SVQ
  2022-04-13 16:31 ` [RFC PATCH v7 18/25] vdpa: Add map/unmap operation callback to SVQ Eugenio Pérez
@ 2022-04-14  9:13   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-04-14  9:13 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

On Thu, Apr 14, 2022 at 12:32 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>

Let's document the motivation here. It looks to me like we don't have
more than one kind of map op implemented in this series.

Thanks

> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  hw/virtio/vhost-shadow-virtqueue.h | 21 +++++++++++++++++++--
>  hw/virtio/vhost-shadow-virtqueue.c |  8 +++++++-
>  hw/virtio/vhost-vdpa.c             | 20 +++++++++++++++++++-
>  3 files changed, 45 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 2809dee27b..e06ac52158 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -26,6 +26,15 @@ typedef struct VhostShadowVirtqueueOps {
>      VirtQueueElementCallback used_elem_handler;
>  } VhostShadowVirtqueueOps;
>
> +typedef int (*vhost_svq_map_op)(hwaddr iova, hwaddr size, void *vaddr,
> +                                bool readonly, void *opaque);
> +typedef int (*vhost_svq_unmap_op)(hwaddr iova, hwaddr size, void *opaque);
> +
> +typedef struct VhostShadowVirtqueueMapOps {
> +    vhost_svq_map_op map;
> +    vhost_svq_unmap_op unmap;
> +} VhostShadowVirtqueueMapOps;
> +
>  /* Shadow virtqueue to relay notifications */
>  typedef struct VhostShadowVirtqueue {
>      /* Shadow vring */
> @@ -73,6 +82,12 @@ typedef struct VhostShadowVirtqueue {
>      /* Optional callbacks */
>      const VhostShadowVirtqueueOps *ops;
>
> +    /* Device memory mapping callbacks */
> +    const VhostShadowVirtqueueMapOps *map_ops;
> +
> +    /* Device memory mapping callbacks opaque */
> +    void *map_ops_opaque;
> +
>      /* Optional custom used virtqueue element handler */
>      VirtQueueElementCallback used_elem_cb;
>
> @@ -102,8 +117,10 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
>                       VirtQueue *vq);
>  void vhost_svq_stop(VhostShadowVirtqueue *svq);
>
> -VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
> -                                    const VhostShadowVirtqueueOps *ops);
> +VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_map,
> +                                    const VhostShadowVirtqueueOps *ops,
> +                                    const VhostShadowVirtqueueMapOps *map_ops,
> +                                    void *map_ops_opaque);
>
>  void vhost_svq_free(gpointer vq);
>  G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 72a403d90b..87980e2a9c 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -612,13 +612,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>   *
>   * @iova_tree: Tree to perform descriptors translations
>   * @ops: SVQ operations hooks
> + * @map_ops: SVQ mapping operation hooks
> + * @map_ops_opaque: Opaque data to pass to mapping operations
>   *
>   * Returns the new virtqueue or NULL.
>   *
>   * In case of error, reason is reported through error_report.
>   */
>  VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
> -                                    const VhostShadowVirtqueueOps *ops)
> +                                    const VhostShadowVirtqueueOps *ops,
> +                                    const VhostShadowVirtqueueMapOps *map_ops,
> +                                    void *map_ops_opaque)
>  {
>      g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
>      int r;
> @@ -641,6 +645,8 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
>      event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
>      svq->iova_tree = iova_tree;
>      svq->ops = ops;
> +    svq->map_ops = map_ops;
> +    svq->map_ops_opaque = map_ops_opaque;
>      return g_steal_pointer(&svq);
>
>  err_init_hdev_call:
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 9e62f3280d..1948c5ca7d 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -384,6 +384,22 @@ static int vhost_vdpa_get_dev_features(struct vhost_dev *dev,
>      return ret;
>  }
>
> +static int vhost_vdpa_svq_map(hwaddr iova, hwaddr size, void *vaddr,
> +                              bool readonly, void *opaque)
> +{
> +    return vhost_vdpa_dma_map(opaque, iova, size, vaddr, readonly);
> +}
> +
> +static int vhost_vdpa_svq_unmap(hwaddr iova, hwaddr size, void *opaque)
> +{
> +    return vhost_vdpa_dma_unmap(opaque, iova, size);
> +}
> +
> +static const VhostShadowVirtqueueMapOps vhost_vdpa_svq_map_ops = {
> +    .map = vhost_vdpa_svq_map,
> +    .unmap = vhost_vdpa_svq_unmap,
> +};
> +
>  static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>                                 Error **errp)
>  {
> @@ -411,7 +427,9 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>      shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
>      for (unsigned n = 0; n < hdev->nvqs; ++n) {
>          g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree,
> -                                                            v->shadow_vq_ops);
> +                                                       v->shadow_vq_ops,
> +                                                       &vhost_vdpa_svq_map_ops,
> +                                                       v);
>
>          if (unlikely(!svq)) {
>              error_setg(errp, "Cannot create svq %u", n);
> --
> 2.27.0
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 20/25] vdpa: add NetClientState->start() callback
  2022-04-13 16:32 ` [RFC PATCH v7 20/25] vdpa: add NetClientState->start() callback Eugenio Pérez
@ 2022-04-14  9:14   ` Jason Wang
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Wang @ 2022-04-14  9:14 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

On Thu, Apr 14, 2022 at 12:33 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> It allows injecting custom code on successful device start, right
> before releasing the lock.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  include/net/net.h  | 2 ++
>  hw/net/vhost_net.c | 4 ++++

I wonder if we can do it in the vhost-vdpa layer.
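
E.g. something like this (just a sketch; the start_op member and its
name are invented):

/* Hypothetical callback stored in struct vhost_vdpa */
typedef void (*VhostVDPAStartOp)(void *opaque);

/* In vhost_vdpa_dev_start(), once the device has started successfully: */
if (started && v->start_op) {
    v->start_op(v->start_op_opaque); /* e.g. inject CVQ restore commands */
}

That would keep NetClientInfo untouched at the cost of a vdpa-specific
callback.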

Thanks

>  2 files changed, 6 insertions(+)
>
> diff --git a/include/net/net.h b/include/net/net.h
> index 523136c7ac..2fc3002ab4 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -44,6 +44,7 @@ typedef struct NICConf {
>
>  typedef void (NetPoll)(NetClientState *, bool enable);
>  typedef bool (NetCanReceive)(NetClientState *);
> +typedef void (NetStart)(NetClientState *);
>  typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
>  typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
>  typedef void (NetCleanup) (NetClientState *);
> @@ -71,6 +72,7 @@ typedef struct NetClientInfo {
>      NetReceive *receive_raw;
>      NetReceiveIOV *receive_iov;
>      NetCanReceive *can_receive;
> +    NetStart *start;
>      NetCleanup *cleanup;
>      LinkStatusChanged *link_status_changed;
>      QueryRxFilter *query_rx_filter;
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 30379d2ca4..44a105ec29 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -274,6 +274,10 @@ static int vhost_net_start_one(struct vhost_net *net,
>              }
>          }
>      }
> +
> +    if (net->nc->info->start) {
> +        net->nc->info->start(net->nc);
> +    }
>      return 0;
>  fail:
>      file.fd = -1;
> --
> 2.27.0
>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v7 08/25] vdpa: Add x-svq to NetdevVhostVDPAOptions
  2022-04-14  5:32   ` Jason Wang
@ 2022-04-18 10:36     ` Eugenio Perez Martin
  0 siblings, 0 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2022-04-18 10:36 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-level, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

On Thu, Apr 14, 2022 at 7:32 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/4/14 00:31, Eugenio Pérez wrote:
> > Finally offering the possibility to enable SVQ from the command line.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   qapi/net.json    |  9 ++++++++-
> >   net/vhost-vdpa.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
> >   2 files changed, 48 insertions(+), 9 deletions(-)
> >
> > diff --git a/qapi/net.json b/qapi/net.json
> > index b92f3f5fb4..92848e4362 100644
> > --- a/qapi/net.json
> > +++ b/qapi/net.json
> > @@ -445,12 +445,19 @@
> >   # @queues: number of queues to be created for multiqueue vhost-vdpa
> >   #          (default: 1)
> >   #
> > +# @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
> > +#         (default: false)
> > +#
> > +# Features:
> > +# @unstable: Member @x-svq is experimental.
> > +#
> >   # Since: 5.1
> >   ##
> >   { 'struct': 'NetdevVhostVDPAOptions',
> >     'data': {
> >       '*vhostdev':     'str',
> > -    '*queues':       'int' } }
> > +    '*queues':       'int',
> > +    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] } } }
> >
> >   ##
> >   # @NetClientDriver:
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index 1e9fe47c03..9261101af2 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -128,6 +128,7 @@ static void vhost_vdpa_cleanup(NetClientState *nc)
> >   {
> >       VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> >
> > +    g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_release);
> >       if (s->vhost_net) {
> >           vhost_net_cleanup(s->vhost_net);
> >           g_free(s->vhost_net);
> > @@ -187,13 +188,23 @@ static NetClientInfo net_vhost_vdpa_info = {
> >           .check_peer_type = vhost_vdpa_check_peer_type,
> >   };
> >
> > +static int vhost_vdpa_get_iova_range(int fd,
> > +                                     struct vhost_vdpa_iova_range *iova_range)
> > +{
> > +    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
> > +
> > +    return ret < 0 ? -errno : 0;
> > +}
> > +
> >   static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > -                                           const char *device,
> > -                                           const char *name,
> > -                                           int vdpa_device_fd,
> > -                                           int queue_pair_index,
> > -                                           int nvqs,
> > -                                           bool is_datapath)
> > +                                       const char *device,
> > +                                       const char *name,
> > +                                       int vdpa_device_fd,
> > +                                       int queue_pair_index,
> > +                                       int nvqs,
> > +                                       bool is_datapath,
>
>
It's better not to mix style changes in here (especially since the original looks correct).
>
>
> > +                                       bool svq,
> > +                                       VhostIOVATree *iova_tree)
> >   {
> >       NetClientState *nc = NULL;
> >       VhostVDPAState *s;
> > @@ -211,8 +222,14 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >
> >       s->vhost_vdpa.device_fd = vdpa_device_fd;
> >       s->vhost_vdpa.index = queue_pair_index;
> > +    s->vhost_vdpa.shadow_vqs_enabled = svq;
> > +    s->vhost_vdpa.iova_tree = iova_tree ? vhost_iova_tree_acquire(iova_tree) :
> > +                              NULL;
> >       ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> >       if (ret) {
> > +        if (iova_tree) {
> > +            vhost_iova_tree_release(iova_tree);
> > +        }
> >           qemu_del_net_client(nc);
> >           return NULL;
> >       }
> > @@ -266,6 +283,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >       g_autofree NetClientState **ncs = NULL;
> >       NetClientState *nc;
> >       int queue_pairs, i, has_cvq = 0;
> > +    g_autoptr(VhostIOVATree) iova_tree = NULL;
> >
> >       assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >       opts = &netdev->u.vhost_vdpa;
> > @@ -285,19 +303,31 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >           qemu_close(vdpa_device_fd);
> >           return queue_pairs;
> >       }
> > +    if (opts->x_svq) {
> > +        struct vhost_vdpa_iova_range iova_range;
> > +
> > +        if (has_cvq) {
> > +            error_setg(errp, "vdpa svq does not work with cvq");
> > +            goto err_svq;
> > +        }
> > +        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> > +        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> > +    }
> >
> >       ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> >
> >       for (i = 0; i < queue_pairs; i++) {
> >           ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > -                                     vdpa_device_fd, i, 2, true);
> > +                                     vdpa_device_fd, i, 2, true, opts->x_svq,
> > +                                     iova_tree);
> >           if (!ncs[i])
> >               goto err;
> >       }
> >
> >       if (has_cvq) {
> >           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > -                                 vdpa_device_fd, i, 1, false);
> > +                                 vdpa_device_fd, i, 1, false, opts->x_svq,
> > +                                 iova_tree);
>
>
> So we have at most 1 iova_tree here, and the refcnt looks unnecessary.
>

It's needed later. I can reorder the series so the refcount is
introduced right before the patch that uses it.

Thanks!

> Thanks
>
>
> >           if (!nc)
> >               goto err;
> >       }
> > @@ -308,6 +338,8 @@ err:
> >       if (i) {
> >           qemu_del_net_client(ncs[0]);
> >       }
> > +
> > +err_svq:
> >       qemu_close(vdpa_device_fd);
> >
> >       return -1;
>




* Re: [RFC PATCH v7 16/25] vdpa: control virtqueue support on shadow virtqueue
  2022-04-14  9:10   ` Jason Wang
@ 2022-04-18 10:55     ` Eugenio Perez Martin
  0 siblings, 0 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2022-04-18 10:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

On Thu, Apr 14, 2022 at 11:10 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Apr 14, 2022 at 12:32 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > Introduce the control virtqueue support for vDPA shadow virtqueue. This
> > is needed for advanced networking features like multiqueue.
> >
> > To demonstrate command handling, VIRTIO_NET_F_CTRL_MACADDR and
> > VIRTIO_NET_CTRL_MQ are implemented. If the vDPA device is started with
> > SVQ support and the virtio-net driver changes the MAC or the number of
> > queues, the virtio-net device model will be updated accordingly.
> >
> > Other CVQ commands could be added here straightforwardly, but they
> > have not been tested.
>
> If I understand the code correctly, the cvq can still see all the
> guest mappings. I wonder if it's simpler to:
>
> 1) find a way to reuse the ctrl handler in virtio-net.c

It's reused; that's why virtio_net_handle_ctrl_iov is extracted from
virtio_net_handle_ctrl.
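
For reference, the split looks roughly like this (a sketch; the
signature is the one used later in this thread, and the comments
paraphrase the intent):

/*
 * Reusable core: parses the command from out_sg and writes the ack
 * back through in_sg, so it can also be called with buffers that do
 * not come from the guest's control virtqueue.
 */
size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
                                  const struct iovec *in_sg,
                                  unsigned in_num,
                                  const struct iovec *out_sg,
                                  unsigned out_num);

/*
 * virtio_net_handle_ctrl() keeps the virtqueue handling: it pops each
 * VirtQueueElement, calls the function above with the element's
 * iovecs, and pushes the element back to the vq.
 */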

> 2) do not expose all the guest memory to shadow cvq.
>

It can be done that way, actually, but it would require a map and an
unmap for each control command call. I'll explore that approach, thanks!
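
Something along these lines, just as a sketch (the helper name and the
flow are hypothetical, not code from this series):

/* Hypothetical per-command flow, instead of exposing all guest memory
 * to the shadow CVQ: */
static bool vhost_vdpa_net_cvq_cmd(VhostShadowVirtqueue *svq,
                                   const struct iovec *cmd, size_t cmd_num)
{
    /* 1) copy the command into a qemu-owned bounce buffer    */
    /* 2) map() only that buffer into the CVQ address space   */
    /* 3) add the descriptors and kick the device             */
    /* 4) poll or wait for the used descriptor                */
    /* 5) unmap() the bounce buffer again                     */
    return true;
}

The map/unmap pair per command is the extra cost mentioned above.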

> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  net/vhost-vdpa.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 77 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index a8dde49198..38e6912255 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -11,6 +11,7 @@
> >
> >  #include "qemu/osdep.h"
> >  #include "clients.h"
> > +#include "hw/virtio/virtio-net.h"
> >  #include "net/vhost_net.h"
> >  #include "net/vhost-vdpa.h"
> >  #include "hw/virtio/vhost-vdpa.h"
> > @@ -69,6 +70,30 @@ const int vdpa_feature_bits[] = {
> >      VHOST_INVALID_FEATURE_BIT
> >  };
> >
> > +/** Supported device specific feature bits with SVQ */
> > +static const uint64_t vdpa_svq_device_features =
> > +    BIT_ULL(VIRTIO_NET_F_CSUM) |
> > +    BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) |
> > +    BIT_ULL(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) |
> > +    BIT_ULL(VIRTIO_NET_F_MTU) |
> > +    BIT_ULL(VIRTIO_NET_F_MAC) |
> > +    BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) |
> > +    BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) |
> > +    BIT_ULL(VIRTIO_NET_F_GUEST_ECN) |
> > +    BIT_ULL(VIRTIO_NET_F_GUEST_UFO) |
> > +    BIT_ULL(VIRTIO_NET_F_HOST_TSO4) |
> > +    BIT_ULL(VIRTIO_NET_F_HOST_TSO6) |
> > +    BIT_ULL(VIRTIO_NET_F_HOST_ECN) |
> > +    BIT_ULL(VIRTIO_NET_F_HOST_UFO) |
> > +    BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) |
> > +    BIT_ULL(VIRTIO_NET_F_STATUS) |
> > +    BIT_ULL(VIRTIO_NET_F_CTRL_VQ) |
> > +    BIT_ULL(VIRTIO_NET_F_MQ) |
> > +    BIT_ULL(VIRTIO_F_ANY_LAYOUT) |
> > +    BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) |
> > +    BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
> > +    BIT_ULL(VIRTIO_NET_F_STANDBY);
>
> I wonder what's the reason for having a dedicated feature whitelist for SVQ?
>

We cannot be sure that future commands will not require modifications
to qemu. As with the switch statement in vhost_vdpa_net_handle_ctrl, I
can dedicate time to test all of the currently supported CVQ commands
and then delete this list.

> > +
> >  VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
> >  {
> >      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > @@ -196,6 +221,46 @@ static int vhost_vdpa_get_iova_range(int fd,
> >      return ret < 0 ? -errno : 0;
> >  }
> >
> > +static void vhost_vdpa_net_handle_ctrl(VirtIODevice *vdev,
> > +                                       const VirtQueueElement *elem)
> > +{
> > +    struct virtio_net_ctrl_hdr ctrl;
> > +    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
> > +    size_t s;
> > +    struct iovec in = {
> > +        .iov_base = &status,
> > +        .iov_len = sizeof(status),
> > +    };
> > +
> > +    s = iov_to_buf(elem->out_sg, elem->out_num, 0, &ctrl, sizeof(ctrl.class));
> > +    if (s != sizeof(ctrl.class)) {
> > +        return;
> > +    }
> > +
> > +    switch (ctrl.class) {
> > +    case VIRTIO_NET_CTRL_MAC_ADDR_SET:
> > +    case VIRTIO_NET_CTRL_MQ:
> > +        break;
> > +    default:
> > +        return;
> > +    };
>
> Any reason that we only support those two commands?
>

Lack of testing, basically. I can try to test all of them for the next
patch series.

> > +
> > +    s = iov_to_buf(elem->in_sg, elem->in_num, 0, &status, sizeof(status));
> > +    if (s != sizeof(status) || status != VIRTIO_NET_OK) {
> > +        return;
> > +    }
> > +
> > +    status = VIRTIO_NET_ERR;
> > +    virtio_net_handle_ctrl_iov(vdev, &in, 1, elem->out_sg, elem->out_num);
> > +    if (status != VIRTIO_NET_OK) {
>
> status is guaranteed to be VIRTIO_NET_ERR, so we hit the error for sure?
>

Status is modified through the "in" iovec parameter of
virtio_net_handle_ctrl_iov, but that is not obvious from this piece of
code in isolation. I can try to make it clearer.
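
For instance (a minimal sketch built from the declarations in the patch
above; the iov_from_buf() line in the comment stands in for what the
device model does internally):

    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
    struct iovec in = {
        .iov_base = &status,
        .iov_len = sizeof(status),
    };

    virtio_net_handle_ctrl_iov(vdev, &in, 1, elem->out_sg, elem->out_num);
    /*
     * Inside the model the ack is written through the iovec, roughly:
     *     iov_from_buf(&in, 1, 0, &ack, sizeof(ack));
     * so at this point "status" holds whatever the model wrote, and the
     * check below is not dead code.
     */
    if (status != VIRTIO_NET_OK) {
        error_report("Bad CVQ processing in model");
    }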

Thanks!

> Thanks
>
> > +        error_report("Bad CVQ processing in model");
> > +    }
> > +}
> > +
> > +static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
> > +    .used_elem_handler = vhost_vdpa_net_handle_ctrl,
> > +};
> > +
> >  static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >                                         const char *device,
> >                                         const char *name,
> > @@ -225,6 +290,9 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> >      s->vhost_vdpa.shadow_vqs_enabled = svq;
> >      s->vhost_vdpa.iova_tree = iova_tree ? vhost_iova_tree_acquire(iova_tree) :
> >                                NULL;
> > +    if (!is_datapath) {
> > +        s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
> > +    }
> >      ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
> >      if (ret) {
> >          if (iova_tree) {
> > @@ -315,9 +383,15 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >      }
> >      if (opts->x_svq) {
> >          struct vhost_vdpa_iova_range iova_range;
> > -
> > -        if (has_cvq) {
> > -            error_setg(errp, "vdpa svq does not work with cvq");
> > +        uint64_t invalid_dev_features =
> > +            features & ~vdpa_svq_device_features &
> > +            /* Transport are all accepted at this point */
> > +            ~MAKE_64BIT_MASK(VIRTIO_TRANSPORT_F_START,
> > +                             VIRTIO_TRANSPORT_F_END - VIRTIO_TRANSPORT_F_START);
> > +
> > +        if (invalid_dev_features) {
> > +            error_setg(errp, "vdpa svq does not work with features 0x%" PRIx64,
> > +                       invalid_dev_features);
> >              goto err_svq;
> >          }
> >          vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> > --
> > 2.27.0
> >
>




* Re: [RFC PATCH v7 19/25] vhost: Add vhost_svq_inject
  2022-04-14  9:09   ` Jason Wang
@ 2022-04-18 13:58     ` Eugenio Perez Martin
  0 siblings, 0 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2022-04-18 13:58 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

On Thu, Apr 14, 2022 at 11:10 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Apr 14, 2022 at 12:32 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > This allows qemu to inject packets to the device without guest's notice.

s/packets/buffers/ actually.

>
> Does it mean it can support guests without _F_ANNOUNCE?
>

Technically it is possible. We could use this to inject the packet
into the data virtqueues, but that implies they start in SVQ mode.

Once we have a way to transition to/from shadow virtqueue mode
dynamically, the data virtqueues could start shadowed, the gratuitous
ARP could be sent, and then we could move back to passthrough mode.
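
Roughly this sequence (hypothetical for now, since the dynamic switch
does not exist yet):

/*
 * Hypothetical announce flow:
 *   1) start the data virtqueues in SVQ mode
 *   2) vhost_svq_inject() the gratuitous ARP frame(s) on the TX queues
 *   3) wait for the device to use the injected buffers
 *   4) switch the data virtqueues back to passthrough mode
 */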

> >
> > This will be used to inject net CVQ messages to restore the status in
> > the destination
>
> I guess for restoring, we should set cvq.ready = true but all others
> (TX/RX) to false before we complete the restoring? If yes, I don't see
> code to do that.
>

Right, that will be ready for the next version.
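
Something like this ordering, as a sketch (the field names are
illustrative only; the real structures may differ):

    /* Restore phase: only CVQ processes buffers. */
    s->cvq_ready = true;
    for (int i = 0; i < data_queue_pairs * 2; ++i) {
        s->vq_ready[i] = false;   /* keep TX/RX quiesced */
    }

    /* ... inject the CVQ restore commands and wait for their acks ... */

    /* Only then let the data virtqueues run. */
    for (int i = 0; i < data_queue_pairs * 2; ++i) {
        s->vq_ready[i] = true;
    }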

> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  hw/virtio/vhost-shadow-virtqueue.h |   5 +
> >  hw/virtio/vhost-shadow-virtqueue.c | 179 +++++++++++++++++++++++++----
> >  2 files changed, 160 insertions(+), 24 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index e06ac52158..2a5229e77f 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -17,6 +17,9 @@
> >
> >  typedef struct SVQElement {
> >      VirtQueueElement elem;
> > +    hwaddr in_iova;
> > +    hwaddr out_iova;
> > +    bool not_from_guest;
>
> Let's add a comment for those fields.
>

Sure.

> >  } SVQElement;
> >
> >  typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
> > @@ -106,6 +109,8 @@ typedef struct VhostShadowVirtqueue {
> >
> >  bool vhost_svq_valid_features(uint64_t features, Error **errp);
> >
> > +bool vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
> > +                      size_t out_num, size_t in_num);
> >  void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> >  void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
> >  void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index 87980e2a9c..f3600df133 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -16,6 +16,7 @@
> >  #include "qemu/log.h"
> >  #include "qemu/memalign.h"
> >  #include "linux-headers/linux/vhost.h"
> > +#include "qemu/iov.h"
> >
> >  /**
> >   * Validate the transport device features that both guests can use with the SVQ
> > @@ -122,7 +123,8 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
> >      return true;
> >  }
> >
> > -static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> > +static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq,
> > +                                        SVQElement *svq_elem, hwaddr *sg,
> >                                          const struct iovec *iovec, size_t num,
> >                                          bool more_descs, bool write)
> >  {
> > @@ -130,15 +132,39 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> >      unsigned n;
> >      uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
> >      vring_desc_t *descs = svq->vring.desc;
> > -    bool ok;
> >
> >      if (num == 0) {
> >          return true;
> >      }
> >
> > -    ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> > -    if (unlikely(!ok)) {
> > -        return false;
> > +    if (svq_elem->not_from_guest) {
> > +        DMAMap map = {
> > +            .translated_addr = (hwaddr)iovec->iov_base,
> > +            .size = ROUND_UP(iovec->iov_len, 4096) - 1,
> > +            .perm = write ? IOMMU_RW : IOMMU_RO,
> > +        };
> > +        int r;
> > +
> > +        if (unlikely(num != 1)) {
> > +            error_report("Unexpected chain of element injected");
> > +            return false;
> > +        }
> > +        r = vhost_iova_tree_map_alloc(svq->iova_tree, &map);
> > +        if (unlikely(r != IOVA_OK)) {
> > +            error_report("Cannot map injected element");
> > +            return false;
> > +        }
> > +
> > +        r = svq->map_ops->map(map.iova, map.size + 1,
> > +                              (void *)map.translated_addr, !write,
> > +                              svq->map_ops_opaque);
> > +        assert(r == 0);
> > +        sg[0] = map.iova;
> > +    } else {
> > +        bool ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> > +        if (unlikely(!ok)) {
> > +            return false;
> > +        }
> >      }
> >
> >      for (n = 0; n < num; n++) {
> > @@ -166,7 +192,8 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement *svq_elem,
> >      unsigned avail_idx;
> >      vring_avail_t *avail = svq->vring.avail;
> >      bool ok;
> > -    g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
> > +    g_autofree hwaddr *sgs = NULL;
> > +    hwaddr *in_sgs, *out_sgs;
> >
> >      *head = svq->free_head;
> >
> > @@ -177,15 +204,23 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement *svq_elem,
> >          return false;
> >      }
> >
> > -    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
> > -                                     elem->in_num > 0, false);
> > +    if (!svq_elem->not_from_guest) {
> > +        sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
> > +        in_sgs = out_sgs = sgs;
> > +    } else {
> > +        in_sgs = &svq_elem->in_iova;
> > +        out_sgs = &svq_elem->out_iova;
> > +    }
> > +    ok = vhost_svq_vring_write_descs(svq, svq_elem, out_sgs, elem->out_sg,
> > +                                     elem->out_num, elem->in_num > 0, false);
> >      if (unlikely(!ok)) {
> >          return false;
> >      }
> >
> > -    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
> > -                                     true);
> > +    ok = vhost_svq_vring_write_descs(svq, svq_elem, in_sgs, elem->in_sg,
> > +                                     elem->in_num, false, true);
> >      if (unlikely(!ok)) {
> > +        /* TODO unwind out_sg */
> >          return false;
> >      }
> >
> > @@ -230,6 +265,43 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> >      event_notifier_set(&svq->hdev_kick);
> >  }
> >
> > +bool vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
> > +                      size_t out_num, size_t in_num)
> > +{
> > +    size_t out_size = iov_size(iov, out_num);
> > +    size_t out_buf_size = ROUND_UP(out_size, 4096);
> > +    size_t in_size = iov_size(iov + out_num, in_num);
> > +    size_t in_buf_size = ROUND_UP(in_size, 4096);
> > +    SVQElement *svq_elem;
> > +    uint16_t num_slots = (in_num ? 1 : 0) + (out_num ? 1 : 0);
> > +
> > +    if (unlikely(num_slots == 0 || svq->next_guest_avail_elem ||
> > +                 vhost_svq_available_slots(svq) < num_slots)) {
> > +        return false;
> > +    }
> > +
> > +    svq_elem = virtqueue_alloc_element(sizeof(SVQElement), 1, 1);
> > +    if (out_num) {
> > +        void *out = qemu_memalign(4096, out_buf_size);
> > +        svq_elem->elem.out_sg[0].iov_base = out;
> > +        svq_elem->elem.out_sg[0].iov_len = out_size;
> > +        iov_to_buf(iov, out_num, 0, out, out_size);
> > +        memset(out + out_size, 0, out_buf_size - out_size);
> > +    }
> > +    if (in_num) {
> > +        void *in = qemu_memalign(4096, in_buf_size);
> > +        svq_elem->elem.in_sg[0].iov_base = in;
> > +        svq_elem->elem.in_sg[0].iov_len = in_size;
> > +        memset(in, 0, in_buf_size);
> > +    }
> > +
> > +    svq_elem->not_from_guest = true;
> > +    vhost_svq_add(svq, svq_elem);
> > +    vhost_svq_kick(svq);
> > +
>
> Should we wait for the completion before moving forward? Otherwise we
> will have a race.
>
> And if we wait for the completion (e.g doing busy polling), I think we
> can avoid the auxiliary structures like
> in_iova/out_iova/not_from_guest by doing mapping before
> vhost_svq_add() to keep it clean.
>

Ok I can move it to that model.

Thanks!

> Thanks
>
> > +    return true;
> > +}
> > +
> >  /**
> >   * Forward available buffers.
> >   *
> > @@ -267,6 +339,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
> >                  break;
> >              }
> >
> > +            svq_elem->not_from_guest = false;
> >              elem = &svq_elem->elem;
> >              if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
> >                  /*
> > @@ -391,6 +464,31 @@ static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
> >      return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> >  }
> >
> > +static int vhost_svq_unmap(VhostShadowVirtqueue *svq, hwaddr iova, size_t size)
> > +{
> > +    DMAMap needle = {
> > +        .iova = iova,
> > +        .size = size,
> > +    };
> > +    const DMAMap *overlap;
> > +
> > +    while ((overlap = vhost_iova_tree_find(svq->iova_tree, &needle))) {
> > +        DMAMap needle = *overlap;
> > +
> > +        if (svq->map_ops->unmap) {
> > +            int r = svq->map_ops->unmap(overlap->iova, overlap->size + 1,
> > +                                        svq->map_ops_opaque);
> > +            if (unlikely(r != 0)) {
> > +                return r;
> > +            }
> > +        }
> > +        qemu_vfree((void *)overlap->translated_addr);
> > +        vhost_iova_tree_remove(svq->iova_tree, &needle);
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> >                              bool check_for_avail_queue)
> >  {
> > @@ -410,23 +508,56 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> >              }
> >
> >              elem = &svq_elem->elem;
> > -            if (unlikely(i >= svq->vring.num)) {
> > -                qemu_log_mask(LOG_GUEST_ERROR,
> > -                         "More than %u used buffers obtained in a %u size SVQ",
> > -                         i, svq->vring.num);
> > -                virtqueue_fill(vq, elem, len, i);
> > -                virtqueue_flush(vq, i);
> > -                return;
> > -            }
> > -            virtqueue_fill(vq, elem, len, i++);
> > -
> >              if (svq->ops && svq->ops->used_elem_handler) {
> >                  svq->ops->used_elem_handler(svq->vdev, elem);
> >              }
> > +
> > +            if (svq_elem->not_from_guest) {
> > +                if (unlikely(!elem->out_num && elem->out_num != 1)) {
> > +                    error_report("Unexpected out_num > 1");
> > +                    return;
> > +                }
> > +
> > +                if (elem->out_num) {
> > +                    int r = vhost_svq_unmap(svq, svq_elem->out_iova,
> > +                                            elem->out_sg[0].iov_len);
> > +                    if (unlikely(r != 0)) {
> > +                        error_report("Cannot unmap out buffer");
> > +                        return;
> > +                    }
> > +                }
> > +
> > +                if (unlikely(!elem->in_num && elem->in_num != 1)) {
> > +                    error_report("Unexpected in_num > 1");
> > +                    return;
> > +                }
> > +
> > +                if (elem->in_num) {
> > +                    int r = vhost_svq_unmap(svq, svq_elem->in_iova,
> > +                                            elem->in_sg[0].iov_len);
> > +                    if (unlikely(r != 0)) {
> > +                        error_report("Cannot unmap out buffer");
> > +                        return;
> > +                    }
> > +                }
> > +            } else {
> > +                if (unlikely(i >= svq->vring.num)) {
> > +                    qemu_log_mask(
> > +                        LOG_GUEST_ERROR,
> > +                        "More than %u used buffers obtained in a %u size SVQ",
> > +                        i, svq->vring.num);
> > +                    virtqueue_fill(vq, elem, len, i);
> > +                    virtqueue_flush(vq, i);
> > +                    return;
> > +                }
> > +                virtqueue_fill(vq, elem, len, i++);
> > +            }
> >          }
> >
> > -        virtqueue_flush(vq, i);
> > -        event_notifier_set(&svq->svq_call);
> > +        if (i > 0) {
> > +            virtqueue_flush(vq, i);
> > +            event_notifier_set(&svq->svq_call);
> > +        }
> >
> >          if (check_for_avail_queue && svq->next_guest_avail_elem) {
> >              /*
> > @@ -590,13 +721,13 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >      for (unsigned i = 0; i < svq->vring.num; ++i) {
> >          g_autofree SVQElement *svq_elem = NULL;
> >          svq_elem = g_steal_pointer(&svq->ring_id_maps[i]);
> > -        if (svq_elem) {
> > +        if (svq_elem && !svq_elem->not_from_guest) {
> >              virtqueue_detach_element(svq->vq, &svq_elem->elem, 0);
> >          }
> >      }
> >
> >      next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > -    if (next_avail_elem) {
> > +    if (next_avail_elem && !next_avail_elem->not_from_guest) {
> >          virtqueue_detach_element(svq->vq, &next_avail_elem->elem, 0);
> >      }
> >      svq->vq = NULL;
> > --
> > 2.27.0
> >
>




* Re: [RFC PATCH v7 24/25] vdpa: Add asid attribute to vdpa device
  2022-04-14  9:10   ` Jason Wang
@ 2022-04-18 14:03     ` Eugenio Perez Martin
  0 siblings, 0 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2022-04-18 14:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

On Thu, Apr 14, 2022 at 11:10 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Apr 14, 2022 at 12:33 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > We can configure the ASID per group, but we still use asid 0 for every
> > vdpa device. Multiple ASID support for CVQ will be introduced in the
> > next patches.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  include/hw/virtio/vhost.h |  4 ++
> >  hw/net/vhost_net.c        |  5 +++
> >  hw/virtio/vhost-vdpa.c    | 95 ++++++++++++++++++++++++++++++++-------
> >  net/vhost-vdpa.c          |  4 +-
> >  hw/virtio/trace-events    |  9 ++--
> >  5 files changed, 94 insertions(+), 23 deletions(-)
> >
> > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > index 034868fa9e..640cf82168 100644
> > --- a/include/hw/virtio/vhost.h
> > +++ b/include/hw/virtio/vhost.h
> > @@ -76,8 +76,12 @@ struct vhost_dev {
> >      int vq_index;
> >      /* one past the last vq index for the virtio device (not vhost) */
> >      int vq_index_end;
> > +    /* one past the last vq index of this virtqueue group */
> > +    int vq_group_index_end;
> >      /* if non-zero, minimum required value for max_queues */
> >      int num_queues;
> > +    /* address space id */
>
> Instead of doing shortcuts like this, I think we need an abstraction
> like what the kernel did. That is to say, introduce structures like:
>
> struct vhost_vdpa_dev_group;
> struct vhost_vdpa_as;
>
> Then having pointers to those structures like
>
> struct vhost_vdpa {
>         ...
>         struct vhost_vdpa_dev_group *group;
> };
>
> struct vhost_vdpa_group {
>         ...
>         uint32_t id;
>         struct vhost_vdpa_as;
> };
>
> struct vhost_vdpa_as {
>         uint32_t id;
>         MemoryListener listener;
> };
>
> We can read the group topology during initialization and allocate the
> structure accordingly. If the CVQ has its own group:
>
> 1) We know we will have 2 AS otherwise 1 AS
> 2) allocate #AS and attach the group to the corresponding AS
>
> Then we know that:
>
> 1) map/unmap and the listener are done per AS instead of per group or vdpa.
> 2) AS attach/detach is done per group
>
> And it would simplify the future extension when we want to advertise
> the as/groups to guests.
>
> To simplify the reviewing, we can introduce the above concept before
> the ASID uAPIs and assume a 1-group, 1-AS model as a start.
>

I think it's doable, let me refactor the code that way and I'll come
back with the results.

Thanks!

> Thanks
>
> > +    uint32_t address_space_id;
> >      /* Must be a vq group different than any other vhost dev */
> >      bool independent_vq_group;
> >      uint64_t features;
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index 10480e19e5..a34df739a7 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -344,15 +344,20 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >
> >      for (i = 0; i < nvhosts; i++) {
> >          bool cvq_idx = i >= data_queue_pairs;
> > +        uint32_t vq_group_end;
> >
> >          if (!cvq_idx) {
> >              peer = qemu_get_peer(ncs, i);
> > +            vq_group_end = 2 * data_queue_pairs;
> >          } else { /* Control Virtqueue */
> >              peer = qemu_get_peer(ncs, n->max_queue_pairs);
> > +            vq_group_end = 2 * data_queue_pairs + 1;
> >          }
> >
> >          net = get_vhost_net(peer);
> > +        net->dev.address_space_id = !!cvq_idx;
> >          net->dev.independent_vq_group = !!cvq_idx;
> > +        net->dev.vq_group_index_end = vq_group_end;
> >          vhost_net_set_vq_index(net, i * 2, index_end);
> >
> >          /* Suppress the masking guest notifiers on vhost user
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 4096555242..5ed211287c 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -79,6 +79,9 @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> >      int ret = 0;
> >
> >      msg.type = v->msg_type;
> > +    if (v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) {
> > +        msg.asid = v->dev->address_space_id;
> > +    }
> >      msg.iotlb.iova = iova;
> >      msg.iotlb.size = size;
> >      msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr;
> > @@ -90,8 +93,9 @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> >          return 0;
> >      }
> >
> > -   trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.iotlb.iova, msg.iotlb.size,
> > -                            msg.iotlb.uaddr, msg.iotlb.perm, msg.iotlb.type);
> > +    trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.asid, msg.iotlb.iova,
> > +                             msg.iotlb.size, msg.iotlb.uaddr, msg.iotlb.perm,
> > +                             msg.iotlb.type);
> >
> >      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> >          error_report("failed to write, fd=%d, errno=%d (%s)",
> > @@ -109,6 +113,9 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
> >      int fd = v->device_fd;
> >      int ret = 0;
> >
> > +    if (v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) {
> > +        msg.asid = v->dev->address_space_id;
> > +    }
> >      msg.type = v->msg_type;
> >      msg.iotlb.iova = iova;
> >      msg.iotlb.size = size;
> > @@ -119,7 +126,7 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
> >          return 0;
> >      }
> >
> > -    trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.iotlb.iova,
> > +    trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.asid, msg.iotlb.iova,
> >                                 msg.iotlb.size, msg.iotlb.type);
> >
> >      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> > @@ -134,6 +141,7 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
> >  static void vhost_vdpa_listener_commit(MemoryListener *listener)
> >  {
> >      struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > +    struct vhost_dev *dev = v->dev;
> >      struct vhost_msg_v2 msg = {};
> >      int fd = v->device_fd;
> >      size_t num = v->iotlb_updates->len;
> > @@ -142,9 +150,14 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
> >          return;
> >      }
> >
> > +    if (dev->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_ASID)) {
> > +        msg.asid = v->dev->address_space_id;
> > +    }
> > +
> >      msg.type = v->msg_type;
> >      msg.iotlb.type = VHOST_IOTLB_BATCH_BEGIN;
> > -    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.iotlb.type);
> > +    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.asid,
> > +                                          msg.iotlb.type);
> >      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> >          error_report("failed to write BEGIN_BATCH, fd=%d, errno=%d (%s)",
> >                       fd, errno, strerror(errno));
> > @@ -162,7 +175,8 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
> >      }
> >
> >      msg.iotlb.type = VHOST_IOTLB_BATCH_END;
> > -    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
> > +    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.asid,
> > +                                     msg.iotlb.type);
> >      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> >          error_report("failed to write, fd=%d, errno=%d (%s)",
> >                       fd, errno, strerror(errno));
> > @@ -1171,10 +1185,48 @@ call_err:
> >      return false;
> >  }
> >
> > +static int vhost_vdpa_set_vq_group_address_space_id(struct vhost_dev *dev,
> > +                                                struct vhost_vring_state *asid)
> > +{
> > +    trace_vhost_vdpa_set_vq_group_address_space_id(dev, asid->index, asid->num);
> > +    return vhost_vdpa_call(dev, VHOST_VDPA_SET_GROUP_ASID, asid);
> > +}
> > +
> > +static int vhost_vdpa_set_address_space_id(struct vhost_dev *dev)
> > +{
> > +    struct vhost_vring_state vq_group = {
> > +        .index = dev->vq_index,
> > +    };
> > +    struct vhost_vring_state asid;
> > +    int ret;
> > +
> > +    if (!dev->address_space_id) {
> > +        return 0;
> > +    }
> > +
> > +    ret = vhost_vdpa_get_vring_group(dev, &vq_group);
> > +    if (unlikely(ret)) {
> > +        error_report("Can't read vq group, errno=%d (%s)", ret,
> > +                     g_strerror(-ret));
> > +        return ret;
> > +    }
> > +
> > +    asid.index = vq_group.num;
> > +    asid.num = dev->address_space_id;
> > +    ret = vhost_vdpa_set_vq_group_address_space_id(dev, &asid);
> > +    if (unlikely(ret)) {
> > +        error_report("Can't set vq group %u asid %u, errno=%d (%s)",
> > +            asid.index, asid.num, ret, g_strerror(-ret));
> > +    }
> > +    return ret;
> > +}
> > +
> >  static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >  {
> >      struct vhost_vdpa *v = dev->opaque;
> > -    bool ok;
> > +    bool vq_group_end, ok;
> > +    int r = 0;
> > +
> >      trace_vhost_vdpa_dev_start(dev, started);
> >
> >      if (started) {
> > @@ -1183,6 +1235,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >              !vhost_dev_is_independent_group(dev)) {
> >              return -1;
> >          }
> > +        r = vhost_vdpa_set_address_space_id(dev);
> > +        if (unlikely(r)) {
> > +            return r;
> > +        }
> >          ok = vhost_vdpa_svqs_start(dev);
> >          if (unlikely(!ok)) {
> >              return -1;
> > @@ -1196,21 +1252,26 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >          vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> >      }
> >
> > -    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> > -        return 0;
> > +    vq_group_end = dev->vq_index + dev->nvqs == dev->vq_group_index_end;
> > +    if (vq_group_end && started) {
> > +        memory_listener_register(&v->listener, &address_space_memory);
> >      }
> >
> > -    if (started) {
> > -        memory_listener_register(&v->listener, &address_space_memory);
> > -        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > -    } else {
> > -        vhost_vdpa_reset_device(dev);
> > -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> > -                                   VIRTIO_CONFIG_S_DRIVER);
> > -        memory_listener_unregister(&v->listener);
> > +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > +        if (started) {
> > +            r = vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > +        } else {
> > +            vhost_vdpa_reset_device(dev);
> > +            vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> > +                                       VIRTIO_CONFIG_S_DRIVER);
> > +        }
> > +    }
> >
> > -        return 0;
> > +    if (vq_group_end && !started) {
> > +        memory_listener_unregister(&v->listener);
> >      }
> > +
> > +    return r;
> >  }
> >
> >  static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index 15c3e4f703..a6f803ea4e 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -473,8 +473,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >
> >      if (has_cvq) {
> >          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> > -                                 vdpa_device_fd, i, 1, false, opts->x_svq,
> > -                                 iova_tree);
> > +                                 vdpa_device_fd, i, 1,
> > +                                 false, opts->x_svq, iova_tree);
> >          if (!nc)
> >              goto err;
> >      }
> > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > index e6fdc03514..2858deac60 100644
> > --- a/hw/virtio/trace-events
> > +++ b/hw/virtio/trace-events
> > @@ -23,10 +23,10 @@ vhost_user_postcopy_waker_found(uint64_t client_addr) "0x%"PRIx64
> >  vhost_user_postcopy_waker_nomatch(const char *rb, uint64_t rb_offset) "%s + 0x%"PRIx64
> >
> >  # vhost-vdpa.c
> > -vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
> > -vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> > -vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> > -vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> > +vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
> > +vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> > +vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint32_t asid, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" type: %"PRIu8
> > +vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint32_t asid, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" type: %"PRIu8
> >  vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
> >  vhost_vdpa_listener_region_del(void *vdpa, uint64_t iova, uint64_t llend) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64
> >  vhost_vdpa_add_status(void *dev, uint8_t status) "dev: %p status: 0x%"PRIx8
> > @@ -44,6 +44,7 @@ vhost_vdpa_dump_config(void *dev, const char *line) "dev: %p %s"
> >  vhost_vdpa_set_config(void *dev, uint32_t offset, uint32_t size, uint32_t flags) "dev: %p offset: %"PRIu32" size: %"PRIu32" flags: 0x%"PRIx32
> >  vhost_vdpa_get_config(void *dev, void *config, uint32_t config_len) "dev: %p config: %p config_len: %"PRIu32
> >  vhost_vdpa_get_vring_group(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
> > +vhost_vdpa_set_vq_group_address_space_id(void *dev, unsigned int index, unsigned int num) "dev: %p index: %u num: %u"
> >  vhost_vdpa_dev_start(void *dev, bool started) "dev: %p started: %d"
> >  vhost_vdpa_set_log_base(void *dev, uint64_t base, unsigned long long size, int refcnt, int fd, void *log) "dev: %p base: 0x%"PRIx64" size: %llu refcnt: %d fd: %d log: %p"
> >  vhost_vdpa_set_vring_addr(void *dev, unsigned int index, unsigned int flags, uint64_t desc_user_addr, uint64_t used_user_addr, uint64_t avail_user_addr, uint64_t log_guest_addr) "dev: %p index: %u flags: 0x%x desc_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" log_guest_addr: 0x%"PRIx64
> > --
> > 2.27.0
> >
>




* Re: [RFC PATCH v7 25/25] vdpa: Add x-cvq-svq
  2022-04-14  9:09   ` Jason Wang
@ 2022-04-18 14:16     ` Eugenio Perez Martin
  0 siblings, 0 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2022-04-18 14:16 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-devel, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

On Thu, Apr 14, 2022 at 11:10 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Apr 14, 2022 at 12:33 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > This isolates shadow cvq in its own group.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  qapi/net.json    |  8 +++-
> >  net/vhost-vdpa.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 100 insertions(+), 6 deletions(-)
> >
> > diff --git a/qapi/net.json b/qapi/net.json
> > index 92848e4362..39c245e6cd 100644
> > --- a/qapi/net.json
> > +++ b/qapi/net.json
> > @@ -447,9 +447,12 @@
> >  #
> >  # @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
> >  #         (default: false)
> > +# @x-cvq-svq: Start device with (experimental) shadow virtqueue in its own
> > +#             virtqueue group. (Since 7.1)
> > +#             (default: false)
> >  #
> >  # Features:
> > -# @unstable: Member @x-svq is experimental.
> > +# @unstable: Members @x-svq and x-cvq-svq are experimental.
> >  #
> >  # Since: 5.1
> >  ##
> > @@ -457,7 +460,8 @@
> >    'data': {
> >      '*vhostdev':     'str',
> >      '*queues':       'int',
> > -    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] } } }
> > +    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] },
> > +    '*x-cvq-svq':    {'type': 'bool', 'features' : [ 'unstable'] } } }
> >
> >  ##
> >  # @NetClientDriver:
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index a6f803ea4e..851dacb902 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -377,6 +377,17 @@ static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
> >      return ret;
> >  }
> >
> > +static int vhost_vdpa_get_backend_features(int fd, uint64_t *features,
> > +                                           Error **errp)
> > +{
> > +    int ret = ioctl(fd, VHOST_GET_BACKEND_FEATURES, features);
> > +    if (ret) {
> > +        error_setg_errno(errp, errno,
> > +            "Fail to query backend features from vhost-vDPA device");
> > +    }
> > +    return ret;
> > +}
> > +
> >  static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
> >                                            int *has_cvq, Error **errp)
> >  {
> > @@ -410,16 +421,56 @@ static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
> >      return 1;
> >  }
> >
> > +/**
> > + * Check vdpa device to support CVQ group asid 1
> > + *
> > + * @vdpa_device_fd: Vdpa device fd
> > + * @queue_pairs: Queue pairs
> > + * @errp: Error
> > + */
> > +static int vhost_vdpa_check_cvq_svq(int vdpa_device_fd, int queue_pairs,
> > +                                    Error **errp)
> > +{
> > +    uint64_t backend_features;
> > +    unsigned num_as;
> > +    int r;
> > +
> > +    r = vhost_vdpa_get_backend_features(vdpa_device_fd, &backend_features,
> > +                                        errp);
> > +    if (unlikely(r)) {
> > +        return -1;
> > +    }
> > +
> > +    if (unlikely(!(backend_features & VHOST_BACKEND_F_IOTLB_ASID))) {
> > +        error_setg(errp, "Device without IOTLB_ASID feature");
> > +        return -1;
> > +    }
> > +
> > +    r = ioctl(vdpa_device_fd, VHOST_VDPA_GET_AS_NUM, &num_as);
> > +    if (unlikely(r)) {
> > +        error_setg_errno(errp, errno,
> > +                         "Cannot retrieve number of supported ASs");
> > +        return -1;
> > +    }
> > +    if (unlikely(num_as < 2)) {
> > +        error_setg(errp, "Insufficient number of ASs (%u, min: 2)", num_as);
> > +    }
> > +
>
> This is not sufficient; we still need to check that CVQ doesn't share
> a group with other virtqueues.
>

That check is done at vhost-vdpa.c:vhost_dev_is_independent_group. It
happens there because we don't know the CVQ index at this moment: since
the guest has not acked features yet, we don't know whether CVQ is at
index 2 or is the last one the device offers.
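
For illustration (a sketch assuming virtio-net's standard virtqueue
layout; virtio_vdev_has_feature() and n->max_queue_pairs are the usual
QEMU names):

    /* Only computable after the guest acks its features: */
    unsigned cvq_index = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ)
                         ? n->max_queue_pairs * 2  /* last vq offered */
                         : 2;                      /* single queue pair */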

> Thanks
>
> > +    return 0;
> > +}
> > +
> >  int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >                          NetClientState *peer, Error **errp)
> >  {
> >      const NetdevVhostVDPAOptions *opts;
> > +    struct vhost_vdpa_iova_range iova_range;
> >      uint64_t features;
> >      int vdpa_device_fd;
> >      g_autofree NetClientState **ncs = NULL;
> >      NetClientState *nc;
> >      int queue_pairs, r, i, has_cvq = 0;
> >      g_autoptr(VhostIOVATree) iova_tree = NULL;
> > +    ERRP_GUARD();
> >
> >      assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >      opts = &netdev->u.vhost_vdpa;
> > @@ -444,8 +495,9 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >          qemu_close(vdpa_device_fd);
> >          return queue_pairs;
> >      }
> > -    if (opts->x_svq) {
> > -        struct vhost_vdpa_iova_range iova_range;
> > +    if (opts->x_cvq_svq || opts->x_svq) {
> > +        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> > +
> >          uint64_t invalid_dev_features =
> >              features & ~vdpa_svq_device_features &
> >              /* Transport are all accepted at this point */
> > @@ -457,7 +509,21 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >                         invalid_dev_features);
> >              goto err_svq;
> >          }
> > -        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
> > +    }
> > +
> > +    if (opts->x_cvq_svq) {
> > +        if (!has_cvq) {
> > +            error_setg(errp, "Cannot use x-cvq-svq with a device without cvq");
> > +            goto err_svq;
> > +        }
> > +
> > +        r = vhost_vdpa_check_cvq_svq(vdpa_device_fd, queue_pairs, errp);
> > +        if (unlikely(r)) {
> > +            error_prepend(errp, "Cannot configure CVQ SVQ: ");
> > +            goto err_svq;
> > +        }
> > +    }
> > +    if (opts->x_svq) {
> >          iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
> >      }
> >
> > @@ -472,11 +538,35 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> >      }
> >
> >      if (has_cvq) {
> > +        g_autoptr(VhostIOVATree) cvq_iova_tree = NULL;
> > +
> > +        if (opts->x_cvq_svq) {
> > +            cvq_iova_tree = vhost_iova_tree_new(iova_range.first,
> > +                                                iova_range.last);
> > +        } else if (opts->x_svq) {
> > +            cvq_iova_tree = vhost_iova_tree_acquire(iova_tree);
> > +        }
> > +
> >          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >                                   vdpa_device_fd, i, 1,
> > -                                 false, opts->x_svq, iova_tree);
> > +                                 false, opts->x_cvq_svq || opts->x_svq,
> > +                                 cvq_iova_tree);
> >          if (!nc)
> >              goto err;
> > +
> > +        if (opts->x_cvq_svq) {
> > +            struct vhost_vring_state asid = {
> > +                .index = 1,
> > +                .num = 1,
> > +            };
> > +
> > +            r = ioctl(vdpa_device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
> > +            if (unlikely(r)) {
> > +                error_setg_errno(errp, errno,
> > +                                 "Cannot set cvq group independent asid");
> > +                goto err;
> > +            }
> > +        }
> >      }
> >
> >      return 0;
> > --
> > 2.27.0
> >
>




* Re: [RFC PATCH v7 06/25] vdpa: Send all updates in memory listener commit
  2022-04-14  4:11   ` Jason Wang
@ 2022-04-22  9:17     ` Eugenio Perez Martin
  0 siblings, 0 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2022-04-22  9:17 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-level, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

On Thu, Apr 14, 2022 at 6:12 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/4/14 00:31, Eugenio Pérez wrote:
> > With the introduction of multiple ASIDs, it can happen that many
> > changes on different listeners arrive before the commit call.
>
>
> I think we have at most one listener even for the case of MQ/CVQ?
>

In this series we will have one listener per ASID in use. Doing it
differently requires either not sending all of the guest address space
to CVQ (which I find acceptable, although it may be slower and a little
trickier) or making the last vhost_vdpa device model aware of all the
other ASIDs.

>
> >   Since kernel vhost-vdpa
> > still does not support it, send it all in one shot.
> >
> > This also has one extra advantage: if there are no updates to notify,
> > we save the iotlb_{begin,end} calls.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   include/hw/virtio/vhost-vdpa.h |  2 +-
> >   hw/virtio/vhost-vdpa.c         | 69 +++++++++++++++++-----------------
> >   2 files changed, 36 insertions(+), 35 deletions(-)
> >
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > index a29dbb3f53..4961acea8b 100644
> > --- a/include/hw/virtio/vhost-vdpa.h
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -27,7 +27,7 @@ typedef struct vhost_vdpa {
> >       int device_fd;
> >       int index;
> >       uint32_t msg_type;
> > -    bool iotlb_batch_begin_sent;
> > +    GArray *iotlb_updates;
> >       MemoryListener listener;
> >       struct vhost_vdpa_iova_range iova_range;
> >       uint64_t acked_features;
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 1f229ff4cb..27ee678dc9 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -85,6 +85,11 @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> >       msg.iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW;
> >       msg.iotlb.type = VHOST_IOTLB_UPDATE;
> >
> > +    if (v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_BATCH)) {
> > +        g_array_append_val(v->iotlb_updates, msg);
> > +        return 0;
> > +    }
>
>
> I think it's better to use a consistent way for !batch and batch (e.g.
> we can do this even for backends that don't support batching?)
>

Yes: with no batching support, each update is simply sent right here
instead of at commit time.

> Otherwise the code is hard to maintain.
>

I think it's hard to go back to the previous model as long as we need
two listeners. I can try to remove the need for the asid 1 listener,
but if that's not possible, the alternative is to always delay the maps
to the memory listener commit callback.
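
If we end up always delaying, a minimal sketch of the unified path
(reusing the message setup from this patch):

static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
                              void *vaddr, bool readonly)
{
    struct vhost_msg_v2 msg = {};

    msg.type = v->msg_type;
    msg.iotlb.iova = iova;
    msg.iotlb.size = size;
    msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr;
    msg.iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW;
    msg.iotlb.type = VHOST_IOTLB_UPDATE;

    /* Queue unconditionally; the commit callback decides whether to
     * wrap the flush in BATCH_BEGIN/BATCH_END based on backend_cap. */
    g_array_append_val(v->iotlb_updates, msg);
    return 0;
}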

>
> > +
> >      trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.iotlb.iova, msg.iotlb.size,
> >                               msg.iotlb.uaddr, msg.iotlb.perm, msg.iotlb.type);
> >
> > @@ -109,6 +114,11 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
> >       msg.iotlb.size = size;
> >       msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
> >
> > +    if (v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_BATCH)) {
> > +        g_array_append_val(v->iotlb_updates, msg);
> > +        return 0;
> > +    }
> > +
> >       trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.iotlb.iova,
> >                                  msg.iotlb.size, msg.iotlb.type);
> >
> > @@ -121,56 +131,47 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
> >       return ret;
> >   }
> >
> > -static void vhost_vdpa_listener_begin_batch(struct vhost_vdpa *v)
> > -{
> > -    int fd = v->device_fd;
> > -    struct vhost_msg_v2 msg = {
> > -        .type = v->msg_type,
> > -        .iotlb.type = VHOST_IOTLB_BATCH_BEGIN,
> > -    };
> > -
> > -    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.iotlb.type);
> > -    if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> > -        error_report("failed to write, fd=%d, errno=%d (%s)",
> > -                     fd, errno, strerror(errno));
> > -    }
> > -}
> > -
> > -static void vhost_vdpa_iotlb_batch_begin_once(struct vhost_vdpa *v)
> > -{
> > -    if (v->dev->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH) &&
> > -        !v->iotlb_batch_begin_sent) {
> > -        vhost_vdpa_listener_begin_batch(v);
> > -    }
> > -
> > -    v->iotlb_batch_begin_sent = true;
> > -}
> > -
> >   static void vhost_vdpa_listener_commit(MemoryListener *listener)
> >   {
> >       struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > -    struct vhost_dev *dev = v->dev;
> >       struct vhost_msg_v2 msg = {};
> >       int fd = v->device_fd;
> > +    size_t num = v->iotlb_updates->len;
> >
> > -    if (!(dev->backend_cap & (0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH))) {
> > +    if (!num) {
> >           return;
> >       }
> >
> > -    if (!v->iotlb_batch_begin_sent) {
> > -        return;
> > +    msg.type = v->msg_type;
> > +    msg.iotlb.type = VHOST_IOTLB_BATCH_BEGIN;
> > +    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.iotlb.type);
> > +    if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
>
>
> We need to check whether the vhost-vDPA backend supports batching first?
>

If batching is not supported, nothing is ever appended to
iotlb_updates, so num == 0 and we return before writing BEGIN_BATCH.

>
> > +        error_report("failed to write BEGIN_BATCH, fd=%d, errno=%d (%s)",
> > +                     fd, errno, strerror(errno));
> > +        goto done;
> >       }
> >
> > -    msg.type = v->msg_type;
> > -    msg.iotlb.type = VHOST_IOTLB_BATCH_END;
> > +    for (size_t i = 0; i < num; ++i) {
> > +        struct vhost_msg_v2 *update = &g_array_index(v->iotlb_updates,
> > +                                                     struct vhost_msg_v2, i);
> > +        if (write(fd, update, sizeof(*update)) != sizeof(*update)) {
> > +            error_report("failed to write dma update, fd=%d, errno=%d (%s)",
> > +                         fd, errno, strerror(errno));
> > +            goto done;
>
>
> Maybe it's time to introduce v3 to allow a batch of messages to be
> passed to vhost-vDPA in a single system call.
>

It would be nice, but then we would not be solving the problem for
pre-v3 kernels.

Thanks!

> Thanks
>
>
> > +        }
> > +    }
> >
> > +    msg.iotlb.type = VHOST_IOTLB_BATCH_END;
> >       trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
> >       if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> >           error_report("failed to write, fd=%d, errno=%d (%s)",
> >                        fd, errno, strerror(errno));
> >       }
> >
> > -    v->iotlb_batch_begin_sent = false;
> > +done:
> > +    g_array_set_size(v->iotlb_updates, 0);
> > +    return;
> > +
> >   }
> >
> >   static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> > @@ -227,7 +228,6 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> >           iova = mem_region.iova;
> >       }
> >
> > -    vhost_vdpa_iotlb_batch_begin_once(v);
> >       ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
> >                                vaddr, section->readonly);
> >       if (ret) {
> > @@ -292,7 +292,6 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
> >           iova = result->iova;
> >           vhost_iova_tree_remove(v->iova_tree, &mem_region);
> >       }
> > -    vhost_vdpa_iotlb_batch_begin_once(v);
> >       ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
> >       if (ret) {
> >           error_report("vhost_vdpa dma unmap error!");
> > @@ -446,6 +445,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> >       dev->opaque =  opaque ;
> >       v->listener = vhost_vdpa_memory_listener;
> >       v->msg_type = VHOST_IOTLB_MSG_V2;
> > +    v->iotlb_updates = g_array_new(false, false, sizeof(struct vhost_msg_v2));
> >       ret = vhost_vdpa_init_svq(dev, v, errp);
> >       if (ret) {
> >           goto err;
> > @@ -579,6 +579,7 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
> >       trace_vhost_vdpa_cleanup(dev, v);
> >       vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> >       memory_listener_unregister(&v->listener);
> > +    g_array_free(v->iotlb_updates, true);
> >       vhost_vdpa_svq_cleanup(dev);
> >
> >       dev->opaque = NULL;
>




* Re: [RFC PATCH v7 01/25] vhost: Track descriptor chain in private at SVQ
  2022-04-14  3:48   ` Jason Wang
@ 2022-04-22 14:16     ` Eugenio Perez Martin
  0 siblings, 0 replies; 50+ messages in thread
From: Eugenio Perez Martin @ 2022-04-22 14:16 UTC (permalink / raw)
  To: Jason Wang
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	qemu-level, Gautam Dawar, Harpreet Singh Anand, Gonglei (Arei),
	Eli Cohen, Liuxiangdong, Zhu Lingshan

On Thu, Apr 14, 2022 at 5:48 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2022/4/14 00:31, Eugenio Pérez wrote:
> > Only the first one of them was properly enqueued back.
>
>
> I wonder if it's better to use two patches:
>
> 1) using private chain
>
> 2) fix the chain issue
>
> Patch looks good itself.
>
> Thanks
>

Sure, it can be done that way for the next version.

Thanks!

>
> >
> > While we're at it, harden SVQ: the device could have access to modify
> > them, and it definitely has access once we implement packed vq. Harden
> > SVQ by maintaining a private copy of the descriptor chain. Other
> > fields, like buffer addresses, are already maintained separately.
> >
> > Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding")
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.h |  6 ++++++
> >   hw/virtio/vhost-shadow-virtqueue.c | 27 +++++++++++++++++++++------
> >   2 files changed, 27 insertions(+), 6 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index e5e24c536d..c132c994e9 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -53,6 +53,12 @@ typedef struct VhostShadowVirtqueue {
> >       /* Next VirtQueue element that guest made available */
> >       VirtQueueElement *next_guest_avail_elem;
> >
> > +    /*
> > +     * Backup next field for each descriptor so we can recover securely, not
> > +     * needing to trust the device access.
> > +     */
> > +    uint16_t *desc_next;
> > +
> >       /* Next head to expose to the device */
> >       uint16_t shadow_avail_idx;
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index b232803d1b..a2531d5874 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -138,6 +138,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> >       for (n = 0; n < num; n++) {
> >           if (more_descs || (n + 1 < num)) {
> >               descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> > +            descs[i].next = cpu_to_le16(svq->desc_next[i]);
> >           } else {
> >               descs[i].flags = flags;
> >           }
> > @@ -145,10 +146,10 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
> >           descs[i].len = cpu_to_le32(iovec[n].iov_len);
> >
> >           last = i;
> > -        i = cpu_to_le16(descs[i].next);
> > +        i = cpu_to_le16(svq->desc_next[i]);
> >       }
> >
> > -    svq->free_head = le16_to_cpu(descs[last].next);
> > +    svq->free_head = le16_to_cpu(svq->desc_next[last]);
> >   }
> >
> >   static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > @@ -333,13 +334,22 @@ static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
> >       svq->vring.avail->flags |= cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
> >   }
> >
> > +static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
> > +                                             uint16_t num, uint16_t i)
> > +{
> > +    for (uint16_t j = 0; j < num; ++j) {
> > +        i = le16_to_cpu(svq->desc_next[i]);
> > +    }
> > +
> > +    return i;
> > +}
> > +
> >   static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
> >                                              uint32_t *len)
> >   {
> > -    vring_desc_t *descs = svq->vring.desc;
> >       const vring_used_t *used = svq->vring.used;
> >       vring_used_elem_t used_elem;
> > -    uint16_t last_used;
> > +    uint16_t last_used, last_used_chain, num;
> >
> >       if (!vhost_svq_more_used(svq)) {
> >           return NULL;
> > @@ -365,7 +375,10 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
> >           return NULL;
> >       }
> >
> > -    descs[used_elem.id].next = svq->free_head;
> > +    num = svq->ring_id_maps[used_elem.id]->in_num +
> > +          svq->ring_id_maps[used_elem.id]->out_num;
> > +    last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
> > +    svq->desc_next[last_used_chain] = svq->free_head;
> >       svq->free_head = used_elem.id;
> >
> >       *len = used_elem.len;
> > @@ -540,8 +553,9 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> >       svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> >       memset(svq->vring.used, 0, device_size);
> >       svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num);
> > +    svq->desc_next = g_new0(uint16_t, svq->vring.num);
> >       for (unsigned i = 0; i < svq->vring.num - 1; i++) {
> > -        svq->vring.desc[i].next = cpu_to_le16(i + 1);
> > +        svq->desc_next[i] = cpu_to_le16(i + 1);
> >       }
> >   }
> >
> > @@ -574,6 +588,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >           virtqueue_detach_element(svq->vq, next_avail_elem, 0);
> >       }
> >       svq->vq = NULL;
> > +    g_free(svq->desc_next);
> >       g_free(svq->ring_id_maps);
> >       qemu_vfree(svq->vring.desc);
> >       qemu_vfree(svq->vring.used);
>
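
To make the hardening concrete, here is a condensed sketch of how the
used-buffer path recycles a chain through the private shadow. It is not the
literal patch: the helper name vhost_svq_recycle_chain is invented here, and
the bookkeeping around le16 conversions is simplified.

/*
 * Walk the trusted desc_next[] shadow to the last descriptor of a used
 * chain and prepend the whole chain to the free list. The device-writable
 * svq->vring.desc[].next fields are never read back, so a buggy or
 * malicious device cannot corrupt the free list. Assumes the chain has at
 * least one descriptor.
 */
static void vhost_svq_recycle_chain(VhostShadowVirtqueue *svq,
                                    uint16_t head, uint16_t num_descs)
{
    uint16_t last = head;

    /* Follow the shadow copy of the chain to its last descriptor */
    for (uint16_t j = 1; j < num_descs; j++) {
        last = le16_to_cpu(svq->desc_next[last]);
    }

    /* Link the chain back in front of the free list, again only in the shadow */
    svq->desc_next[last] = cpu_to_le16(svq->free_head);
    svq->free_head = head;
}

The key property is that descs[].next is write-only from SVQ's point of view
after setup: the device can scribble on it, but the free list lives entirely
in desc_next[].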





Thread overview: 50+ messages
2022-04-13 16:31 [RFC PATCH v7 00/25] Net Control VQ support with asid in vDPA SVQ Eugenio Pérez
2022-04-13 16:31 ` [RFC PATCH v7 01/25] vhost: Track descriptor chain in private at SVQ Eugenio Pérez
2022-04-14  3:48   ` Jason Wang
2022-04-22 14:16     ` Eugenio Perez Martin
2022-04-13 16:31 ` [RFC PATCH v7 02/25] vdpa: Add missing tracing to batch mapping functions Eugenio Pérez
2022-04-14  3:49   ` Jason Wang
2022-04-13 16:31 ` [RFC PATCH v7 03/25] vdpa: Fix bad index calculus at vhost_vdpa_get_vring_base Eugenio Pérez
2022-04-14  3:50   ` Jason Wang
2022-04-13 16:31 ` [RFC PATCH v7 04/25] util: Return void on iova_tree_remove Eugenio Pérez
2022-04-14  3:50   ` Jason Wang
2022-04-13 16:31 ` [RFC PATCH v7 05/25] hw/virtio: Replace g_memdup() by g_memdup2() Eugenio Pérez
2022-04-14  3:51   ` Jason Wang
2022-04-14  4:01   ` Jason Wang
2022-04-13 16:31 ` [RFC PATCH v7 06/25] vdpa: Send all updates in memory listener commit Eugenio Pérez
2022-04-14  4:11   ` Jason Wang
2022-04-22  9:17     ` Eugenio Perez Martin
2022-04-13 16:31 ` [RFC PATCH v7 07/25] vhost: Add reference counting to vhost_iova_tree Eugenio Pérez
2022-04-14  5:30   ` Jason Wang
2022-04-13 16:31 ` [RFC PATCH v7 08/25] vdpa: Add x-svq to NetdevVhostVDPAOptions Eugenio Pérez
2022-04-14  5:32   ` Jason Wang
2022-04-18 10:36     ` Eugenio Perez Martin
2022-04-13 16:31 ` [RFC PATCH v7 09/25] vhost: move descriptor translation to vhost_svq_vring_write_descs Eugenio Pérez
2022-04-14  5:48   ` Jason Wang
2022-04-13 16:31 ` [RFC PATCH v7 10/25] vdpa: Fix index calculus at vhost_vdpa_svqs_start Eugenio Pérez
2022-04-14  5:59   ` Jason Wang
2022-04-13 16:31 ` [RFC PATCH v7 11/25] virtio-net: Expose ctrl virtqueue logic Eugenio Pérez
2022-04-13 16:31 ` [RFC PATCH v7 12/25] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs Eugenio Pérez
2022-04-13 16:31 ` [RFC PATCH v7 13/25] virtio: Make virtqueue_alloc_element non-static Eugenio Pérez
2022-04-13 16:31 ` [RFC PATCH v7 14/25] vhost: Add SVQElement Eugenio Pérez
2022-04-13 16:31 ` [RFC PATCH v7 15/25] vhost: Add custom used buffer callback Eugenio Pérez
2022-04-13 16:31 ` [RFC PATCH v7 16/25] vdpa: control virtqueue support on shadow virtqueue Eugenio Pérez
2022-04-14  9:10   ` Jason Wang
2022-04-18 10:55     ` Eugenio Perez Martin
2022-04-13 16:31 ` [RFC PATCH v7 17/25] vhost: Add vhost_iova_tree_find Eugenio Pérez
2022-04-13 16:31 ` [RFC PATCH v7 18/25] vdpa: Add map/unmap operation callback to SVQ Eugenio Pérez
2022-04-14  9:13   ` Jason Wang
2022-04-13 16:32 ` [RFC PATCH v7 19/25] vhost: Add vhost_svq_inject Eugenio Pérez
2022-04-14  9:09   ` Jason Wang
2022-04-18 13:58     ` Eugenio Perez Martin
2022-04-13 16:32 ` [RFC PATCH v7 20/25] vdpa: add NetClientState->start() callback Eugenio Pérez
2022-04-14  9:14   ` Jason Wang
2022-04-13 16:32 ` [RFC PATCH v7 21/25] vdpa: Add vhost_vdpa_start_control_svq Eugenio Pérez
2022-04-13 16:32 ` [RFC PATCH v7 22/25] vhost: Update kernel headers Eugenio Pérez
2022-04-13 16:32 ` [RFC PATCH v7 23/25] vhost: Make possible to check for device exclusive vq group Eugenio Pérez
2022-04-13 16:32 ` [RFC PATCH v7 24/25] vdpa: Add asid attribute to vdpa device Eugenio Pérez
2022-04-14  9:10   ` Jason Wang
2022-04-18 14:03     ` Eugenio Perez Martin
2022-04-13 16:32 ` [RFC PATCH v7 25/25] vdpa: Add x-cvq-svq Eugenio Pérez
2022-04-14  9:09   ` Jason Wang
2022-04-18 14:16     ` Eugenio Perez Martin
