* [PATCH v15 0/4] vhost-vdpa: add support for vIOMMU
@ 2023-03-21 14:23 Cindy Lu
  2023-03-21 14:23 ` [PATCH v15 1/4] vhost: expose function vhost_dev_has_iommu() Cindy Lu
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Cindy Lu @ 2023-03-21 14:23 UTC (permalink / raw)
  To: lulu, jasowang, mst; +Cc: qemu-devel

These patches add vIOMMU support to vhost-vdpa devices.

changes in V3
1. Move function vfio_get_xlat_addr to memory.c
2. Use the existing memory listener; when the MR is an
IOMMU MR, call the functions iommu_region_add/
iommu_region_del

changes in V4
1. Make the comments in vfio_get_xlat_addr more general

changes in V5
1. Address the comments in the last version
2. Add a new arg to the function vfio_get_xlat_addr, which indicates whether
the memory is backed by a discard manager, so the device can issue its
own warning.

changes in V6
Move the error_report for the unpopulated discard back to
memory_get_xlat_addr

changes in V7
Reorganize the error message to avoid duplicate information

changes in V8
Organize the code following the comments on the last version

changes in V9
Organize the code following the comments

changes in V10
Address the comments

changes in V11
Address the comments
Fix the crash found in testing

changes in V12
Address the comments, squash patch 1 into the next patch
Fix the code style issues

changes in V13
Fail to start if IOMMU and SVQ are enabled at the same time
Fix the code style issues

changes in V14
Address the comments

changes in V15
Address the comments

Cindy Lu (4):
  vhost: expose function vhost_dev_has_iommu()
  vhost_vdpa: fix the input in trace_vhost_vdpa_listener_region_del()
  vhost-vdpa: Add check for full 64-bit in region delete
  vhost-vdpa: Add support for vIOMMU.

 hw/virtio/trace-events         |   2 +-
 hw/virtio/vhost-vdpa.c         | 182 ++++++++++++++++++++++++++++++---
 hw/virtio/vhost.c              |   2 +-
 include/hw/virtio/vhost-vdpa.h |  11 ++
 include/hw/virtio/vhost.h      |   1 +
 5 files changed, 184 insertions(+), 14 deletions(-)

-- 
2.34.3




* [PATCH v15 1/4] vhost: expose function vhost_dev_has_iommu()
  2023-03-21 14:23 [PATCH v15 0/4] vhost-vdpa: add support for vIOMMU Cindy Lu
@ 2023-03-21 14:23 ` Cindy Lu
  2023-03-23  3:48   ` Jason Wang
  2023-03-21 14:23 ` [PATCH v15 2/4] vhost_vdpa: fix the input in trace_vhost_vdpa_listener_region_del() Cindy Lu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Cindy Lu @ 2023-03-21 14:23 UTC (permalink / raw)
  To: lulu, jasowang, mst; +Cc: qemu-devel

To support vIOMMU in vdpa, expose the function
vhost_dev_has_iommu(); vdpa will use this function to check
whether vIOMMU is enabled.

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 hw/virtio/vhost.c         | 2 +-
 include/hw/virtio/vhost.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index a266396576..fd746b085b 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -107,7 +107,7 @@ static void vhost_dev_sync_region(struct vhost_dev *dev,
     }
 }
 
-static bool vhost_dev_has_iommu(struct vhost_dev *dev)
+bool vhost_dev_has_iommu(struct vhost_dev *dev)
 {
     VirtIODevice *vdev = dev->vdev;
 
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a52f273347..f7f10c8fb7 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -336,4 +336,5 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
                            struct vhost_inflight *inflight);
 int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
                            struct vhost_inflight *inflight);
+bool vhost_dev_has_iommu(struct vhost_dev *dev);
 #endif
-- 
2.34.3




* [PATCH v15 2/4] vhost_vdpa: fix the input in trace_vhost_vdpa_listener_region_del()
  2023-03-21 14:23 [PATCH v15 0/4] vhost-vdpa: add support for vIOMMU Cindy Lu
  2023-03-21 14:23 ` [PATCH v15 1/4] vhost: expose function vhost_dev_has_iommu() Cindy Lu
@ 2023-03-21 14:23 ` Cindy Lu
  2023-03-21 14:23 ` [PATCH v15 3/4] vhost-vdpa: Add check for full 64-bit in region delete Cindy Lu
  2023-03-21 14:23 ` [PATCH v15 4/4] vhost-vdpa: Add support for vIOMMU Cindy Lu
  3 siblings, 0 replies; 11+ messages in thread
From: Cindy Lu @ 2023-03-21 14:23 UTC (permalink / raw)
  To: lulu, jasowang, mst; +Cc: qemu-devel

In trace_vhost_vdpa_listener_region_del(), llend is one past the end
of the section, so the traced value should be the last address:
int128_get64(int128_sub(llend, int128_one()))

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index bc6bad23d5..92c2413c76 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -288,7 +288,8 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
     iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
     llend = vhost_vdpa_section_end(section);
 
-    trace_vhost_vdpa_listener_region_del(v, iova, int128_get64(llend));
+    trace_vhost_vdpa_listener_region_del(v, iova,
+        int128_get64(int128_sub(llend, int128_one())));
 
     if (int128_ge(int128_make64(iova), llend)) {
         return;
-- 
2.34.3
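
Editorial note on patch 2/4: the sketch below illustrates why the trace
argument is converted from a one-past-the-end value to an inclusive last
address before being narrowed to 64 bits. It is not QEMU code. The type
u128 (a GCC/Clang extension) stands in for QEMU's Int128, and the helper
names section_end() and section_last() are hypothetical.

```c
#include <stdint.h>

/* GCC/Clang extension; stands in for QEMU's Int128 in this sketch. */
typedef unsigned __int128 u128;

/*
 * One past the last byte of a section. For a section that reaches the top
 * of the address space this equals 2^64, which does not fit in uint64_t.
 */
static u128 section_end(uint64_t offset, u128 size)
{
    return (u128)offset + size;
}

/*
 * Inclusive last byte of the section. For size >= 1 and end <= 2^64 this
 * always fits in 64 bits, so the narrowing cast loses nothing.
 */
static uint64_t section_last(uint64_t offset, u128 size)
{
    return (uint64_t)(section_end(offset, size) - 1);
}
```

With a section covering the whole 64-bit space, section_end() yields 2^64,
whose low 64 bits are zero, while section_last() yields UINT64_MAX. To my
understanding int128_get64() asserts that its argument fits in 64 bits, so
subtracting one first also avoids tripping that assertion.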




* [PATCH v15 3/4] vhost-vdpa: Add check for full 64-bit in region delete
  2023-03-21 14:23 [PATCH v15 0/4] vhost-vdpa: add support for vIOMMU Cindy Lu
  2023-03-21 14:23 ` [PATCH v15 1/4] vhost: expose function vhost_dev_has_iommu() Cindy Lu
  2023-03-21 14:23 ` [PATCH v15 2/4] vhost_vdpa: fix the input in trace_vhost_vdpa_listener_region_del() Cindy Lu
@ 2023-03-21 14:23 ` Cindy Lu
  2023-03-21 14:23 ` [PATCH v15 4/4] vhost-vdpa: Add support for vIOMMU Cindy Lu
  3 siblings, 0 replies; 11+ messages in thread
From: Cindy Lu @ 2023-03-21 14:23 UTC (permalink / raw)
  To: lulu, jasowang, mst; +Cc: qemu-devel

The unmap ioctl doesn't accept a full 64-bit span, so add a check
for the section's size in vhost_vdpa_listener_region_del() and
unmap a full 64-bit span as two halves.

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 92c2413c76..0c8c37e786 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -316,10 +316,28 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
         vhost_iova_tree_remove(v->iova_tree, *result);
     }
     vhost_vdpa_iotlb_batch_begin_once(v);
+    /*
+     * The unmap ioctl doesn't accept a full 64-bit span, so split it in two
+     */
+    if (int128_eq(llsize, int128_2_64())) {
+        llsize = int128_rshift(llsize, 1);
+        ret = vhost_vdpa_dma_unmap(v, VHOST_VDPA_GUEST_PA_ASID, iova,
+                                   int128_get64(llsize));
+
+        if (ret) {
+            error_report("vhost_vdpa_dma_unmap(%p, 0x%" HWADDR_PRIx ", "
+                         "0x%" HWADDR_PRIx ") = %d (%m)",
+                         v, iova, int128_get64(llsize), ret);
+        }
+        iova += int128_get64(llsize);
+    }
     ret = vhost_vdpa_dma_unmap(v, VHOST_VDPA_GUEST_PA_ASID, iova,
                                int128_get64(llsize));
+
     if (ret) {
-        error_report("vhost_vdpa dma unmap error!");
+        error_report("vhost_vdpa_dma_unmap(%p, 0x%" HWADDR_PRIx ", "
+                     "0x%" HWADDR_PRIx ") = %d (%m)",
+                     v, iova, int128_get64(llsize), ret);
     }
 
     memory_region_unref(section->mr);
-- 
2.34.3
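
Editorial note on patch 3/4: a minimal sketch of the splitting logic, under
the assumption that sizes never exceed 2^64. It only plans the requests and
does not call any ioctl. The names u128, struct unmap_req and plan_unmap()
are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension; stands in for QEMU's Int128 in this sketch. */
typedef unsigned __int128 u128;

/* One planned unmap request: a start address and a 64-bit length. */
struct unmap_req {
    uint64_t iova;
    uint64_t size;
};

/*
 * Fill 'out' with the requests needed to cover [iova, iova + size) and
 * return how many were written (1 or 2). A size of exactly 2^64 cannot be
 * stored in a 64-bit field, so it is halved and issued as two requests.
 */
static size_t plan_unmap(uint64_t iova, u128 size, struct unmap_req out[2])
{
    const u128 full_span = (u128)1 << 64;
    size_t n = 0;

    if (size == full_span) {
        size >>= 1;                          /* 2^63 fits in uint64_t */
        out[n++] = (struct unmap_req){ iova, (uint64_t)size };
        iova += (uint64_t)size;              /* second half starts here */
    }
    out[n++] = (struct unmap_req){ iova, (uint64_t)size };
    return n;
}
```

For a full span starting at 0 this plans two requests of 2^63 bytes each,
at 0 and at 2^63. Any smaller size yields a single request, matching the
structure of the hunk above where the second vhost_vdpa_dma_unmap() call
is shared by both cases.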




* [PATCH v15 4/4] vhost-vdpa: Add support for vIOMMU.
  2023-03-21 14:23 [PATCH v15 0/4] vhost-vdpa: add support for vIOMMU Cindy Lu
                   ` (2 preceding siblings ...)
  2023-03-21 14:23 ` [PATCH v15 3/4] vhost-vdpa: Add check for full 64-bit in region delete Cindy Lu
@ 2023-03-21 14:23 ` Cindy Lu
  2023-03-23  3:47   ` Jason Wang
  3 siblings, 1 reply; 11+ messages in thread
From: Cindy Lu @ 2023-03-21 14:23 UTC (permalink / raw)
  To: lulu, jasowang, mst; +Cc: qemu-devel

1. vIOMMU support allows vDPA to work in IOMMU mode. This
fixes the security issues of using no-IOMMU mode.
To support this feature, add new functions that handle adding and
deleting IOMMU MRs.

Also, since SVQ does not support vIOMMU yet, add a check for IOMMU
in vhost_vdpa_dev_start(); if SVQ and IOMMU are enabled at the same time,
the function fails.

2. Skip the iova_max check in vhost_vdpa_listener_skipped_section() when
the MR is an IOMMU MR, and move this check to vhost_vdpa_iommu_map_notify().

Verified with the vp_vdpa and vdpa_sim_net drivers

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 hw/virtio/trace-events         |   2 +-
 hw/virtio/vhost-vdpa.c         | 159 ++++++++++++++++++++++++++++++---
 include/hw/virtio/vhost-vdpa.h |  11 +++
 3 files changed, 161 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 8f8d05cf9b..de4da2c65c 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -33,7 +33,7 @@ vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
 vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
 vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
 vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
-vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
+vhost_vdpa_iotlb_batch_end_once(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
 vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
 vhost_vdpa_listener_region_del(void *vdpa, uint64_t iova, uint64_t llend) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64
 vhost_vdpa_add_status(void *dev, uint8_t status) "dev: %p status: 0x%"PRIx8
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 0c8c37e786..39720d12a6 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -26,6 +26,7 @@
 #include "cpu.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "hw/virtio/virtio-access.h"
 
 /*
  * Return one past the end of the end of section. Be careful with uint64_t
@@ -60,13 +61,21 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
                      iova_min, section->offset_within_address_space);
         return true;
     }
+    /*
+     * While using vIOMMU, sometimes the section will be larger than iova_max,
+     * but the memory that actually maps is smaller, so move the check to
+     * function vhost_vdpa_iommu_map_notify(). That function will use the actual
+     * size that maps to the kernel
+     */
 
-    llend = vhost_vdpa_section_end(section);
-    if (int128_gt(llend, int128_make64(iova_max))) {
-        error_report("RAM section out of device range (max=0x%" PRIx64
-                     ", end addr=0x%" PRIx64 ")",
-                     iova_max, int128_get64(llend));
-        return true;
+    if (!memory_region_is_iommu(section->mr)) {
+        llend = vhost_vdpa_section_end(section);
+        if (int128_gt(llend, int128_make64(iova_max))) {
+            error_report("RAM section out of device range (max=0x%" PRIx64
+                         ", end addr=0x%" PRIx64 ")",
+                         iova_max, int128_get64(llend));
+            return true;
+        }
     }
 
     return false;
@@ -158,9 +167,8 @@ static void vhost_vdpa_iotlb_batch_begin_once(struct vhost_vdpa *v)
     v->iotlb_batch_begin_sent = true;
 }
 
-static void vhost_vdpa_listener_commit(MemoryListener *listener)
+static void vhost_vdpa_iotlb_batch_end_once(struct vhost_vdpa *v)
 {
-    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
     struct vhost_dev *dev = v->dev;
     struct vhost_msg_v2 msg = {};
     int fd = v->device_fd;
@@ -176,7 +184,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
     msg.type = v->msg_type;
     msg.iotlb.type = VHOST_IOTLB_BATCH_END;
 
-    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
+    trace_vhost_vdpa_iotlb_batch_end_once(v, fd, msg.type, msg.iotlb.type);
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)",
                      fd, errno, strerror(errno));
@@ -185,6 +193,124 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
     v->iotlb_batch_begin_sent = false;
 }
 
+static void vhost_vdpa_listener_commit(MemoryListener *listener)
+{
+    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
+    vhost_vdpa_iotlb_batch_end_once(v);
+}
+
+static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+{
+    struct vdpa_iommu *iommu = container_of(n, struct vdpa_iommu, n);
+
+    hwaddr iova = iotlb->iova + iommu->iommu_offset;
+    struct vhost_vdpa *v = iommu->dev;
+    void *vaddr;
+    int ret;
+    Int128 llend;
+
+    if (iotlb->target_as != &address_space_memory) {
+        error_report("Wrong target AS \"%s\", only system memory is allowed",
+                     iotlb->target_as->name ? iotlb->target_as->name : "none");
+        return;
+    }
+    RCU_READ_LOCK_GUARD();
+    /* check if RAM section out of device range */
+    llend = int128_add(int128_makes64(iotlb->addr_mask), int128_makes64(iova));
+    if (int128_gt(llend, int128_make64(v->iova_range.last))) {
+        error_report("RAM section out of device range (max=0x%" PRIx64
+                     ", end addr=0x%" PRIx64 ")",
+                     v->iova_range.last, int128_get64(llend));
+        return;
+    }
+
+    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
+        bool read_only;
+
+        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, NULL)) {
+            return;
+        }
+        vhost_vdpa_iotlb_batch_begin_once(v);
+        ret = vhost_vdpa_dma_map(v, VHOST_VDPA_GUEST_PA_ASID, iova,
+                                 iotlb->addr_mask + 1, vaddr, read_only);
+        if (ret) {
+            error_report("vhost_vdpa_dma_map(%p, 0x%" HWADDR_PRIx ", "
+                         "0x%" HWADDR_PRIx ", %p) = %d (%m)",
+                         v, iova, iotlb->addr_mask + 1, vaddr, ret);
+        }
+    } else {
+        vhost_vdpa_iotlb_batch_begin_once(v);
+        ret = vhost_vdpa_dma_unmap(v, VHOST_VDPA_GUEST_PA_ASID, iova,
+                                   iotlb->addr_mask + 1);
+        if (ret) {
+            error_report("vhost_vdpa_dma_unmap(%p, 0x%" HWADDR_PRIx ", "
+                         "0x%" HWADDR_PRIx ") = %d (%m)",
+                         v, iova, iotlb->addr_mask + 1, ret);
+        }
+    }
+    vhost_vdpa_iotlb_batch_end_once(v);
+}
+
+static void vhost_vdpa_iommu_region_add(MemoryListener *listener,
+                                        MemoryRegionSection *section)
+{
+    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
+
+    struct vdpa_iommu *iommu;
+    Int128 end;
+    int iommu_idx;
+    IOMMUMemoryRegion *iommu_mr;
+    int ret;
+
+    iommu_mr = IOMMU_MEMORY_REGION(section->mr);
+
+    iommu = g_malloc0(sizeof(*iommu));
+    end = int128_add(int128_make64(section->offset_within_region),
+                     section->size);
+    end = int128_sub(end, int128_one());
+    iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr,
+                                                   MEMTXATTRS_UNSPECIFIED);
+    iommu->iommu_mr = iommu_mr;
+    iommu_notifier_init(&iommu->n, vhost_vdpa_iommu_map_notify,
+                        IOMMU_NOTIFIER_IOTLB_EVENTS,
+                        section->offset_within_region,
+                        int128_get64(end),
+                        iommu_idx);
+    iommu->iommu_offset = section->offset_within_address_space -
+                          section->offset_within_region;
+    iommu->dev = v;
+
+    ret = memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL);
+    if (ret) {
+        g_free(iommu);
+        return;
+    }
+
+    QLIST_INSERT_HEAD(&v->iommu_list, iommu, iommu_next);
+    memory_region_iommu_replay(iommu->iommu_mr, &iommu->n);
+
+    return;
+}
+
+static void vhost_vdpa_iommu_region_del(MemoryListener *listener,
+                                        MemoryRegionSection *section)
+{
+    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
+
+    struct vdpa_iommu *iommu;
+
+    QLIST_FOREACH(iommu, &v->iommu_list, iommu_next)
+    {
+        if (MEMORY_REGION(iommu->iommu_mr) == section->mr &&
+            iommu->n.start == section->offset_within_region) {
+            memory_region_unregister_iommu_notifier(section->mr, &iommu->n);
+            QLIST_REMOVE(iommu, iommu_next);
+            g_free(iommu);
+            break;
+        }
+    }
+}
+
 static void vhost_vdpa_listener_region_add(MemoryListener *listener,
                                            MemoryRegionSection *section)
 {
@@ -199,6 +325,10 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
                                             v->iova_range.last)) {
         return;
     }
+    if (memory_region_is_iommu(section->mr)) {
+        vhost_vdpa_iommu_region_add(listener, section);
+        return;
+    }
 
     if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
                  (section->offset_within_region & ~TARGET_PAGE_MASK))) {
@@ -278,6 +408,9 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
                                             v->iova_range.last)) {
         return;
     }
+    if (memory_region_is_iommu(section->mr)) {
+        vhost_vdpa_iommu_region_del(listener, section);
+    }
 
     if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
                  (section->offset_within_region & ~TARGET_PAGE_MASK))) {
@@ -1182,7 +1315,13 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
     }
 
     if (started) {
-        memory_listener_register(&v->listener, &address_space_memory);
+        if (vhost_dev_has_iommu(dev) && (v->shadow_vqs_enabled)) {
+            error_report("SVQ can not work while IOMMU enable, please disable "
+                         "IOMMU and try again");
+            return -1;
+        }
+        memory_listener_register(&v->listener, dev->vdev->dma_as);
+
         return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
     }
 
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index c278a2a8de..e64bfc7f98 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -52,6 +52,8 @@ typedef struct vhost_vdpa {
     struct vhost_dev *dev;
     Error *migration_blocker;
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
+    QLIST_HEAD(, vdpa_iommu) iommu_list;
+    IOMMUNotifier n;
 } VhostVDPA;
 
 int vhost_vdpa_get_iova_range(int fd, struct vhost_vdpa_iova_range *iova_range);
@@ -61,4 +63,13 @@ int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
 int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
                          hwaddr size);
 
+typedef struct vdpa_iommu {
+    struct vhost_vdpa *dev;
+    IOMMUMemoryRegion *iommu_mr;
+    hwaddr iommu_offset;
+    IOMMUNotifier n;
+    QLIST_ENTRY(vdpa_iommu) iommu_next;
+} VDPAIOMMUState;
+
+
 #endif
-- 
2.34.3
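
Editorial note on patch 4/4: the sketch below restates the address
arithmetic used by the notifier, namely the offset applied to each IOTLB
entry and the per-mapping range check that replaces the per-section
iova_max check. It is not QEMU code. struct tlb_entry is a hypothetical,
reduced stand-in for IOMMUTLBEntry, the helper names are invented, and the
sum is done in unsigned 128-bit arithmetic (a GCC/Clang extension) whereas
the patch uses int128_makes64().

```c
#include <stdbool.h>
#include <stdint.h>

/* GCC/Clang extension; stands in for QEMU's Int128 in this sketch. */
typedef unsigned __int128 u128;

/* Hypothetical, reduced stand-in for the IOMMUTLBEntry fields used here. */
struct tlb_entry {
    uint64_t iova;       /* address inside the IOMMU memory region */
    uint64_t addr_mask;  /* mapping size minus one, e.g. 0xfff for 4 KiB */
};

/* Offset that turns a region-relative address into an address-space one. */
static uint64_t iommu_offset(uint64_t off_in_as, uint64_t off_in_region)
{
    return off_in_as - off_in_region;
}

/*
 * Return true if the mapping described by 'e', shifted by 'offset', ends
 * at or below 'iova_last'. The end is computed in 128 bits so that a
 * mapping near the top of the address space cannot wrap around.
 */
static bool mapping_in_range(const struct tlb_entry *e, uint64_t offset,
                             uint64_t iova_last)
{
    uint64_t iova = e->iova + offset;
    u128 last = (u128)iova + e->addr_mask;

    return last <= (u128)iova_last;
}
```

Checking each mapping rather than the whole section matters because, as
the comment in vhost_vdpa_listener_skipped_section() says, an IOMMU section
can be larger than iova_max even when everything actually mapped through it
is within range.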




* Re: [PATCH v15 4/4] vhost-vdpa: Add support for vIOMMU.
  2023-03-21 14:23 ` [PATCH v15 4/4] vhost-vdpa: Add support for vIOMMU Cindy Lu
@ 2023-03-23  3:47   ` Jason Wang
  2023-03-23  8:40     ` Cindy Lu
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Wang @ 2023-03-23  3:47 UTC (permalink / raw)
  To: Cindy Lu; +Cc: mst, qemu-devel

On Tue, Mar 21, 2023 at 10:24 PM Cindy Lu <lulu@redhat.com> wrote:
>
> 1. vIOMMU support allows vDPA to work in IOMMU mode. This
> fixes the security issues of using no-IOMMU mode.
> To support this feature, add new functions that handle adding and
> deleting IOMMU MRs.
>
> Also, since SVQ does not support vIOMMU yet, add a check for IOMMU
> in vhost_vdpa_dev_start(); if SVQ and IOMMU are enabled at the same time,
> the function fails.
>
> 2. Skip the iova_max check in vhost_vdpa_listener_skipped_section() when
> the MR is an IOMMU MR, and move this check to vhost_vdpa_iommu_map_notify().
>
> Verified with the vp_vdpa and vdpa_sim_net drivers
>
> Signed-off-by: Cindy Lu <lulu@redhat.com>
> ---
>  hw/virtio/trace-events         |   2 +-
>  hw/virtio/vhost-vdpa.c         | 159 ++++++++++++++++++++++++++++++---
>  include/hw/virtio/vhost-vdpa.h |  11 +++
>  3 files changed, 161 insertions(+), 11 deletions(-)
>
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 8f8d05cf9b..de4da2c65c 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -33,7 +33,7 @@ vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
>  vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
>  vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
>  vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> -vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> +vhost_vdpa_iotlb_batch_end_once(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
>  vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
>  vhost_vdpa_listener_region_del(void *vdpa, uint64_t iova, uint64_t llend) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64
>  vhost_vdpa_add_status(void *dev, uint8_t status) "dev: %p status: 0x%"PRIx8
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 0c8c37e786..39720d12a6 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -26,6 +26,7 @@
>  #include "cpu.h"
>  #include "trace.h"
>  #include "qapi/error.h"
> +#include "hw/virtio/virtio-access.h"
>
>  /*
>   * Return one past the end of the end of section. Be careful with uint64_t
> @@ -60,13 +61,21 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
>                       iova_min, section->offset_within_address_space);
>          return true;
>      }
> +    /*
> +     * While using vIOMMU, sometimes the section will be larger than iova_max,
> +     * but the memory that actually maps is smaller, so move the check to
> +     * function vhost_vdpa_iommu_map_notify(). That function will use the actual
> +     * size that maps to the kernel
> +     */
>
> -    llend = vhost_vdpa_section_end(section);
> -    if (int128_gt(llend, int128_make64(iova_max))) {
> -        error_report("RAM section out of device range (max=0x%" PRIx64
> -                     ", end addr=0x%" PRIx64 ")",
> -                     iova_max, int128_get64(llend));
> -        return true;
> +    if (!memory_region_is_iommu(section->mr)) {
> +        llend = vhost_vdpa_section_end(section);
> +        if (int128_gt(llend, int128_make64(iova_max))) {
> +            error_report("RAM section out of device range (max=0x%" PRIx64
> +                         ", end addr=0x%" PRIx64 ")",
> +                         iova_max, int128_get64(llend));
> +            return true;
> +        }
>      }
>
>      return false;
> @@ -158,9 +167,8 @@ static void vhost_vdpa_iotlb_batch_begin_once(struct vhost_vdpa *v)
>      v->iotlb_batch_begin_sent = true;
>  }
>
> -static void vhost_vdpa_listener_commit(MemoryListener *listener)
> +static void vhost_vdpa_iotlb_batch_end_once(struct vhost_vdpa *v)
>  {
> -    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
>      struct vhost_dev *dev = v->dev;
>      struct vhost_msg_v2 msg = {};
>      int fd = v->device_fd;
> @@ -176,7 +184,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
>      msg.type = v->msg_type;
>      msg.iotlb.type = VHOST_IOTLB_BATCH_END;
>
> -    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
> +    trace_vhost_vdpa_iotlb_batch_end_once(v, fd, msg.type, msg.iotlb.type);

I suggest keeping the commit trace. Commit and batch are different
things. If you want to trace the batch begin/end, you should do it in
vhost_vdpa_iotlb_batch_begin_once() etc.

>      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
>          error_report("failed to write, fd=%d, errno=%d (%s)",
>                       fd, errno, strerror(errno));
> @@ -185,6 +193,124 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
>      v->iotlb_batch_begin_sent = false;
>  }
>
> +static void vhost_vdpa_listener_commit(MemoryListener *listener)
> +{
> +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> +    vhost_vdpa_iotlb_batch_end_once(v);
> +}
> +
> +static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> +{
> +    struct vdpa_iommu *iommu = container_of(n, struct vdpa_iommu, n);
> +
> +    hwaddr iova = iotlb->iova + iommu->iommu_offset;
> +    struct vhost_vdpa *v = iommu->dev;
> +    void *vaddr;
> +    int ret;
> +    Int128 llend;
> +
> +    if (iotlb->target_as != &address_space_memory) {
> +        error_report("Wrong target AS \"%s\", only system memory is allowed",
> +                     iotlb->target_as->name ? iotlb->target_as->name : "none");
> +        return;
> +    }
> +    RCU_READ_LOCK_GUARD();
> +    /* check if RAM section out of device range */
> +    llend = int128_add(int128_makes64(iotlb->addr_mask), int128_makes64(iova));
> +    if (int128_gt(llend, int128_make64(v->iova_range.last))) {
> +        error_report("RAM section out of device range (max=0x%" PRIx64
> +                     ", end addr=0x%" PRIx64 ")",
> +                     v->iova_range.last, int128_get64(llend));
> +        return;
> +    }
> +
> +    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> +        bool read_only;
> +
> +        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, NULL)) {
> +            return;
> +        }
> +        vhost_vdpa_iotlb_batch_begin_once(v);

I think there are at most 2 ioctls here; is it still worth batching them?

Otherwise looks good.

Thanks

> +        ret = vhost_vdpa_dma_map(v, VHOST_VDPA_GUEST_PA_ASID, iova,
> +                                 iotlb->addr_mask + 1, vaddr, read_only);
> +        if (ret) {
> +            error_report("vhost_vdpa_dma_map(%p, 0x%" HWADDR_PRIx ", "
> +                         "0x%" HWADDR_PRIx ", %p) = %d (%m)",
> +                         v, iova, iotlb->addr_mask + 1, vaddr, ret);
> +        }
> +    } else {
> +        vhost_vdpa_iotlb_batch_begin_once(v);
> +        ret = vhost_vdpa_dma_unmap(v, VHOST_VDPA_GUEST_PA_ASID, iova,
> +                                   iotlb->addr_mask + 1);
> +        if (ret) {
> +            error_report("vhost_vdpa_dma_unmap(%p, 0x%" HWADDR_PRIx ", "
> +                         "0x%" HWADDR_PRIx ") = %d (%m)",
> +                         v, iova, iotlb->addr_mask + 1, ret);
> +        }
> +    }
> +    vhost_vdpa_iotlb_batch_end_once(v);
> +}
> +
> +static void vhost_vdpa_iommu_region_add(MemoryListener *listener,
> +                                        MemoryRegionSection *section)
> +{
> +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> +
> +    struct vdpa_iommu *iommu;
> +    Int128 end;
> +    int iommu_idx;
> +    IOMMUMemoryRegion *iommu_mr;
> +    int ret;
> +
> +    iommu_mr = IOMMU_MEMORY_REGION(section->mr);
> +
> +    iommu = g_malloc0(sizeof(*iommu));
> +    end = int128_add(int128_make64(section->offset_within_region),
> +                     section->size);
> +    end = int128_sub(end, int128_one());
> +    iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr,
> +                                                   MEMTXATTRS_UNSPECIFIED);
> +    iommu->iommu_mr = iommu_mr;
> +    iommu_notifier_init(&iommu->n, vhost_vdpa_iommu_map_notify,
> +                        IOMMU_NOTIFIER_IOTLB_EVENTS,
> +                        section->offset_within_region,
> +                        int128_get64(end),
> +                        iommu_idx);
> +    iommu->iommu_offset = section->offset_within_address_space -
> +                          section->offset_within_region;
> +    iommu->dev = v;
> +
> +    ret = memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL);
> +    if (ret) {
> +        g_free(iommu);
> +        return;
> +    }
> +
> +    QLIST_INSERT_HEAD(&v->iommu_list, iommu, iommu_next);
> +    memory_region_iommu_replay(iommu->iommu_mr, &iommu->n);
> +
> +    return;
> +}
> +
> +static void vhost_vdpa_iommu_region_del(MemoryListener *listener,
> +                                        MemoryRegionSection *section)
> +{
> +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> +
> +    struct vdpa_iommu *iommu;
> +
> +    QLIST_FOREACH(iommu, &v->iommu_list, iommu_next)
> +    {
> +        if (MEMORY_REGION(iommu->iommu_mr) == section->mr &&
> +            iommu->n.start == section->offset_within_region) {
> +            memory_region_unregister_iommu_notifier(section->mr, &iommu->n);
> +            QLIST_REMOVE(iommu, iommu_next);
> +            g_free(iommu);
> +            break;
> +        }
> +    }
> +}
> +
>  static void vhost_vdpa_listener_region_add(MemoryListener *listener,
>                                             MemoryRegionSection *section)
>  {
> @@ -199,6 +325,10 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
>                                              v->iova_range.last)) {
>          return;
>      }
> +    if (memory_region_is_iommu(section->mr)) {
> +        vhost_vdpa_iommu_region_add(listener, section);
> +        return;
> +    }
>
>      if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
>                   (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> @@ -278,6 +408,9 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
>                                              v->iova_range.last)) {
>          return;
>      }
> +    if (memory_region_is_iommu(section->mr)) {
> +        vhost_vdpa_iommu_region_del(listener, section);
> +    }
>
>      if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
>                   (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> @@ -1182,7 +1315,13 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>      }
>
>      if (started) {
> -        memory_listener_register(&v->listener, &address_space_memory);
> +        if (vhost_dev_has_iommu(dev) && (v->shadow_vqs_enabled)) {
> +            error_report("SVQ can not work while IOMMU enable, please disable"
> +                         " IOMMU and try again");
> +            return -1;
> +        }
> +        memory_listener_register(&v->listener, dev->vdev->dma_as);
> +
>          return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>      }
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index c278a2a8de..e64bfc7f98 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -52,6 +52,8 @@ typedef struct vhost_vdpa {
>      struct vhost_dev *dev;
>      Error *migration_blocker;
>      VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> +    QLIST_HEAD(, vdpa_iommu) iommu_list;
> +    IOMMUNotifier n;
>  } VhostVDPA;
>
>  int vhost_vdpa_get_iova_range(int fd, struct vhost_vdpa_iova_range *iova_range);
> @@ -61,4 +63,13 @@ int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
>  int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
>                           hwaddr size);
>
> +typedef struct vdpa_iommu {
> +    struct vhost_vdpa *dev;
> +    IOMMUMemoryRegion *iommu_mr;
> +    hwaddr iommu_offset;
> +    IOMMUNotifier n;
> +    QLIST_ENTRY(vdpa_iommu) iommu_next;
> +} VDPAIOMMUState;
> +
> +
>  #endif
> --
> 2.34.3
>
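
For illustration, the lookup-and-remove step in vhost_vdpa_iommu_region_del()
quoted above (match on the memory region and on the notifier start offset,
then unlink and free) can be sketched with a plain singly linked list. This is
a toy model under stated assumptions, not QEMU code: struct toy_iommu,
toy_add() and toy_del() are hypothetical names, and the real patch uses the
QLIST macros and also unregisters the IOMMU notifier before freeing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vdpa_iommu: one entry per IOMMU section. */
struct toy_iommu {
    const void *mr;          /* identity of the memory region */
    uint64_t start;          /* notifier start, i.e. offset_within_region */
    struct toy_iommu *next;
};

/* Insert a new entry at the list head; returns NULL if allocation fails. */
static struct toy_iommu *toy_add(struct toy_iommu **head, const void *mr,
                                 uint64_t start)
{
    struct toy_iommu *e = calloc(1, sizeof(*e));

    if (!e) {
        return NULL;
    }
    e->mr = mr;
    e->start = start;
    e->next = *head;
    *head = e;
    return e;
}

/* Remove and free the first entry matching (mr, start); returns 1 if found. */
static int toy_del(struct toy_iommu **head, const void *mr, uint64_t start)
{
    for (struct toy_iommu **pp = head; *pp; pp = &(*pp)->next) {
        struct toy_iommu *e = *pp;

        if (e->mr == mr && e->start == start) {
            *pp = e->next;   /* unlink before freeing */
            free(e);
            return 1;        /* stop at the first match, like the break above */
        }
    }
    return 0;
}
```

Matching on both fields matters because one IOMMU memory region may be split
into several sections, each with its own notifier.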




* Re: [PATCH v15 1/4] vhost: expose function vhost_dev_has_iommu()
  2023-03-21 14:23 ` [PATCH v15 1/4] vhost: expose function vhost_dev_has_iommu() Cindy Lu
@ 2023-03-23  3:48   ` Jason Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2023-03-23  3:48 UTC (permalink / raw)
  To: Cindy Lu; +Cc: mst, qemu-devel

On Tue, Mar 21, 2023 at 10:23 PM Cindy Lu <lulu@redhat.com> wrote:
>
> To support vIOMMU in vdpa, we need to expose the function
> vhost_dev_has_iommu(); vdpa will use this function to check
> whether vIOMMU is enabled.
>
> Signed-off-by: Cindy Lu <lulu@redhat.com>

It looks like you missed my acks for patches 1 - 3.

Thanks

> ---
>  hw/virtio/vhost.c         | 2 +-
>  include/hw/virtio/vhost.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index a266396576..fd746b085b 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -107,7 +107,7 @@ static void vhost_dev_sync_region(struct vhost_dev *dev,
>      }
>  }
>
> -static bool vhost_dev_has_iommu(struct vhost_dev *dev)
> +bool vhost_dev_has_iommu(struct vhost_dev *dev)
>  {
>      VirtIODevice *vdev = dev->vdev;
>
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index a52f273347..f7f10c8fb7 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -336,4 +336,5 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
>                             struct vhost_inflight *inflight);
>  int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
>                             struct vhost_inflight *inflight);
> +bool vhost_dev_has_iommu(struct vhost_dev *dev);
>  #endif
> --
> 2.34.3
>




* Re: [PATCH v15 4/4] vhost-vdpa: Add support for vIOMMU.
  2023-03-23  3:47   ` Jason Wang
@ 2023-03-23  8:40     ` Cindy Lu
  2023-03-24  2:49       ` Jason Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Cindy Lu @ 2023-03-23  8:40 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, qemu-devel

On Thu, Mar 23, 2023 at 11:47 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Mar 21, 2023 at 10:24 PM Cindy Lu <lulu@redhat.com> wrote:
> >
> > 1. vIOMMU support allows vDPA to work in IOMMU mode. This
> > fixes the security issues of the no-IOMMU mode.
> > To support this feature, we need to add new functions to handle IOMMU MR
> > adds and deletes.
> >
> > Also, since SVQ does not support vIOMMU yet, add a check for IOMMU
> > in vhost_vdpa_dev_start(): if SVQ and IOMMU are enabled at the same time,
> > the function returns failure.
> >
> > 2. Skip the iova_max check in vhost_vdpa_listener_skipped_section() when
> > the MR is an IOMMU MR, and move this check to vhost_vdpa_iommu_map_notify().
> >
> > Verified with the vp_vdpa and vdpa_sim_net drivers.
> >
> > Signed-off-by: Cindy Lu <lulu@redhat.com>
> > ---
> >  hw/virtio/trace-events         |   2 +-
> >  hw/virtio/vhost-vdpa.c         | 159 ++++++++++++++++++++++++++++++---
> >  include/hw/virtio/vhost-vdpa.h |  11 +++
> >  3 files changed, 161 insertions(+), 11 deletions(-)
> >
> > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > index 8f8d05cf9b..de4da2c65c 100644
> > --- a/hw/virtio/trace-events
> > +++ b/hw/virtio/trace-events
> > @@ -33,7 +33,7 @@ vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
> >  vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
> >  vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> >  vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> > -vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> > +vhost_vdpa_iotlb_batch_end_once(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> >  vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
> >  vhost_vdpa_listener_region_del(void *vdpa, uint64_t iova, uint64_t llend) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64
> >  vhost_vdpa_add_status(void *dev, uint8_t status) "dev: %p status: 0x%"PRIx8
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 0c8c37e786..39720d12a6 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -26,6 +26,7 @@
> >  #include "cpu.h"
> >  #include "trace.h"
> >  #include "qapi/error.h"
> > +#include "hw/virtio/virtio-access.h"
> >
> >  /*
> >   * Return one past the end of the end of section. Be careful with uint64_t
> > @@ -60,13 +61,21 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
> >                       iova_min, section->offset_within_address_space);
> >          return true;
> >      }
> > +    /*
> > +     * While using vIOMMU, sometimes the section will be larger than iova_max,
> > +     * but the memory that actually maps is smaller, so move the check to
> > +     * function vhost_vdpa_iommu_map_notify(). That function will use the actual
> > +     * size that maps to the kernel
> > +     */
> >
> > -    llend = vhost_vdpa_section_end(section);
> > -    if (int128_gt(llend, int128_make64(iova_max))) {
> > -        error_report("RAM section out of device range (max=0x%" PRIx64
> > -                     ", end addr=0x%" PRIx64 ")",
> > -                     iova_max, int128_get64(llend));
> > -        return true;
> > +    if (!memory_region_is_iommu(section->mr)) {
> > +        llend = vhost_vdpa_section_end(section);
> > +        if (int128_gt(llend, int128_make64(iova_max))) {
> > +            error_report("RAM section out of device range (max=0x%" PRIx64
> > +                         ", end addr=0x%" PRIx64 ")",
> > +                         iova_max, int128_get64(llend));
> > +            return true;
> > +        }
> >      }
> >
> >      return false;
> > @@ -158,9 +167,8 @@ static void vhost_vdpa_iotlb_batch_begin_once(struct vhost_vdpa *v)
> >      v->iotlb_batch_begin_sent = true;
> >  }
> >
> > -static void vhost_vdpa_listener_commit(MemoryListener *listener)
> > +static void vhost_vdpa_iotlb_batch_end_once(struct vhost_vdpa *v)
> >  {
> > -    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> >      struct vhost_dev *dev = v->dev;
> >      struct vhost_msg_v2 msg = {};
> >      int fd = v->device_fd;
> > @@ -176,7 +184,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
> >      msg.type = v->msg_type;
> >      msg.iotlb.type = VHOST_IOTLB_BATCH_END;
> >
> > -    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
> > +    trace_vhost_vdpa_iotlb_batch_end_once(v, fd, msg.type, msg.iotlb.type);
>
> I suggest keeping the commit trace. The commit and the batch are different
> things. If you want to trace the batch begin/end, you should do it in
> vhost_vdpa_iotlb_batch_begin_once() etc.
>
Sure, will fix this.
> >      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> >          error_report("failed to write, fd=%d, errno=%d (%s)",
> >                       fd, errno, strerror(errno));
> > @@ -185,6 +193,124 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
> >      v->iotlb_batch_begin_sent = false;
> >  }
> >
> > +static void vhost_vdpa_listener_commit(MemoryListener *listener)
> > +{
> > +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > +    vhost_vdpa_iotlb_batch_end_once(v);
> > +}
> > +
> > +static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > +{
> > +    struct vdpa_iommu *iommu = container_of(n, struct vdpa_iommu, n);
> > +
> > +    hwaddr iova = iotlb->iova + iommu->iommu_offset;
> > +    struct vhost_vdpa *v = iommu->dev;
> > +    void *vaddr;
> > +    int ret;
> > +    Int128 llend;
> > +
> > +    if (iotlb->target_as != &address_space_memory) {
> > +        error_report("Wrong target AS \"%s\", only system memory is allowed",
> > +                     iotlb->target_as->name ? iotlb->target_as->name : "none");
> > +        return;
> > +    }
> > +    RCU_READ_LOCK_GUARD();
> > +    /* check if RAM section out of device range */
> > +    llend = int128_add(int128_makes64(iotlb->addr_mask), int128_makes64(iova));
> > +    if (int128_gt(llend, int128_make64(v->iova_range.last))) {
> > +        error_report("RAM section out of device range (max=0x%" PRIx64
> > +                     ", end addr=0x%" PRIx64 ")",
> > +                     v->iova_range.last, int128_get64(llend));
> > +        return;
> > +    }
> > +
> > +    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> > +        bool read_only;
> > +
> > +        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, NULL)) {
> > +            return;
> > +        }
> > +        vhost_vdpa_iotlb_batch_begin_once(v);
>
> I think there are at most 2 ioctls for this; is it still worth batching them?
>
> Otherwise looks good.
>
> Thanks
>
The kernel vdpa doesn't support no-batch mode; if we remove the batch here,
the system fails to map:
qemu-system-x86_64: failed to write, fd=12, errno=14 (Bad address)
qemu-system-x86_64: vhost_vdpa_dma_unmap(0x7f811a950190, 0x0,
0x80000000) = -5 (Bad address)

I'm not sure; maybe this is a bug in the kernel?
Thanks
Cindy
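
As an illustration of the pattern under discussion, here is a minimal sketch
of the begin-once / end-once bracketing around a single map or unmap. It is a
toy model, not QEMU code: struct toy_vdpa, toy_send() and the message enum are
hypothetical, messages are logged instead of being written to the vhost-vdpa
fd, and the guard in toy_batch_end_once() is an assumption about how the
real helper treats an already closed batch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message kinds, loosely modelled on the IOTLB message types. */
enum msg_kind { MSG_BATCH_BEGIN, MSG_UPDATE, MSG_INVALIDATE, MSG_BATCH_END };

#define LOG_MAX 16

/* Toy device state: records every message instead of writing to a fd. */
struct toy_vdpa {
    bool batch_begin_sent;       /* mirrors iotlb_batch_begin_sent */
    enum msg_kind log[LOG_MAX];
    size_t nlog;
};

static void toy_send(struct toy_vdpa *v, enum msg_kind kind)
{
    assert(v->nlog < LOG_MAX);
    v->log[v->nlog++] = kind;
}

/* Send BATCH_BEGIN only if no batch is currently open. */
static void toy_batch_begin_once(struct toy_vdpa *v)
{
    if (v->batch_begin_sent) {
        return;
    }
    toy_send(v, MSG_BATCH_BEGIN);
    v->batch_begin_sent = true;
}

/* Send BATCH_END only if a batch is open, then mark it closed. */
static void toy_batch_end_once(struct toy_vdpa *v)
{
    if (!v->batch_begin_sent) {
        return;
    }
    toy_send(v, MSG_BATCH_END);
    v->batch_begin_sent = false;
}

/* One notifier call: a single map or unmap wrapped in its own batch. */
static void toy_notify(struct toy_vdpa *v, bool map)
{
    toy_batch_begin_once(v);
    toy_send(v, map ? MSG_UPDATE : MSG_INVALIDATE);
    toy_batch_end_once(v);
}
```

In this model every notifier call costs three messages (begin, update or
invalidate, end), which is the overhead the review comment is asking about.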

> > +        ret = vhost_vdpa_dma_map(v, VHOST_VDPA_GUEST_PA_ASID, iova,
> > +                                 iotlb->addr_mask + 1, vaddr, read_only);
> > +        if (ret) {
> > +            error_report("vhost_vdpa_dma_map(%p, 0x%" HWADDR_PRIx ", "
> > +                         "0x%" HWADDR_PRIx ", %p) = %d (%m)",
> > +                         v, iova, iotlb->addr_mask + 1, vaddr, ret);
> > +        }
> > +    } else {
> > +        vhost_vdpa_iotlb_batch_begin_once(v);
> > +        ret = vhost_vdpa_dma_unmap(v, VHOST_VDPA_GUEST_PA_ASID, iova,
> > +                                   iotlb->addr_mask + 1);
> > +        if (ret) {
> > +            error_report("vhost_vdpa_dma_unmap(%p, 0x%" HWADDR_PRIx ", "
> > +                         "0x%" HWADDR_PRIx ") = %d (%m)",
> > +                         v, iova, iotlb->addr_mask + 1, ret);
> > +        }
> > +    }
> > +    vhost_vdpa_iotlb_batch_end_once(v);
> > +}
> > +
> > +static void vhost_vdpa_iommu_region_add(MemoryListener *listener,
> > +                                        MemoryRegionSection *section)
> > +{
> > +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > +
> > +    struct vdpa_iommu *iommu;
> > +    Int128 end;
> > +    int iommu_idx;
> > +    IOMMUMemoryRegion *iommu_mr;
> > +    int ret;
> > +
> > +    iommu_mr = IOMMU_MEMORY_REGION(section->mr);
> > +
> > +    iommu = g_malloc0(sizeof(*iommu));
> > +    end = int128_add(int128_make64(section->offset_within_region),
> > +                     section->size);
> > +    end = int128_sub(end, int128_one());
> > +    iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr,
> > +                                                   MEMTXATTRS_UNSPECIFIED);
> > +    iommu->iommu_mr = iommu_mr;
> > +    iommu_notifier_init(&iommu->n, vhost_vdpa_iommu_map_notify,
> > +                        IOMMU_NOTIFIER_IOTLB_EVENTS,
> > +                        section->offset_within_region,
> > +                        int128_get64(end),
> > +                        iommu_idx);
> > +    iommu->iommu_offset = section->offset_within_address_space -
> > +                          section->offset_within_region;
> > +    iommu->dev = v;
> > +
> > +    ret = memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL);
> > +    if (ret) {
> > +        g_free(iommu);
> > +        return;
> > +    }
> > +
> > +    QLIST_INSERT_HEAD(&v->iommu_list, iommu, iommu_next);
> > +    memory_region_iommu_replay(iommu->iommu_mr, &iommu->n);
> > +
> > +    return;
> > +}
> > +
> > +static void vhost_vdpa_iommu_region_del(MemoryListener *listener,
> > +                                        MemoryRegionSection *section)
> > +{
> > +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > +
> > +    struct vdpa_iommu *iommu;
> > +
> > +    QLIST_FOREACH(iommu, &v->iommu_list, iommu_next)
> > +    {
> > +        if (MEMORY_REGION(iommu->iommu_mr) == section->mr &&
> > +            iommu->n.start == section->offset_within_region) {
> > +            memory_region_unregister_iommu_notifier(section->mr, &iommu->n);
> > +            QLIST_REMOVE(iommu, iommu_next);
> > +            g_free(iommu);
> > +            break;
> > +        }
> > +    }
> > +}
> > +
> >  static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> >                                             MemoryRegionSection *section)
> >  {
> > @@ -199,6 +325,10 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> >                                              v->iova_range.last)) {
> >          return;
> >      }
> > +    if (memory_region_is_iommu(section->mr)) {
> > +        vhost_vdpa_iommu_region_add(listener, section);
> > +        return;
> > +    }
> >
> >      if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
> >                   (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> > @@ -278,6 +408,9 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
> >                                              v->iova_range.last)) {
> >          return;
> >      }
> > +    if (memory_region_is_iommu(section->mr)) {
> > +        vhost_vdpa_iommu_region_del(listener, section);
> > +    }
> >
> >      if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
> >                   (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> > @@ -1182,7 +1315,13 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> >      }
> >
> >      if (started) {
> > -        memory_listener_register(&v->listener, &address_space_memory);
> > +        if (vhost_dev_has_iommu(dev) && (v->shadow_vqs_enabled)) {
> > +            error_report("SVQ can not work while IOMMU enable, please disable"
> > +                         " IOMMU and try again");
> > +            return -1;
> > +        }
> > +        memory_listener_register(&v->listener, dev->vdev->dma_as);
> > +
> >          return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> >      }
> >
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > index c278a2a8de..e64bfc7f98 100644
> > --- a/include/hw/virtio/vhost-vdpa.h
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -52,6 +52,8 @@ typedef struct vhost_vdpa {
> >      struct vhost_dev *dev;
> >      Error *migration_blocker;
> >      VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > +    QLIST_HEAD(, vdpa_iommu) iommu_list;
> > +    IOMMUNotifier n;
> >  } VhostVDPA;
> >
> >  int vhost_vdpa_get_iova_range(int fd, struct vhost_vdpa_iova_range *iova_range);
> > @@ -61,4 +63,13 @@ int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> >  int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> >                           hwaddr size);
> >
> > +typedef struct vdpa_iommu {
> > +    struct vhost_vdpa *dev;
> > +    IOMMUMemoryRegion *iommu_mr;
> > +    hwaddr iommu_offset;
> > +    IOMMUNotifier n;
> > +    QLIST_ENTRY(vdpa_iommu) iommu_next;
> > +} VDPAIOMMUState;
> > +
> > +
> >  #endif
> > --
> > 2.34.3
> >
>
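
For reference, the range check that this patch moves into
vhost_vdpa_iommu_map_notify() can be sketched with plain 64-bit arithmetic.
This is only a sketch under stated assumptions: toy_iova_in_range() is a
hypothetical helper, the patch itself uses Int128 arithmetic, and the sketch
assumes iommu_offset is non-negative (it rejects wrap-around instead of
treating the offset as a wrapped negative value).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: translate an IOTLB entry's IOVA by the section offset
 * and check that the last byte of the mapping lies within [0, iova_last].
 * addr_mask is "size - 1", so the last byte is iova + addr_mask.
 */
static bool toy_iova_in_range(uint64_t entry_iova, uint64_t addr_mask,
                              uint64_t iommu_offset, uint64_t iova_last,
                              uint64_t *iova_out)
{
    uint64_t iova = entry_iova + iommu_offset;

    /* Reject wrap-around in the translation itself. */
    if (iova < entry_iova) {
        return false;
    }
    /* Overflow-safe form of: iova + addr_mask <= iova_last. */
    if (iova > iova_last || addr_mask > iova_last - iova) {
        return false;
    }
    *iova_out = iova;
    return true;
}
```

Checking each notified entry rather than the whole section is what lets a
large IOMMU section pass even though only part of it is ever mapped.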




* Re: [PATCH v15 4/4] vhost-vdpa: Add support for vIOMMU.
  2023-03-23  8:40     ` Cindy Lu
@ 2023-03-24  2:49       ` Jason Wang
  2023-03-24  2:59         ` Cindy Lu
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Wang @ 2023-03-24  2:49 UTC (permalink / raw)
  To: Cindy Lu; +Cc: mst, qemu-devel

On Thu, Mar 23, 2023 at 4:41 PM Cindy Lu <lulu@redhat.com> wrote:
>
> On Thu, Mar 23, 2023 at 11:47 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Mar 21, 2023 at 10:24 PM Cindy Lu <lulu@redhat.com> wrote:
> > >
> > > 1. vIOMMU support allows vDPA to work in IOMMU mode. This
> > > fixes the security issues of the no-IOMMU mode.
> > > To support this feature, we need to add new functions to handle IOMMU MR
> > > adds and deletes.
> > >
> > > Also, since SVQ does not support vIOMMU yet, add a check for IOMMU
> > > in vhost_vdpa_dev_start(): if SVQ and IOMMU are enabled at the same time,
> > > the function returns failure.
> > >
> > > 2. Skip the iova_max check in vhost_vdpa_listener_skipped_section() when
> > > the MR is an IOMMU MR, and move this check to vhost_vdpa_iommu_map_notify().
> > >
> > > Verified with the vp_vdpa and vdpa_sim_net drivers.
> > >
> > > Signed-off-by: Cindy Lu <lulu@redhat.com>
> > > ---
> > >  hw/virtio/trace-events         |   2 +-
> > >  hw/virtio/vhost-vdpa.c         | 159 ++++++++++++++++++++++++++++++---
> > >  include/hw/virtio/vhost-vdpa.h |  11 +++
> > >  3 files changed, 161 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > index 8f8d05cf9b..de4da2c65c 100644
> > > --- a/hw/virtio/trace-events
> > > +++ b/hw/virtio/trace-events
> > > @@ -33,7 +33,7 @@ vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
> > >  vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
> > >  vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> > >  vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> > > -vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> > > +vhost_vdpa_iotlb_batch_end_once(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> > >  vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
> > >  vhost_vdpa_listener_region_del(void *vdpa, uint64_t iova, uint64_t llend) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64
> > >  vhost_vdpa_add_status(void *dev, uint8_t status) "dev: %p status: 0x%"PRIx8
> > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > index 0c8c37e786..39720d12a6 100644
> > > --- a/hw/virtio/vhost-vdpa.c
> > > +++ b/hw/virtio/vhost-vdpa.c
> > > @@ -26,6 +26,7 @@
> > >  #include "cpu.h"
> > >  #include "trace.h"
> > >  #include "qapi/error.h"
> > > +#include "hw/virtio/virtio-access.h"
> > >
> > >  /*
> > >   * Return one past the end of the end of section. Be careful with uint64_t
> > > @@ -60,13 +61,21 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
> > >                       iova_min, section->offset_within_address_space);
> > >          return true;
> > >      }
> > > +    /*
> > > +     * While using vIOMMU, sometimes the section will be larger than iova_max,
> > > +     * but the memory that actually maps is smaller, so move the check to
> > > +     * function vhost_vdpa_iommu_map_notify(). That function will use the actual
> > > +     * size that maps to the kernel
> > > +     */
> > >
> > > -    llend = vhost_vdpa_section_end(section);
> > > -    if (int128_gt(llend, int128_make64(iova_max))) {
> > > -        error_report("RAM section out of device range (max=0x%" PRIx64
> > > -                     ", end addr=0x%" PRIx64 ")",
> > > -                     iova_max, int128_get64(llend));
> > > -        return true;
> > > +    if (!memory_region_is_iommu(section->mr)) {
> > > +        llend = vhost_vdpa_section_end(section);
> > > +        if (int128_gt(llend, int128_make64(iova_max))) {
> > > +            error_report("RAM section out of device range (max=0x%" PRIx64
> > > +                         ", end addr=0x%" PRIx64 ")",
> > > +                         iova_max, int128_get64(llend));
> > > +            return true;
> > > +        }
> > >      }
> > >
> > >      return false;
> > > @@ -158,9 +167,8 @@ static void vhost_vdpa_iotlb_batch_begin_once(struct vhost_vdpa *v)
> > >      v->iotlb_batch_begin_sent = true;
> > >  }
> > >
> > > -static void vhost_vdpa_listener_commit(MemoryListener *listener)
> > > +static void vhost_vdpa_iotlb_batch_end_once(struct vhost_vdpa *v)
> > >  {
> > > -    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > >      struct vhost_dev *dev = v->dev;
> > >      struct vhost_msg_v2 msg = {};
> > >      int fd = v->device_fd;
> > > @@ -176,7 +184,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
> > >      msg.type = v->msg_type;
> > >      msg.iotlb.type = VHOST_IOTLB_BATCH_END;
> > >
> > > -    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
> > > +    trace_vhost_vdpa_iotlb_batch_end_once(v, fd, msg.type, msg.iotlb.type);
> >
> > I suggest keeping the commit trace. The commit and the batch are different
> > things. If you want to trace the batch begin/end, you should do it in
> > vhost_vdpa_iotlb_batch_begin_once() etc.
> >
> Sure, will fix this.
> > >      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> > >          error_report("failed to write, fd=%d, errno=%d (%s)",
> > >                       fd, errno, strerror(errno));
> > > @@ -185,6 +193,124 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
> > >      v->iotlb_batch_begin_sent = false;
> > >  }
> > >
> > > +static void vhost_vdpa_listener_commit(MemoryListener *listener)
> > > +{
> > > +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > > +    vhost_vdpa_iotlb_batch_end_once(v);
> > > +}
> > > +
> > > +static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > > +{
> > > +    struct vdpa_iommu *iommu = container_of(n, struct vdpa_iommu, n);
> > > +
> > > +    hwaddr iova = iotlb->iova + iommu->iommu_offset;
> > > +    struct vhost_vdpa *v = iommu->dev;
> > > +    void *vaddr;
> > > +    int ret;
> > > +    Int128 llend;
> > > +
> > > +    if (iotlb->target_as != &address_space_memory) {
> > > +        error_report("Wrong target AS \"%s\", only system memory is allowed",
> > > +                     iotlb->target_as->name ? iotlb->target_as->name : "none");
> > > +        return;
> > > +    }
> > > +    RCU_READ_LOCK_GUARD();
> > > +    /* check if RAM section out of device range */
> > > +    llend = int128_add(int128_makes64(iotlb->addr_mask), int128_makes64(iova));
> > > +    if (int128_gt(llend, int128_make64(v->iova_range.last))) {
> > > +        error_report("RAM section out of device range (max=0x%" PRIx64
> > > +                     ", end addr=0x%" PRIx64 ")",
> > > +                     v->iova_range.last, int128_get64(llend));
> > > +        return;
> > > +    }
> > > +
> > > +    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> > > +        bool read_only;
> > > +
> > > +        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, NULL)) {
> > > +            return;
> > > +        }
> > > +        vhost_vdpa_iotlb_batch_begin_once(v);
> >
> > I think there are at most 2 ioctls for this; is it still worth batching them?
> >
> > Otherwise looks good.
> >
> > Thanks
> >
> The kernel vdpa doesn't support no-batch mode; if we remove the batch here,
> the system fails to map:
> qemu-system-x86_64: failed to write, fd=12, errno=14 (Bad address)
> qemu-system-x86_64: vhost_vdpa_dma_unmap(0x7f811a950190, 0x0,
> 0x80000000) = -5 (Bad address)
>
> I'm not sure; maybe this is a bug in the kernel?

I'm not sure I understand this. Do you mean you hit this if you remove the
vhost_vdpa_iotlb_batch_begin_once() and vhost_vdpa_iotlb_batch_end_once()
calls?

Thanks

> Thanks
> Cindy
>
> > > +        ret = vhost_vdpa_dma_map(v, VHOST_VDPA_GUEST_PA_ASID, iova,
> > > +                                 iotlb->addr_mask + 1, vaddr, read_only);
> > > +        if (ret) {
> > > +            error_report("vhost_vdpa_dma_map(%p, 0x%" HWADDR_PRIx ", "
> > > +                         "0x%" HWADDR_PRIx ", %p) = %d (%m)",
> > > +                         v, iova, iotlb->addr_mask + 1, vaddr, ret);
> > > +        }
> > > +    } else {
> > > +        vhost_vdpa_iotlb_batch_begin_once(v);
> > > +        ret = vhost_vdpa_dma_unmap(v, VHOST_VDPA_GUEST_PA_ASID, iova,
> > > +                                   iotlb->addr_mask + 1);
> > > +        if (ret) {
> > > +            error_report("vhost_vdpa_dma_unmap(%p, 0x%" HWADDR_PRIx ", "
> > > +                         "0x%" HWADDR_PRIx ") = %d (%m)",
> > > +                         v, iova, iotlb->addr_mask + 1, ret);
> > > +        }
> > > +    }
> > > +    vhost_vdpa_iotlb_batch_end_once(v);
> > > +}
> > > +
> > > +static void vhost_vdpa_iommu_region_add(MemoryListener *listener,
> > > +                                        MemoryRegionSection *section)
> > > +{
> > > +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > > +
> > > +    struct vdpa_iommu *iommu;
> > > +    Int128 end;
> > > +    int iommu_idx;
> > > +    IOMMUMemoryRegion *iommu_mr;
> > > +    int ret;
> > > +
> > > +    iommu_mr = IOMMU_MEMORY_REGION(section->mr);
> > > +
> > > +    iommu = g_malloc0(sizeof(*iommu));
> > > +    end = int128_add(int128_make64(section->offset_within_region),
> > > +                     section->size);
> > > +    end = int128_sub(end, int128_one());
> > > +    iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr,
> > > +                                                   MEMTXATTRS_UNSPECIFIED);
> > > +    iommu->iommu_mr = iommu_mr;
> > > +    iommu_notifier_init(&iommu->n, vhost_vdpa_iommu_map_notify,
> > > +                        IOMMU_NOTIFIER_IOTLB_EVENTS,
> > > +                        section->offset_within_region,
> > > +                        int128_get64(end),
> > > +                        iommu_idx);
> > > +    iommu->iommu_offset = section->offset_within_address_space -
> > > +                          section->offset_within_region;
> > > +    iommu->dev = v;
> > > +
> > > +    ret = memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL);
> > > +    if (ret) {
> > > +        g_free(iommu);
> > > +        return;
> > > +    }
> > > +
> > > +    QLIST_INSERT_HEAD(&v->iommu_list, iommu, iommu_next);
> > > +    memory_region_iommu_replay(iommu->iommu_mr, &iommu->n);
> > > +
> > > +    return;
> > > +}
> > > +
> > > +static void vhost_vdpa_iommu_region_del(MemoryListener *listener,
> > > +                                        MemoryRegionSection *section)
> > > +{
> > > +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > > +
> > > +    struct vdpa_iommu *iommu;
> > > +
> > > +    QLIST_FOREACH(iommu, &v->iommu_list, iommu_next)
> > > +    {
> > > +        if (MEMORY_REGION(iommu->iommu_mr) == section->mr &&
> > > +            iommu->n.start == section->offset_within_region) {
> > > +            memory_region_unregister_iommu_notifier(section->mr, &iommu->n);
> > > +            QLIST_REMOVE(iommu, iommu_next);
> > > +            g_free(iommu);
> > > +            break;
> > > +        }
> > > +    }
> > > +}
> > > +
> > >  static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> > >                                             MemoryRegionSection *section)
> > >  {
> > > @@ -199,6 +325,10 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> > >                                              v->iova_range.last)) {
> > >          return;
> > >      }
> > > +    if (memory_region_is_iommu(section->mr)) {
> > > +        vhost_vdpa_iommu_region_add(listener, section);
> > > +        return;
> > > +    }
> > >
> > >      if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
> > >                   (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> > > @@ -278,6 +408,9 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
> > >                                              v->iova_range.last)) {
> > >          return;
> > >      }
> > > +    if (memory_region_is_iommu(section->mr)) {
> > > +        vhost_vdpa_iommu_region_del(listener, section);
> > > +    }
> > >
> > >      if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
> > >                   (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> > > @@ -1182,7 +1315,13 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > >      }
> > >
> > >      if (started) {
> > > -        memory_listener_register(&v->listener, &address_space_memory);
> > > +        if (vhost_dev_has_iommu(dev) && (v->shadow_vqs_enabled)) {
> > > +            error_report("SVQ can not work while IOMMU enable, please disable"
> > > +                         " IOMMU and try again");
> > > +            return -1;
> > > +        }
> > > +        memory_listener_register(&v->listener, dev->vdev->dma_as);
> > > +
> > >          return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > >      }
> > >
> > > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > > index c278a2a8de..e64bfc7f98 100644
> > > --- a/include/hw/virtio/vhost-vdpa.h
> > > +++ b/include/hw/virtio/vhost-vdpa.h
> > > @@ -52,6 +52,8 @@ typedef struct vhost_vdpa {
> > >      struct vhost_dev *dev;
> > >      Error *migration_blocker;
> > >      VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > > +    QLIST_HEAD(, vdpa_iommu) iommu_list;
> > > +    IOMMUNotifier n;
> > >  } VhostVDPA;
> > >
> > >  int vhost_vdpa_get_iova_range(int fd, struct vhost_vdpa_iova_range *iova_range);
> > > @@ -61,4 +63,13 @@ int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> > >  int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> > >                           hwaddr size);
> > >
> > > +typedef struct vdpa_iommu {
> > > +    struct vhost_vdpa *dev;
> > > +    IOMMUMemoryRegion *iommu_mr;
> > > +    hwaddr iommu_offset;
> > > +    IOMMUNotifier n;
> > > +    QLIST_ENTRY(vdpa_iommu) iommu_next;
> > > +} VDPAIOMMUState;
> > > +
> > > +
> > >  #endif
> > > --
> > > 2.34.3
> > >
> >
>




* Re: [PATCH v15 4/4] vhost-vdpa: Add support for vIOMMU.
  2023-03-24  2:49       ` Jason Wang
@ 2023-03-24  2:59         ` Cindy Lu
  2023-03-24  3:45           ` Jason Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Cindy Lu @ 2023-03-24  2:59 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, qemu-devel

On Fri, Mar 24, 2023 at 10:49 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Mar 23, 2023 at 4:41 PM Cindy Lu <lulu@redhat.com> wrote:
> >
> > On Thu, Mar 23, 2023 at 11:47 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Tue, Mar 21, 2023 at 10:24 PM Cindy Lu <lulu@redhat.com> wrote:
> > > >
> > > > 1. The vIOMMU support will make vDPA can work in IOMMU mode. This
> > > > will fix security issues while using the no-IOMMU mode.
> > > > To support this feature we need to add new functions for IOMMU MR adds and
> > > > deletes.
> > > >
> > > > Also since the SVQ does not support vIOMMU yet, add the check for IOMMU
> > > > in vhost_vdpa_dev_start, if the SVQ and IOMMU enable at the same time
> > > > the function will return fail.
> > > >
> > > > 2. Skip the iova_max check vhost_vdpa_listener_skipped_section(). While
> > > > MR is IOMMU, move this check to vhost_vdpa_iommu_map_notify()
> > > >
> > > > Verified in vp_vdpa and vdpa_sim_net driver
> > > >
> > > > Signed-off-by: Cindy Lu <lulu@redhat.com>
> > > > ---
> > > >  hw/virtio/trace-events         |   2 +-
> > > >  hw/virtio/vhost-vdpa.c         | 159 ++++++++++++++++++++++++++++++---
> > > >  include/hw/virtio/vhost-vdpa.h |  11 +++
> > > >  3 files changed, 161 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > > index 8f8d05cf9b..de4da2c65c 100644
> > > > --- a/hw/virtio/trace-events
> > > > +++ b/hw/virtio/trace-events
> > > > @@ -33,7 +33,7 @@ vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
> > > >  vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
> > > >  vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> > > >  vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> > > > -vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> > > > +vhost_vdpa_iotlb_batch_end_once(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> > > >  vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
> > > >  vhost_vdpa_listener_region_del(void *vdpa, uint64_t iova, uint64_t llend) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64
> > > >  vhost_vdpa_add_status(void *dev, uint8_t status) "dev: %p status: 0x%"PRIx8
> > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > index 0c8c37e786..39720d12a6 100644
> > > > --- a/hw/virtio/vhost-vdpa.c
> > > > +++ b/hw/virtio/vhost-vdpa.c
> > > > @@ -26,6 +26,7 @@
> > > >  #include "cpu.h"
> > > >  #include "trace.h"
> > > >  #include "qapi/error.h"
> > > > +#include "hw/virtio/virtio-access.h"
> > > >
> > > >  /*
> > > >   * Return one past the end of the end of section. Be careful with uint64_t
> > > > @@ -60,13 +61,21 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
> > > >                       iova_min, section->offset_within_address_space);
> > > >          return true;
> > > >      }
> > > > +    /*
> > > > +     * While using vIOMMU, sometimes the section will be larger than iova_max,
> > > > +     * but the memory that actually maps is smaller, so move the check to
> > > > +     * function vhost_vdpa_iommu_map_notify(). That function will use the actual
> > > > +     * size that maps to the kernel
> > > > +     */
> > > >
> > > > -    llend = vhost_vdpa_section_end(section);
> > > > -    if (int128_gt(llend, int128_make64(iova_max))) {
> > > > -        error_report("RAM section out of device range (max=0x%" PRIx64
> > > > -                     ", end addr=0x%" PRIx64 ")",
> > > > -                     iova_max, int128_get64(llend));
> > > > -        return true;
> > > > +    if (!memory_region_is_iommu(section->mr)) {
> > > > +        llend = vhost_vdpa_section_end(section);
> > > > +        if (int128_gt(llend, int128_make64(iova_max))) {
> > > > +            error_report("RAM section out of device range (max=0x%" PRIx64
> > > > +                         ", end addr=0x%" PRIx64 ")",
> > > > +                         iova_max, int128_get64(llend));
> > > > +            return true;
> > > > +        }
> > > >      }
> > > >
> > > >      return false;
> > > > @@ -158,9 +167,8 @@ static void vhost_vdpa_iotlb_batch_begin_once(struct vhost_vdpa *v)
> > > >      v->iotlb_batch_begin_sent = true;
> > > >  }
> > > >
> > > > -static void vhost_vdpa_listener_commit(MemoryListener *listener)
> > > > +static void vhost_vdpa_iotlb_batch_end_once(struct vhost_vdpa *v)
> > > >  {
> > > > -    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > > >      struct vhost_dev *dev = v->dev;
> > > >      struct vhost_msg_v2 msg = {};
> > > >      int fd = v->device_fd;
> > > > @@ -176,7 +184,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
> > > >      msg.type = v->msg_type;
> > > >      msg.iotlb.type = VHOST_IOTLB_BATCH_END;
> > > >
> > > > -    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
> > > > +    trace_vhost_vdpa_iotlb_batch_end_once(v, fd, msg.type, msg.iotlb.type);
> > >
> > > I suggest to keep the commit trace. The commit and batch are different
> > > things. If you want to trace the batch begin/end you should do it in
> > > vhost_vdpa_iotlb_batch_begin_once() etc.
> > >
> > sure will fix this
> > > >      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> > > >          error_report("failed to write, fd=%d, errno=%d (%s)",
> > > >                       fd, errno, strerror(errno));
> > > > @@ -185,6 +193,124 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
> > > >      v->iotlb_batch_begin_sent = false;
> > > >  }
> > > >
> > > > +static void vhost_vdpa_listener_commit(MemoryListener *listener)
> > > > +{
> > > > +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > > > +    vhost_vdpa_iotlb_batch_end_once(v);
> > > > +}
> > > > +
> > > > +static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > > > +{
> > > > +    struct vdpa_iommu *iommu = container_of(n, struct vdpa_iommu, n);
> > > > +
> > > > +    hwaddr iova = iotlb->iova + iommu->iommu_offset;
> > > > +    struct vhost_vdpa *v = iommu->dev;
> > > > +    void *vaddr;
> > > > +    int ret;
> > > > +    Int128 llend;
> > > > +
> > > > +    if (iotlb->target_as != &address_space_memory) {
> > > > +        error_report("Wrong target AS \"%s\", only system memory is allowed",
> > > > +                     iotlb->target_as->name ? iotlb->target_as->name : "none");
> > > > +        return;
> > > > +    }
> > > > +    RCU_READ_LOCK_GUARD();
> > > > +    /* check if RAM section out of device range */
> > > > +    llend = int128_add(int128_makes64(iotlb->addr_mask), int128_makes64(iova));
> > > > +    if (int128_gt(llend, int128_make64(v->iova_range.last))) {
> > > > +        error_report("RAM section out of device range (max=0x%" PRIx64
> > > > +                     ", end addr=0x%" PRIx64 ")",
> > > > +                     v->iova_range.last, int128_get64(llend));
> > > > +        return;
> > > > +    }
> > > > +
> > > > +    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> > > > +        bool read_only;
> > > > +
> > > > +        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, NULL)) {
> > > > +            return;
> > > > +        }
> > > > +        vhost_vdpa_iotlb_batch_begin_once(v);
> > >
> > > I think at most 2 ioctls for this, is this still worth to batch them?
> > >
> > > Other looks good.
> > >
> > > Thanks
> > >
> > The kernel vdpa doesn't support no-batch mode; if we remove the batch
> > here, the system will fail to map:
> > qemu-system-x86_64: failed to write, fd=12, errno=14 (Bad address)
> > qemu-system-x86_64: vhost_vdpa_dma_unmap(0x7f811a950190, 0x0,
> > 0x80000000) = -5 (Bad address)
> >
> > I'm not sure; maybe this is a bug in the kernel?
>
> I'm not sure I understand this, but do you mean you meet this if you
> remove the batch_begin_once() and vhost_vdpa_iotlb_batch_end_once()?
>
> Thanks
>
Yes, the system fails to map if we remove these functions. Is this
the expected behavior?
Maybe we need to fix this in the kernel?
Thanks
Cindy
> > Thanks
> > Cindy
> >
> > > > +        ret = vhost_vdpa_dma_map(v, VHOST_VDPA_GUEST_PA_ASID, iova,
> > > > +                                 iotlb->addr_mask + 1, vaddr, read_only);
> > > > +        if (ret) {
> > > > +            error_report("vhost_vdpa_dma_map(%p, 0x%" HWADDR_PRIx ", "
> > > > +                         "0x%" HWADDR_PRIx ", %p) = %d (%m)",
> > > > +                         v, iova, iotlb->addr_mask + 1, vaddr, ret);
> > > > +        }
> > > > +    } else {
> > > > +        vhost_vdpa_iotlb_batch_begin_once(v);
> > > > +        ret = vhost_vdpa_dma_unmap(v, VHOST_VDPA_GUEST_PA_ASID, iova,
> > > > +                                   iotlb->addr_mask + 1);
> > > > +        if (ret) {
> > > > +            error_report("vhost_vdpa_dma_unmap(%p, 0x%" HWADDR_PRIx ", "
> > > > +                         "0x%" HWADDR_PRIx ") = %d (%m)",
> > > > +                         v, iova, iotlb->addr_mask + 1, ret);
> > > > +        }
> > > > +    }
> > > > +    vhost_vdpa_iotlb_batch_end_once(v);
> > > > +}
> > > > +
> > > > +static void vhost_vdpa_iommu_region_add(MemoryListener *listener,
> > > > +                                        MemoryRegionSection *section)
> > > > +{
> > > > +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > > > +
> > > > +    struct vdpa_iommu *iommu;
> > > > +    Int128 end;
> > > > +    int iommu_idx;
> > > > +    IOMMUMemoryRegion *iommu_mr;
> > > > +    int ret;
> > > > +
> > > > +    iommu_mr = IOMMU_MEMORY_REGION(section->mr);
> > > > +
> > > > +    iommu = g_malloc0(sizeof(*iommu));
> > > > +    end = int128_add(int128_make64(section->offset_within_region),
> > > > +                     section->size);
> > > > +    end = int128_sub(end, int128_one());
> > > > +    iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr,
> > > > +                                                   MEMTXATTRS_UNSPECIFIED);
> > > > +    iommu->iommu_mr = iommu_mr;
> > > > +    iommu_notifier_init(&iommu->n, vhost_vdpa_iommu_map_notify,
> > > > +                        IOMMU_NOTIFIER_IOTLB_EVENTS,
> > > > +                        section->offset_within_region,
> > > > +                        int128_get64(end),
> > > > +                        iommu_idx);
> > > > +    iommu->iommu_offset = section->offset_within_address_space -
> > > > +                          section->offset_within_region;
> > > > +    iommu->dev = v;
> > > > +
> > > > +    ret = memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL);
> > > > +    if (ret) {
> > > > +        g_free(iommu);
> > > > +        return;
> > > > +    }
> > > > +
> > > > +    QLIST_INSERT_HEAD(&v->iommu_list, iommu, iommu_next);
> > > > +    memory_region_iommu_replay(iommu->iommu_mr, &iommu->n);
> > > > +
> > > > +    return;
> > > > +}
> > > > +
> > > > +static void vhost_vdpa_iommu_region_del(MemoryListener *listener,
> > > > +                                        MemoryRegionSection *section)
> > > > +{
> > > > +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > > > +
> > > > +    struct vdpa_iommu *iommu;
> > > > +
> > > > +    QLIST_FOREACH(iommu, &v->iommu_list, iommu_next)
> > > > +    {
> > > > +        if (MEMORY_REGION(iommu->iommu_mr) == section->mr &&
> > > > +            iommu->n.start == section->offset_within_region) {
> > > > +            memory_region_unregister_iommu_notifier(section->mr, &iommu->n);
> > > > +            QLIST_REMOVE(iommu, iommu_next);
> > > > +            g_free(iommu);
> > > > +            break;
> > > > +        }
> > > > +    }
> > > > +}
> > > > +
> > > >  static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> > > >                                             MemoryRegionSection *section)
> > > >  {
> > > > @@ -199,6 +325,10 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> > > >                                              v->iova_range.last)) {
> > > >          return;
> > > >      }
> > > > +    if (memory_region_is_iommu(section->mr)) {
> > > > +        vhost_vdpa_iommu_region_add(listener, section);
> > > > +        return;
> > > > +    }
> > > >
> > > >      if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
> > > >                   (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> > > > @@ -278,6 +408,9 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
> > > >                                              v->iova_range.last)) {
> > > >          return;
> > > >      }
> > > > +    if (memory_region_is_iommu(section->mr)) {
> > > > +        vhost_vdpa_iommu_region_del(listener, section);
> > > > +    }
> > > >
> > > >      if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
> > > >                   (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> > > > @@ -1182,7 +1315,13 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > > >      }
> > > >
> > > >      if (started) {
> > > > -        memory_listener_register(&v->listener, &address_space_memory);
> > > > +        if (vhost_dev_has_iommu(dev) && (v->shadow_vqs_enabled)) {
> > > > +            error_report("SVQ can not work while IOMMU enable, please disable"
> > > > +                         "IOMMU and try again");
> > > > +            return -1;
> > > > +        }
> > > > +        memory_listener_register(&v->listener, dev->vdev->dma_as);
> > > > +
> > > >          return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > > >      }
> > > >
> > > > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > > > index c278a2a8de..e64bfc7f98 100644
> > > > --- a/include/hw/virtio/vhost-vdpa.h
> > > > +++ b/include/hw/virtio/vhost-vdpa.h
> > > > @@ -52,6 +52,8 @@ typedef struct vhost_vdpa {
> > > >      struct vhost_dev *dev;
> > > >      Error *migration_blocker;
> > > >      VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > > > +    QLIST_HEAD(, vdpa_iommu) iommu_list;
> > > > +    IOMMUNotifier n;
> > > >  } VhostVDPA;
> > > >
> > > >  int vhost_vdpa_get_iova_range(int fd, struct vhost_vdpa_iova_range *iova_range);
> > > > @@ -61,4 +63,13 @@ int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> > > >  int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> > > >                           hwaddr size);
> > > >
> > > > +typedef struct vdpa_iommu {
> > > > +    struct vhost_vdpa *dev;
> > > > +    IOMMUMemoryRegion *iommu_mr;
> > > > +    hwaddr iommu_offset;
> > > > +    IOMMUNotifier n;
> > > > +    QLIST_ENTRY(vdpa_iommu) iommu_next;
> > > > +} VDPAIOMMUState;
> > > > +
> > > > +
> > > >  #endif
> > > > --
> > > > 2.34.3
> > > >
> > >
> >
>
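
The range check that the quoted patch moves into
vhost_vdpa_iommu_map_notify() can be sketched as follows. This is a
minimal illustration, not the patch's code: the patch does the
comparison with Int128 helpers, while this sketch uses plain uint64_t
and rearranges the comparison so that it cannot wrap around. The
function name mapping_in_range() is hypothetical, and the sketch
assumes addr_mask has the form 2^n - 1 (for example 0xfff for a 4 KiB
mapping).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: return true when the mapping
 * [iova, iova + addr_mask] lies at or below iova_last (inclusive).
 * Assumption of this sketch: addr_mask is of the form 2^n - 1.
 */
static bool mapping_in_range(uint64_t iova, uint64_t addr_mask,
                             uint64_t iova_last)
{
    /* Check the assumed mask shape; UINT64_MAX + 1 wraps to 0. */
    assert((addr_mask & (addr_mask + 1)) == 0);

    if (iova > iova_last) {
        return false;
    }
    /*
     * Equivalent to iova + addr_mask <= iova_last, but the subtraction
     * cannot overflow because iova <= iova_last was checked above.
     */
    return addr_mask <= iova_last - iova;
}
```

For example, with iova_last = 0x1fff a 4 KiB mapping at 0x1000 is
accepted, while the same mapping is rejected once iova_last drops to
0x1ffe.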




* Re: [PATCH v15 4/4] vhost-vdpa: Add support for vIOMMU.
  2023-03-24  2:59         ` Cindy Lu
@ 2023-03-24  3:45           ` Jason Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2023-03-24  3:45 UTC (permalink / raw)
  To: Cindy Lu; +Cc: mst, qemu-devel

On Fri, Mar 24, 2023 at 11:00 AM Cindy Lu <lulu@redhat.com> wrote:
>
> On Fri, Mar 24, 2023 at 10:49 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Thu, Mar 23, 2023 at 4:41 PM Cindy Lu <lulu@redhat.com> wrote:
> > >
> > > On Thu, Mar 23, 2023 at 11:47 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Tue, Mar 21, 2023 at 10:24 PM Cindy Lu <lulu@redhat.com> wrote:
> > > > >
> > > > > 1. The vIOMMU support will make vDPA can work in IOMMU mode. This
> > > > > will fix security issues while using the no-IOMMU mode.
> > > > > To support this feature we need to add new functions for IOMMU MR adds and
> > > > > deletes.
> > > > >
> > > > > Also since the SVQ does not support vIOMMU yet, add the check for IOMMU
> > > > > in vhost_vdpa_dev_start, if the SVQ and IOMMU enable at the same time
> > > > > the function will return fail.
> > > > >
> > > > > 2. Skip the iova_max check vhost_vdpa_listener_skipped_section(). While
> > > > > MR is IOMMU, move this check to vhost_vdpa_iommu_map_notify()
> > > > >
> > > > > Verified in vp_vdpa and vdpa_sim_net driver
> > > > >
> > > > > Signed-off-by: Cindy Lu <lulu@redhat.com>
> > > > > ---
> > > > >  hw/virtio/trace-events         |   2 +-
> > > > >  hw/virtio/vhost-vdpa.c         | 159 ++++++++++++++++++++++++++++++---
> > > > >  include/hw/virtio/vhost-vdpa.h |  11 +++
> > > > >  3 files changed, 161 insertions(+), 11 deletions(-)
> > > > >
> > > > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > > > index 8f8d05cf9b..de4da2c65c 100644
> > > > > --- a/hw/virtio/trace-events
> > > > > +++ b/hw/virtio/trace-events
> > > > > @@ -33,7 +33,7 @@ vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
> > > > >  vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
> > > > >  vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
> > > > >  vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> > > > > -vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> > > > > +vhost_vdpa_iotlb_batch_end_once(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> > > > >  vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
> > > > >  vhost_vdpa_listener_region_del(void *vdpa, uint64_t iova, uint64_t llend) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64
> > > > >  vhost_vdpa_add_status(void *dev, uint8_t status) "dev: %p status: 0x%"PRIx8
> > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > > index 0c8c37e786..39720d12a6 100644
> > > > > --- a/hw/virtio/vhost-vdpa.c
> > > > > +++ b/hw/virtio/vhost-vdpa.c
> > > > > @@ -26,6 +26,7 @@
> > > > >  #include "cpu.h"
> > > > >  #include "trace.h"
> > > > >  #include "qapi/error.h"
> > > > > +#include "hw/virtio/virtio-access.h"
> > > > >
> > > > >  /*
> > > > >   * Return one past the end of the end of section. Be careful with uint64_t
> > > > > @@ -60,13 +61,21 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
> > > > >                       iova_min, section->offset_within_address_space);
> > > > >          return true;
> > > > >      }
> > > > > +    /*
> > > > > +     * While using vIOMMU, sometimes the section will be larger than iova_max,
> > > > > +     * but the memory that actually maps is smaller, so move the check to
> > > > > +     * function vhost_vdpa_iommu_map_notify(). That function will use the actual
> > > > > +     * size that maps to the kernel
> > > > > +     */
> > > > >
> > > > > -    llend = vhost_vdpa_section_end(section);
> > > > > -    if (int128_gt(llend, int128_make64(iova_max))) {
> > > > > -        error_report("RAM section out of device range (max=0x%" PRIx64
> > > > > -                     ", end addr=0x%" PRIx64 ")",
> > > > > -                     iova_max, int128_get64(llend));
> > > > > -        return true;
> > > > > +    if (!memory_region_is_iommu(section->mr)) {
> > > > > +        llend = vhost_vdpa_section_end(section);
> > > > > +        if (int128_gt(llend, int128_make64(iova_max))) {
> > > > > +            error_report("RAM section out of device range (max=0x%" PRIx64
> > > > > +                         ", end addr=0x%" PRIx64 ")",
> > > > > +                         iova_max, int128_get64(llend));
> > > > > +            return true;
> > > > > +        }
> > > > >      }
> > > > >
> > > > >      return false;
> > > > > @@ -158,9 +167,8 @@ static void vhost_vdpa_iotlb_batch_begin_once(struct vhost_vdpa *v)
> > > > >      v->iotlb_batch_begin_sent = true;
> > > > >  }
> > > > >
> > > > > -static void vhost_vdpa_listener_commit(MemoryListener *listener)
> > > > > +static void vhost_vdpa_iotlb_batch_end_once(struct vhost_vdpa *v)
> > > > >  {
> > > > > -    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > > > >      struct vhost_dev *dev = v->dev;
> > > > >      struct vhost_msg_v2 msg = {};
> > > > >      int fd = v->device_fd;
> > > > > @@ -176,7 +184,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
> > > > >      msg.type = v->msg_type;
> > > > >      msg.iotlb.type = VHOST_IOTLB_BATCH_END;
> > > > >
> > > > > -    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
> > > > > +    trace_vhost_vdpa_iotlb_batch_end_once(v, fd, msg.type, msg.iotlb.type);
> > > >
> > > > I suggest to keep the commit trace. The commit and batch are different
> > > > things. If you want to trace the batch begin/end you should do it in
> > > > vhost_vdpa_iotlb_batch_begin_once() etc.
> > > >
> > > sure will fix this
> > > > >      if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> > > > >          error_report("failed to write, fd=%d, errno=%d (%s)",
> > > > >                       fd, errno, strerror(errno));
> > > > > @@ -185,6 +193,124 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
> > > > >      v->iotlb_batch_begin_sent = false;
> > > > >  }
> > > > >
> > > > > +static void vhost_vdpa_listener_commit(MemoryListener *listener)
> > > > > +{
> > > > > +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > > > > +    vhost_vdpa_iotlb_batch_end_once(v);
> > > > > +}
> > > > > +
> > > > > +static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > > > > +{
> > > > > +    struct vdpa_iommu *iommu = container_of(n, struct vdpa_iommu, n);
> > > > > +
> > > > > +    hwaddr iova = iotlb->iova + iommu->iommu_offset;
> > > > > +    struct vhost_vdpa *v = iommu->dev;
> > > > > +    void *vaddr;
> > > > > +    int ret;
> > > > > +    Int128 llend;
> > > > > +
> > > > > +    if (iotlb->target_as != &address_space_memory) {
> > > > > +        error_report("Wrong target AS \"%s\", only system memory is allowed",
> > > > > +                     iotlb->target_as->name ? iotlb->target_as->name : "none");
> > > > > +        return;
> > > > > +    }
> > > > > +    RCU_READ_LOCK_GUARD();
> > > > > +    /* check if RAM section out of device range */
> > > > > +    llend = int128_add(int128_makes64(iotlb->addr_mask), int128_makes64(iova));
> > > > > +    if (int128_gt(llend, int128_make64(v->iova_range.last))) {
> > > > > +        error_report("RAM section out of device range (max=0x%" PRIx64
> > > > > +                     ", end addr=0x%" PRIx64 ")",
> > > > > +                     v->iova_range.last, int128_get64(llend));
> > > > > +        return;
> > > > > +    }
> > > > > +
> > > > > +    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> > > > > +        bool read_only;
> > > > > +
> > > > > +        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, NULL)) {
> > > > > +            return;
> > > > > +        }
> > > > > +        vhost_vdpa_iotlb_batch_begin_once(v);
> > > >
> > > > I think at most 2 ioctls for this, is this still worth to batch them?
> > > >
> > > > Other looks good.
> > > >
> > > > Thanks
> > > >
> > > The kernel vdpa doesn't support no-batch mode; if we remove the batch
> > > here, the system will fail to map:
> > > qemu-system-x86_64: failed to write, fd=12, errno=14 (Bad address)
> > > qemu-system-x86_64: vhost_vdpa_dma_unmap(0x7f811a950190, 0x0,
> > > 0x80000000) = -5 (Bad address)
> > >
> > > I'm not sure; maybe this is a bug in the kernel?
> >
> > I'm not sure I understand this, but do you mean you meet this if you
> > remove the batch_begin_once() and vhost_vdpa_iotlb_batch_end_once()?
> >
> > Thanks
> >
> Yes, the system fails to map if we remove these functions. Is this
> the expected behavior?

I think not. Please trace to see whether the map here is in the middle
of a batch. If it is not, it should be a bug.

Thanks
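
To make "in the middle of a batch" concrete, here is a minimal,
self-contained model of the begin-once/end-once pattern discussed
above. It is only a sketch under stated assumptions: struct
batch_state, send_msg() and the MSG_* names are hypothetical stand-ins
for struct vhost_vdpa, the write() to the device fd and the
VHOST_IOTLB_* message types. The real helpers may check more state
than the single flag modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message kinds standing in for the VHOST_IOTLB_* types. */
enum msg_kind { MSG_BATCH_BEGIN, MSG_UPDATE, MSG_BATCH_END };

#define LOG_MAX 16

/* Hypothetical stand-in for struct vhost_vdpa: a flag plus a message log. */
struct batch_state {
    bool batch_begin_sent;
    enum msg_kind log[LOG_MAX];
    size_t log_len;
};

/* Record a message instead of writing it to a device fd. */
static void send_msg(struct batch_state *s, enum msg_kind kind)
{
    assert(s->log_len < LOG_MAX);
    s->log[s->log_len++] = kind;
}

/* Send BATCH_BEGIN only if no batch is currently open. */
static void batch_begin_once(struct batch_state *s)
{
    if (s->batch_begin_sent) {
        return;
    }
    send_msg(s, MSG_BATCH_BEGIN);
    s->batch_begin_sent = true;
}

/* Send BATCH_END only if a batch is open, then mark it closed. */
static void batch_end_once(struct batch_state *s)
{
    if (!s->batch_begin_sent) {
        return;
    }
    send_msg(s, MSG_BATCH_END);
    s->batch_begin_sent = false;
}

/* One notifier-style update wrapped in its own batch. */
static void map_in_batch(struct batch_state *s)
{
    batch_begin_once(s);
    send_msg(s, MSG_UPDATE);
    batch_end_once(s);
}

/* Trace helper: true if any update was sent outside a BEGIN/END pair. */
static bool update_outside_batch(const struct batch_state *s)
{
    bool open = false;

    for (size_t i = 0; i < s->log_len; i++) {
        switch (s->log[i]) {
        case MSG_BATCH_BEGIN:
            open = true;
            break;
        case MSG_BATCH_END:
            open = false;
            break;
        case MSG_UPDATE:
            if (!open) {
                return true;
            }
            break;
        }
    }
    return false;
}
```

Walking a recorded message log with update_outside_batch() is one way
to answer the question above: it reports whether an update was sent
while no batch was open.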

> Maybe we need to fix this in the kernel?
> Thanks
> Cindy
> > > Thanks
> > > Cindy
> > >
> > > > > +        ret = vhost_vdpa_dma_map(v, VHOST_VDPA_GUEST_PA_ASID, iova,
> > > > > +                                 iotlb->addr_mask + 1, vaddr, read_only);
> > > > > +        if (ret) {
> > > > > +            error_report("vhost_vdpa_dma_map(%p, 0x%" HWADDR_PRIx ", "
> > > > > +                         "0x%" HWADDR_PRIx ", %p) = %d (%m)",
> > > > > +                         v, iova, iotlb->addr_mask + 1, vaddr, ret);
> > > > > +        }
> > > > > +    } else {
> > > > > +        vhost_vdpa_iotlb_batch_begin_once(v);
> > > > > +        ret = vhost_vdpa_dma_unmap(v, VHOST_VDPA_GUEST_PA_ASID, iova,
> > > > > +                                   iotlb->addr_mask + 1);
> > > > > +        if (ret) {
> > > > > +            error_report("vhost_vdpa_dma_unmap(%p, 0x%" HWADDR_PRIx ", "
> > > > > +                         "0x%" HWADDR_PRIx ") = %d (%m)",
> > > > > +                         v, iova, iotlb->addr_mask + 1, ret);
> > > > > +        }
> > > > > +    }
> > > > > +    vhost_vdpa_iotlb_batch_end_once(v);
> > > > > +}
> > > > > +
> > > > > +static void vhost_vdpa_iommu_region_add(MemoryListener *listener,
> > > > > +                                        MemoryRegionSection *section)
> > > > > +{
> > > > > +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > > > > +
> > > > > +    struct vdpa_iommu *iommu;
> > > > > +    Int128 end;
> > > > > +    int iommu_idx;
> > > > > +    IOMMUMemoryRegion *iommu_mr;
> > > > > +    int ret;
> > > > > +
> > > > > +    iommu_mr = IOMMU_MEMORY_REGION(section->mr);
> > > > > +
> > > > > +    iommu = g_malloc0(sizeof(*iommu));
> > > > > +    end = int128_add(int128_make64(section->offset_within_region),
> > > > > +                     section->size);
> > > > > +    end = int128_sub(end, int128_one());
> > > > > +    iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr,
> > > > > +                                                   MEMTXATTRS_UNSPECIFIED);
> > > > > +    iommu->iommu_mr = iommu_mr;
> > > > > +    iommu_notifier_init(&iommu->n, vhost_vdpa_iommu_map_notify,
> > > > > +                        IOMMU_NOTIFIER_IOTLB_EVENTS,
> > > > > +                        section->offset_within_region,
> > > > > +                        int128_get64(end),
> > > > > +                        iommu_idx);
> > > > > +    iommu->iommu_offset = section->offset_within_address_space -
> > > > > +                          section->offset_within_region;
> > > > > +    iommu->dev = v;
> > > > > +
> > > > > +    ret = memory_region_register_iommu_notifier(section->mr, &iommu->n, NULL);
> > > > > +    if (ret) {
> > > > > +        g_free(iommu);
> > > > > +        return;
> > > > > +    }
> > > > > +
> > > > > +    QLIST_INSERT_HEAD(&v->iommu_list, iommu, iommu_next);
> > > > > +    memory_region_iommu_replay(iommu->iommu_mr, &iommu->n);
> > > > > +
> > > > > +    return;
> > > > > +}
> > > > > +
> > > > > +static void vhost_vdpa_iommu_region_del(MemoryListener *listener,
> > > > > +                                        MemoryRegionSection *section)
> > > > > +{
> > > > > +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > > > > +
> > > > > +    struct vdpa_iommu *iommu;
> > > > > +
> > > > > +    QLIST_FOREACH(iommu, &v->iommu_list, iommu_next)
> > > > > +    {
> > > > > +        if (MEMORY_REGION(iommu->iommu_mr) == section->mr &&
> > > > > +            iommu->n.start == section->offset_within_region) {
> > > > > +            memory_region_unregister_iommu_notifier(section->mr, &iommu->n);
> > > > > +            QLIST_REMOVE(iommu, iommu_next);
> > > > > +            g_free(iommu);
> > > > > +            break;
> > > > > +        }
> > > > > +    }
> > > > > +}
> > > > > +
> > > > >  static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> > > > >                                             MemoryRegionSection *section)
> > > > >  {
> > > > > @@ -199,6 +325,10 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> > > > >                                              v->iova_range.last)) {
> > > > >          return;
> > > > >      }
> > > > > +    if (memory_region_is_iommu(section->mr)) {
> > > > > +        vhost_vdpa_iommu_region_add(listener, section);
> > > > > +        return;
> > > > > +    }
> > > > >
> > > > >      if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
> > > > >                   (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> > > > > @@ -278,6 +408,9 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
> > > > >                                              v->iova_range.last)) {
> > > > >          return;
> > > > >      }
> > > > > +    if (memory_region_is_iommu(section->mr)) {
> > > > > +        vhost_vdpa_iommu_region_del(listener, section);
> > > > > +    }
> > > > >
> > > > >      if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
> > > > >                   (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> > > > > @@ -1182,7 +1315,13 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > > > >      }
> > > > >
> > > > >      if (started) {
> > > > > -        memory_listener_register(&v->listener, &address_space_memory);
> > > > > +        if (vhost_dev_has_iommu(dev) && (v->shadow_vqs_enabled)) {
> > > > > +            error_report("SVQ can not work while IOMMU enable, please disable"
> > > > > +                         "IOMMU and try again");
> > > > > +            return -1;
> > > > > +        }
> > > > > +        memory_listener_register(&v->listener, dev->vdev->dma_as);
> > > > > +
> > > > >          return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > > > >      }
> > > > >
> > > > > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > > > > index c278a2a8de..e64bfc7f98 100644
> > > > > --- a/include/hw/virtio/vhost-vdpa.h
> > > > > +++ b/include/hw/virtio/vhost-vdpa.h
> > > > > @@ -52,6 +52,8 @@ typedef struct vhost_vdpa {
> > > > >      struct vhost_dev *dev;
> > > > >      Error *migration_blocker;
> > > > >      VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > > > > +    QLIST_HEAD(, vdpa_iommu) iommu_list;
> > > > > +    IOMMUNotifier n;
> > > > >  } VhostVDPA;
> > > > >
> > > > >  int vhost_vdpa_get_iova_range(int fd, struct vhost_vdpa_iova_range *iova_range);
> > > > > @@ -61,4 +63,13 @@ int vhost_vdpa_dma_map(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> > > > >  int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, uint32_t asid, hwaddr iova,
> > > > >                           hwaddr size);
> > > > >
> > > > > +typedef struct vdpa_iommu {
> > > > > +    struct vhost_vdpa *dev;
> > > > > +    IOMMUMemoryRegion *iommu_mr;
> > > > > +    hwaddr iommu_offset;
> > > > > +    IOMMUNotifier n;
> > > > > +    QLIST_ENTRY(vdpa_iommu) iommu_next;
> > > > > +} VDPAIOMMUState;
> > > > > +
> > > > > +
> > > > >  #endif
> > > > > --
> > > > > 2.34.3
> > > > >
> > > >
> > >
> >
>
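
The address arithmetic in the quoted vhost_vdpa_iommu_region_add() and
vhost_vdpa_iommu_map_notify() can be illustrated with a small sketch.
The struct and function names below are hypothetical, and plain
uint64_t replaces the Int128 and hwaddr types used in the patch, so a
section size of 2^64 is out of scope here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced view of a MemoryRegionSection. */
struct section {
    uint64_t offset_within_region;
    uint64_t offset_within_address_space;
    uint64_t size;
};

/* Hypothetical result: the notifier window and the IOVA offset. */
struct notifier_range {
    uint64_t start;        /* first region offset covered */
    uint64_t end;          /* last region offset covered (inclusive) */
    uint64_t iommu_offset; /* address-space offset minus region offset */
};

static struct notifier_range notifier_range_for(const struct section *s)
{
    struct notifier_range r;

    assert(s->size > 0);
    r.start = s->offset_within_region;
    r.end = s->offset_within_region + s->size - 1;
    /* Unsigned subtraction may wrap; adding it back later undoes the wrap. */
    r.iommu_offset = s->offset_within_address_space -
                     s->offset_within_region;
    return r;
}

/* Translate an IOTLB entry's iova into the IOVA passed to the device. */
static uint64_t device_iova(const struct notifier_range *r,
                            uint64_t iotlb_iova)
{
    return iotlb_iova + r->iommu_offset;
}
```

With a section at region offset 0x1000, address-space offset 0x5000
and size 0x2000, the notifier window is [0x1000, 0x2fff] and an IOTLB
entry at 0x1800 maps to device IOVA 0x5800.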




end of thread, other threads:[~2023-03-24 15:41 UTC | newest]

Thread overview: 11+ messages
2023-03-21 14:23 [PATCH v15 0/4] vhost-vdpa: add support for vIOMMU Cindy Lu
2023-03-21 14:23 ` [PATCH v15 1/4] vhost: expose function vhost_dev_has_iommu() Cindy Lu
2023-03-23  3:48   ` Jason Wang
2023-03-21 14:23 ` [PATCH v15 2/4] vhost_vdpa: fix the input in trace_vhost_vdpa_listener_region_del() Cindy Lu
2023-03-21 14:23 ` [PATCH v15 3/4] vhost-vdpa: Add check for full 64-bit in region delete Cindy Lu
2023-03-21 14:23 ` [PATCH v15 4/4] vhost-vdpa: Add support for vIOMMU Cindy Lu
2023-03-23  3:47   ` Jason Wang
2023-03-23  8:40     ` Cindy Lu
2023-03-24  2:49       ` Jason Wang
2023-03-24  2:59         ` Cindy Lu
2023-03-24  3:45           ` Jason Wang
