All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
@ 2023-09-04  8:03 Eric Auger
  2023-09-04  8:03 ` [PATCH 01/13] memory: Let ReservedRegion use Range Eric Auger
                   ` (14 more replies)
  0 siblings, 15 replies; 29+ messages in thread
From: Eric Auger @ 2023-09-04  8:03 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd

On x86, when assigning VFIO-PCI devices protected with virtio-iommu
we encounter the case where the guest tries to map IOVAs beyond 48b
whereas the physical VTD IOMMU only supports 48b. This ends up with
VFIO_MAP_DMA failures at qemu level because at kernel level,
vfio_iommu_iova_dma_valid() check returns false on vfio_map_do_map().

This is due to the fact the virtio-iommu currently unconditionally
exposes an IOVA range of 64b through its config input range fields.

This series removes this assumption by retrieving the usable IOVA
regions through the VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE UAPI when
a VFIO device is attached. This info is communicated to the
virtio-iommu memory region, transformed into the inversed info, ie.
the host reserved IOVA regions. Then those latter are combined with the
reserved IOVA regions set though the virtio-iommu reserved-regions
property. That way, the guest virtio-iommu driver, unchanged, is
able to probe the whole set of reserved regions and prevent any IOVA
belonging to those ranges from beeing used, achieving the original goal.

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/virtio-iommu_geometry_v1

Eric Auger (13):
  memory: Let ReservedRegion use Range
  memory: Introduce memory_region_iommu_set_iova_ranges
  vfio: Collect container iova range info
  virtio-iommu: Rename reserved_regions into prop_resv_regions
  virtio-iommu: Introduce per IOMMUDevice reserved regions
  range: Introduce range_inverse_array()
  virtio-iommu: Implement set_iova_ranges() callback
  range: Make range_compare() public
  util/reserved-region: Add new ReservedRegion helpers
  virtio-iommu: Consolidate host reserved regions and property set ones
  test: Add some tests for range and resv-mem helpers
  virtio-iommu: Resize memory region according to the max iova info
  vfio: Remove 64-bit IOVA address space assumption

 include/exec/memory.h            |  30 ++++-
 include/hw/vfio/vfio-common.h    |   2 +
 include/hw/virtio/virtio-iommu.h |   7 +-
 include/qemu/range.h             |   9 ++
 include/qemu/reserved-region.h   |  32 +++++
 hw/core/qdev-properties-system.c |   9 +-
 hw/vfio/common.c                 |  70 ++++++++---
 hw/virtio/virtio-iommu-pci.c     |   8 +-
 hw/virtio/virtio-iommu.c         |  85 +++++++++++--
 softmmu/memory.c                 |  15 +++
 tests/unit/test-resv-mem.c       | 198 +++++++++++++++++++++++++++++++
 util/range.c                     |  41 ++++++-
 util/reserved-region.c           |  94 +++++++++++++++
 hw/virtio/trace-events           |   1 +
 tests/unit/meson.build           |   1 +
 util/meson.build                 |   1 +
 16 files changed, 562 insertions(+), 41 deletions(-)
 create mode 100644 include/qemu/reserved-region.h
 create mode 100644 tests/unit/test-resv-mem.c
 create mode 100644 util/reserved-region.c

-- 
2.41.0



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 01/13] memory: Let ReservedRegion use Range
  2023-09-04  8:03 [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
@ 2023-09-04  8:03 ` Eric Auger
  2023-09-04  8:12   ` Philippe Mathieu-Daudé
                     ` (2 more replies)
  2023-09-04  8:03 ` [PATCH 02/13] memory: Introduce memory_region_iommu_set_iova_ranges Eric Auger
                   ` (13 subsequent siblings)
  14 siblings, 3 replies; 29+ messages in thread
From: Eric Auger @ 2023-09-04  8:03 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd

A reserved region is a range tagged with a type. Let's directly use
the Range type in the prospect to reuse some of the library helpers
shipped with the Range type.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/exec/memory.h            | 4 ++--
 hw/core/qdev-properties-system.c | 9 ++++++---
 hw/virtio/virtio-iommu.c         | 6 +++---
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 68284428f8..184cb3a01b 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -24,6 +24,7 @@
 #include "qemu/bswap.h"
 #include "qemu/queue.h"
 #include "qemu/int128.h"
+#include "qemu/range.h"
 #include "qemu/notify.h"
 #include "qom/object.h"
 #include "qemu/rcu.h"
@@ -79,8 +80,7 @@ extern unsigned int global_dirty_tracking;
 typedef struct MemoryRegionOps MemoryRegionOps;
 
 struct ReservedRegion {
-    hwaddr low;
-    hwaddr high;
+    Range range;
     unsigned type;
 };
 
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 6d5d43eda2..8d486eeefd 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -699,7 +699,7 @@ static void get_reserved_region(Object *obj, Visitor *v, const char *name,
     int rc;
 
     rc = snprintf(buffer, sizeof(buffer), "0x%"PRIx64":0x%"PRIx64":%u",
-                  rr->low, rr->high, rr->type);
+                  range_lob(&rr->range), range_upb(&rr->range), rr->type);
     assert(rc < sizeof(buffer));
 
     visit_type_str(v, name, &p, errp);
@@ -711,6 +711,7 @@ static void set_reserved_region(Object *obj, Visitor *v, const char *name,
     Property *prop = opaque;
     ReservedRegion *rr = object_field_prop_ptr(obj, prop);
     const char *endptr;
+    uint64_t lob, upb;
     char *str;
     int ret;
 
@@ -718,7 +719,7 @@ static void set_reserved_region(Object *obj, Visitor *v, const char *name,
         return;
     }
 
-    ret = qemu_strtou64(str, &endptr, 16, &rr->low);
+    ret = qemu_strtou64(str, &endptr, 16, &lob);
     if (ret) {
         error_setg(errp, "start address of '%s'"
                    " must be a hexadecimal integer", name);
@@ -728,7 +729,7 @@ static void set_reserved_region(Object *obj, Visitor *v, const char *name,
         goto separator_error;
     }
 
-    ret = qemu_strtou64(endptr + 1, &endptr, 16, &rr->high);
+    ret = qemu_strtou64(endptr + 1, &endptr, 16, &upb);
     if (ret) {
         error_setg(errp, "end address of '%s'"
                    " must be a hexadecimal integer", name);
@@ -738,6 +739,8 @@ static void set_reserved_region(Object *obj, Visitor *v, const char *name,
         goto separator_error;
     }
 
+    range_set_bounds(&rr->range, lob, upb);
+
     ret = qemu_strtoui(endptr + 1, &endptr, 10, &rr->type);
     if (ret) {
         error_setg(errp, "type of '%s'"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index be51635895..e5e46e1b55 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -645,8 +645,8 @@ static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t ep,
         prop.head.type = cpu_to_le16(VIRTIO_IOMMU_PROBE_T_RESV_MEM);
         prop.head.length = cpu_to_le16(length);
         prop.subtype = subtype;
-        prop.start = cpu_to_le64(s->reserved_regions[i].low);
-        prop.end = cpu_to_le64(s->reserved_regions[i].high);
+        prop.start = cpu_to_le64(range_lob(&s->reserved_regions[i].range));
+        prop.end = cpu_to_le64(range_upb(&s->reserved_regions[i].range));
 
         memcpy(buf, &prop, size);
 
@@ -897,7 +897,7 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
     for (i = 0; i < s->nb_reserved_regions; i++) {
         ReservedRegion *reg = &s->reserved_regions[i];
 
-        if (addr >= reg->low && addr <= reg->high) {
+        if (range_contains(&reg->range, addr)) {
             switch (reg->type) {
             case VIRTIO_IOMMU_RESV_MEM_T_MSI:
                 entry.perm = flag;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 02/13] memory: Introduce memory_region_iommu_set_iova_ranges
  2023-09-04  8:03 [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
  2023-09-04  8:03 ` [PATCH 01/13] memory: Let ReservedRegion use Range Eric Auger
@ 2023-09-04  8:03 ` Eric Auger
  2023-09-11 18:41   ` Peter Xu
  2023-09-14 13:38   ` David Hildenbrand
  2023-09-04  8:03 ` [PATCH 03/13] vfio: Collect container iova range info Eric Auger
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 29+ messages in thread
From: Eric Auger @ 2023-09-04  8:03 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd

This helper will allow to convey information about valid
IOVA ranges to virtual IOMMUS.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/exec/memory.h | 26 ++++++++++++++++++++++++++
 softmmu/memory.c      | 15 +++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 184cb3a01b..f6fb99dd3f 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -519,6 +519,27 @@ struct IOMMUMemoryRegionClass {
      int (*iommu_set_page_size_mask)(IOMMUMemoryRegion *iommu,
                                      uint64_t page_size_mask,
                                      Error **errp);
+    /**
+     * @iommu_set_iova_ranges:
+     *
+     * Propagate information about the usable IOVA ranges for a given IOMMU
+     * memory region. Used for example to propagate host physical device
+     * reserved memory region constraints to the virtual IOMMU.
+     *
+     * Optional method: if this method is not provided, then the default IOVA
+     * aperture is used.
+     *
+     * @nr_ranges: number of IOVA ranges
+     *
+     * @iova_ranges: an array of @nr_ranges usable IOVA ranges
+     *
+     * Returns 0 on success, or a negative error. In case of failure, the error
+     * object must be created.
+     */
+     int (*iommu_set_iova_ranges)(IOMMUMemoryRegion *iommu,
+                                  uint32_t nr_ranges,
+                                  struct Range *iova_ranges,
+                                  Error **errp);
 };
 
 typedef struct RamDiscardListener RamDiscardListener;
@@ -1845,6 +1866,11 @@ int memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
                                            uint64_t page_size_mask,
                                            Error **errp);
 
+int memory_region_iommu_set_iova_ranges(IOMMUMemoryRegion *iommu,
+                                        uint32_t nr_ranges,
+                                        struct Range *iova_ranges,
+                                        Error **errp);
+
 /**
  * memory_region_name: get a memory region's name
  *
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7d9494ce70..07499457aa 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1905,6 +1905,21 @@ int memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
     return ret;
 }
 
+int memory_region_iommu_set_iova_ranges(IOMMUMemoryRegion *iommu_mr,
+                                        uint32_t nr_ranges,
+                                        struct Range *iova_ranges,
+                                        Error **errp)
+{
+    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
+    int ret = 0;
+
+    if (imrc->iommu_set_iova_ranges) {
+        ret = imrc->iommu_set_iova_ranges(iommu_mr, nr_ranges,
+                                          iova_ranges, errp);
+    }
+    return ret;
+}
+
 int memory_region_register_iommu_notifier(MemoryRegion *mr,
                                           IOMMUNotifier *n, Error **errp)
 {
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 03/13] vfio: Collect container iova range info
  2023-09-04  8:03 [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
  2023-09-04  8:03 ` [PATCH 01/13] memory: Let ReservedRegion use Range Eric Auger
  2023-09-04  8:03 ` [PATCH 02/13] memory: Introduce memory_region_iommu_set_iova_ranges Eric Auger
@ 2023-09-04  8:03 ` Eric Auger
  2023-09-04  8:03 ` [PATCH 04/13] virtio-iommu: Rename reserved_regions into prop_resv_regions Eric Auger
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2023-09-04  8:03 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd

Collect iova range information if VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
capability is supported.

This allows to propagate the information though the IOMMU MR
set_iova_ranges() callback so that virtual IOMMUs
get aware of those aperture constraints.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/hw/vfio/vfio-common.h |  2 ++
 hw/vfio/common.c              | 45 +++++++++++++++++++++++++++++++++--
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index da43d27352..74b9b27270 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -98,6 +98,8 @@ typedef struct VFIOContainer {
     QLIST_HEAD(, VFIOGroup) group_list;
     QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
     QLIST_ENTRY(VFIOContainer) next;
+    unsigned nr_iovas;
+    struct  vfio_iova_range *iova_ranges;
 } VFIOContainer;
 
 typedef struct VFIOGuestIOMMU {
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9aac21abb7..26da38de05 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1157,6 +1157,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
             goto fail;
         }
 
+        ret = memory_region_iommu_set_iova_ranges(giommu->iommu_mr,
+                container->nr_iovas, (struct Range *)container->iova_ranges,
+                &err);
+        if (ret) {
+            g_free(giommu);
+            goto fail;
+        }
+
         ret = memory_region_register_iommu_notifier(section->mr, &giommu->n,
                                                     &err);
         if (ret) {
@@ -1981,6 +1989,29 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
     return true;
 }
 
+static void vfio_get_info_iova_range(struct vfio_iommu_type1_info *info,
+                                     unsigned int *nr_iovas,
+                                     struct  vfio_iova_range **iova_ranges)
+{
+    struct vfio_info_cap_header *hdr;
+    struct vfio_iommu_type1_info_cap_iova_range *cap;
+
+    hdr = vfio_get_iommu_type1_info_cap(info,
+                                        VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
+    if (hdr == NULL) {
+        return;
+    }
+
+    cap = (void *)hdr;
+    *nr_iovas = cap->nr_iovas;
+
+    if (*nr_iovas == 0) {
+        return;
+    }
+    *iova_ranges = g_memdup2(cap->iova_ranges,
+                             *nr_iovas * sizeof(struct  vfio_iova_range));
+}
+
 static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
                                           struct vfio_region_info *info)
 {
@@ -2433,6 +2464,12 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
     }
 }
 
+static void vfio_free_container(VFIOContainer *container)
+{
+    g_free(container->iova_ranges);
+    g_free(container);
+}
+
 static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
                                   Error **errp)
 {
@@ -2550,6 +2587,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         if (!vfio_get_info_dma_avail(info, &container->dma_max_mappings)) {
             container->dma_max_mappings = 65535;
         }
+
+        vfio_get_info_iova_range(info, &container->nr_iovas,
+                                 &container->iova_ranges);
+
         vfio_get_iommu_info_migration(container, info);
         g_free(info);
 
@@ -2663,7 +2704,7 @@ enable_discards_exit:
     vfio_ram_block_discard_disable(container, false);
 
 free_container_exit:
-    g_free(container);
+    vfio_free_container(container);
 
 close_fd_exit:
     close(fd);
@@ -2717,7 +2758,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
 
         trace_vfio_disconnect_container(container->fd);
         close(container->fd);
-        g_free(container);
+        vfio_free_container(container);
 
         vfio_put_address_space(space);
     }
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 04/13] virtio-iommu: Rename reserved_regions into prop_resv_regions
  2023-09-04  8:03 [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (2 preceding siblings ...)
  2023-09-04  8:03 ` [PATCH 03/13] vfio: Collect container iova range info Eric Auger
@ 2023-09-04  8:03 ` Eric Auger
  2023-09-04  8:03 ` [PATCH 05/13] virtio-iommu: Introduce per IOMMUDevice reserved regions Eric Auger
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2023-09-04  8:03 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd

Rename VirtIOIOMMU (nb_)reserved_regions fields with the "prop_" prefix
to highlight those fields are set through a property, at machine level.
They are IOMMU wide.

A subsequent patch will introduce per IOMMUDevice reserved regions
that will include both those IOMMU wide property reserved
regions plus, sometimes, host reserved regions, if the device is
backed by a host device protected by a physical IOMMU. Also change
nb_ prefix by nr_.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/hw/virtio/virtio-iommu.h |  4 ++--
 hw/virtio/virtio-iommu-pci.c     |  8 ++++----
 hw/virtio/virtio-iommu.c         | 15 ++++++++-------
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index a93fc5383e..eea4564782 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -55,8 +55,8 @@ struct VirtIOIOMMU {
     GHashTable *as_by_busptr;
     IOMMUPciBus *iommu_pcibus_by_bus_num[PCI_BUS_MAX];
     PCIBus *primary_bus;
-    ReservedRegion *reserved_regions;
-    uint32_t nb_reserved_regions;
+    ReservedRegion *prop_resv_regions;
+    uint32_t nr_prop_resv_regions;
     GTree *domains;
     QemuRecMutex mutex;
     GTree *endpoints;
diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
index 7ef2f9dcdb..9459fbf6ed 100644
--- a/hw/virtio/virtio-iommu-pci.c
+++ b/hw/virtio/virtio-iommu-pci.c
@@ -37,7 +37,7 @@ struct VirtIOIOMMUPCI {
 static Property virtio_iommu_pci_properties[] = {
     DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
     DEFINE_PROP_ARRAY("reserved-regions", VirtIOIOMMUPCI,
-                      vdev.nb_reserved_regions, vdev.reserved_regions,
+                      vdev.nr_prop_resv_regions, vdev.prop_resv_regions,
                       qdev_prop_reserved_region, ReservedRegion),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -54,9 +54,9 @@ static void virtio_iommu_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
                          "for the virtio-iommu-pci device");
         return;
     }
-    for (int i = 0; i < s->nb_reserved_regions; i++) {
-        if (s->reserved_regions[i].type != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
-            s->reserved_regions[i].type != VIRTIO_IOMMU_RESV_MEM_T_MSI) {
+    for (int i = 0; i < s->nr_prop_resv_regions; i++) {
+        if (s->prop_resv_regions[i].type != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
+            s->prop_resv_regions[i].type != VIRTIO_IOMMU_RESV_MEM_T_MSI) {
             error_setg(errp, "reserved region %d has an invalid type", i);
             error_append_hint(errp, "Valid values are 0 and 1\n");
             return;
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index e5e46e1b55..979cdb5648 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -631,22 +631,23 @@ static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t ep,
     size_t size = sizeof(prop), length = size - sizeof(prop.head), total;
     int i;
 
-    total = size * s->nb_reserved_regions;
+    total = size * s->nr_prop_resv_regions;
 
     if (total > free) {
         return -ENOSPC;
     }
 
-    for (i = 0; i < s->nb_reserved_regions; i++) {
-        unsigned subtype = s->reserved_regions[i].type;
+    for (i = 0; i < s->nr_prop_resv_regions; i++) {
+        unsigned subtype = s->prop_resv_regions[i].type;
+        Range *range = &s->prop_resv_regions[i].range;
 
         assert(subtype == VIRTIO_IOMMU_RESV_MEM_T_RESERVED ||
                subtype == VIRTIO_IOMMU_RESV_MEM_T_MSI);
         prop.head.type = cpu_to_le16(VIRTIO_IOMMU_PROBE_T_RESV_MEM);
         prop.head.length = cpu_to_le16(length);
         prop.subtype = subtype;
-        prop.start = cpu_to_le64(range_lob(&s->reserved_regions[i].range));
-        prop.end = cpu_to_le64(range_upb(&s->reserved_regions[i].range));
+        prop.start = cpu_to_le64(range_lob(range));
+        prop.end = cpu_to_le64(range_upb(range));
 
         memcpy(buf, &prop, size);
 
@@ -894,8 +895,8 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
         goto unlock;
     }
 
-    for (i = 0; i < s->nb_reserved_regions; i++) {
-        ReservedRegion *reg = &s->reserved_regions[i];
+    for (i = 0; i < s->nr_prop_resv_regions; i++) {
+        ReservedRegion *reg = &s->prop_resv_regions[i];
 
         if (range_contains(&reg->range, addr)) {
             switch (reg->type) {
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 05/13] virtio-iommu: Introduce per IOMMUDevice reserved regions
  2023-09-04  8:03 [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (3 preceding siblings ...)
  2023-09-04  8:03 ` [PATCH 04/13] virtio-iommu: Rename reserved_regions into prop_resv_regions Eric Auger
@ 2023-09-04  8:03 ` Eric Auger
  2023-09-04  8:03 ` [PATCH 06/13] range: Introduce range_inverse_array() Eric Auger
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2023-09-04  8:03 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd

For the time being the per device reserved regions are
just a duplicate of IOMMU wide reserved regions. Subsequent
patches will combine those with host reserved regions, if any.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/hw/virtio/virtio-iommu.h |  1 +
 hw/virtio/virtio-iommu.c         | 42 ++++++++++++++++++++++++++------
 2 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index eea4564782..70b8ace34d 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -39,6 +39,7 @@ typedef struct IOMMUDevice {
     AddressSpace  as;
     MemoryRegion root;          /* The root container of the device */
     MemoryRegion bypass_mr;     /* The alias of shared memory MR */
+    GList *resv_regions;
 } IOMMUDevice;
 
 typedef struct IOMMUPciBus {
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 979cdb5648..ea359b586a 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -624,22 +624,48 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
     return ret;
 }
 
+static int consolidate_resv_regions(IOMMUDevice *sdev)
+{
+    VirtIOIOMMU *s = sdev->viommu;
+    int i;
+
+    for (i = 0; i < s->nr_prop_resv_regions; i++) {
+        ReservedRegion *reg = g_new0(ReservedRegion, 1);
+
+        *reg = s->prop_resv_regions[i];
+        sdev->resv_regions = g_list_append(sdev->resv_regions, reg);
+    }
+    return 0;
+}
+
 static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t ep,
                                                uint8_t *buf, size_t free)
 {
     struct virtio_iommu_probe_resv_mem prop = {};
     size_t size = sizeof(prop), length = size - sizeof(prop.head), total;
-    int i;
+    IOMMUDevice *sdev;
+    GList *l;
+    int ret;
 
-    total = size * s->nr_prop_resv_regions;
+    sdev = container_of(virtio_iommu_mr(s, ep), IOMMUDevice, iommu_mr);
+    if (!sdev) {
+        return -EINVAL;
+    }
 
+    ret = consolidate_resv_regions(sdev);
+    if (ret) {
+        return ret;
+    }
+
+    total = size * g_list_length(sdev->resv_regions);
     if (total > free) {
         return -ENOSPC;
     }
 
-    for (i = 0; i < s->nr_prop_resv_regions; i++) {
-        unsigned subtype = s->prop_resv_regions[i].type;
-        Range *range = &s->prop_resv_regions[i].range;
+    for (l = sdev->resv_regions; l; l = l->next) {
+        ReservedRegion *reg = l->data;
+        unsigned subtype = reg->type;
+        Range *range = &reg->range;
 
         assert(subtype == VIRTIO_IOMMU_RESV_MEM_T_RESERVED ||
                subtype == VIRTIO_IOMMU_RESV_MEM_T_MSI);
@@ -857,7 +883,7 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
     bool bypass_allowed;
     int granule;
     bool found;
-    int i;
+    GList *l;
 
     interval.low = addr;
     interval.high = addr + 1;
@@ -895,8 +921,8 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
         goto unlock;
     }
 
-    for (i = 0; i < s->nr_prop_resv_regions; i++) {
-        ReservedRegion *reg = &s->prop_resv_regions[i];
+    for (l = sdev->resv_regions; l; l = l->next) {
+        ReservedRegion *reg = l->data;
 
         if (range_contains(&reg->range, addr)) {
             switch (reg->type) {
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 06/13] range: Introduce range_inverse_array()
  2023-09-04  8:03 [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (4 preceding siblings ...)
  2023-09-04  8:03 ` [PATCH 05/13] virtio-iommu: Introduce per IOMMUDevice reserved regions Eric Auger
@ 2023-09-04  8:03 ` Eric Auger
  2023-09-04  8:18   ` Philippe Mathieu-Daudé
  2023-09-04  8:03 ` [PATCH 07/13] virtio-iommu: Implement set_iova_ranges() callback Eric Auger
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Eric Auger @ 2023-09-04  8:03 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd

This helper reverses an array of regions, turning original
regions into holes and original holes into actual regions,
covering the whole UINT64_MAX span.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/qemu/range.h |  3 +++
 util/range.c         | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/include/qemu/range.h b/include/qemu/range.h
index 7e2b1cc447..fc1d3dabe6 100644
--- a/include/qemu/range.h
+++ b/include/qemu/range.h
@@ -219,4 +219,7 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
 
 GList *range_list_insert(GList *list, Range *data);
 
+void range_inverse_array(uint32_t nr_ranges, Range *ranges,
+                         uint32_t *nr_inv_ranges, Range **inv_ranges);
+
 #endif
diff --git a/util/range.c b/util/range.c
index 098d9d2dc0..11c4ff0b78 100644
--- a/util/range.c
+++ b/util/range.c
@@ -70,3 +70,38 @@ GList *range_list_insert(GList *list, Range *data)
 
     return list;
 }
+
+/*
+ * Inverse an array of sorted ranges over the UINT64_MAX span, ie.
+ * original ranges becomes holes in the newly allocated inv_ranges
+ */
+void range_inverse_array(uint32_t nr_ranges, Range *ranges,
+                         uint32_t *nr_inv_ranges, Range **inv_ranges)
+{
+    Range *resv;
+    int i = 0, j = 0;
+
+    resv = g_malloc0_n(nr_ranges + 1, sizeof(Range));
+
+    /* first range lob is greater than 0, insert a first range */
+    if (range_lob(&ranges[0]) > 0) {
+        range_set_bounds(&resv[i++], 0,
+                         range_lob(&ranges[0]) - 1);
+    }
+
+    /* insert a range inbetween each original range */
+    for (; j < nr_ranges - 1; j++) {
+        if (range_compare(&ranges[j], &ranges[j + 1])) {
+            range_set_bounds(&resv[i++], range_upb(&ranges[j]) + 1,
+                             range_lob(&ranges[j + 1]) - 1);
+        }
+    }
+    /* last range upb is less than UINT64_MAX, insert a last range */
+    if (range_upb(&ranges[nr_ranges - 1]) <  UINT64_MAX) {
+        range_set_bounds(&resv[i++],
+                          range_upb(&ranges[nr_ranges - 1]) + 1, UINT64_MAX);
+    }
+    *nr_inv_ranges = i;
+    resv = g_realloc(resv, i * sizeof(Range));
+    *inv_ranges = resv;
+}
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 07/13] virtio-iommu: Implement set_iova_ranges() callback
  2023-09-04  8:03 [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (5 preceding siblings ...)
  2023-09-04  8:03 ` [PATCH 06/13] range: Introduce range_inverse_array() Eric Auger
@ 2023-09-04  8:03 ` Eric Auger
  2023-09-04  8:03 ` [PATCH 08/13] range: Make range_compare() public Eric Auger
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2023-09-04  8:03 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd

The implementation populates the array of per IOMMUDevice
host reserved regions.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/hw/virtio/virtio-iommu.h |  2 ++
 hw/virtio/virtio-iommu.c         | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index 70b8ace34d..31b69c8261 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -40,6 +40,8 @@ typedef struct IOMMUDevice {
     MemoryRegion root;          /* The root container of the device */
     MemoryRegion bypass_mr;     /* The alias of shared memory MR */
     GList *resv_regions;
+    Range *host_resv_regions;
+    uint32_t nr_host_resv_regions;
 } IOMMUDevice;
 
 typedef struct IOMMUPciBus {
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index ea359b586a..02f1a59d57 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "qemu/log.h"
 #include "qemu/iov.h"
+#include "qemu/range.h"
 #include "exec/target_page.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio.h"
@@ -1158,6 +1159,21 @@ static int virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
     return 0;
 }
 
+static int virtio_iommu_set_iova_ranges(IOMMUMemoryRegion *mr,
+                                        uint32_t nr_ranges,
+                                        struct Range *iova_ranges,
+                                        Error **errp)
+{
+    IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+    assert(nr_ranges);
+
+
+    range_inverse_array(nr_ranges, iova_ranges,
+                        &sdev->nr_host_resv_regions, &sdev->host_resv_regions);
+
+    return 0;
+}
+
 static void virtio_iommu_system_reset(void *opaque)
 {
     VirtIOIOMMU *s = opaque;
@@ -1453,6 +1469,7 @@ static void virtio_iommu_memory_region_class_init(ObjectClass *klass,
     imrc->replay = virtio_iommu_replay;
     imrc->notify_flag_changed = virtio_iommu_notify_flag_changed;
     imrc->iommu_set_page_size_mask = virtio_iommu_set_page_size_mask;
+    imrc->iommu_set_iova_ranges = virtio_iommu_set_iova_ranges;
 }
 
 static const TypeInfo virtio_iommu_info = {
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 08/13] range: Make range_compare() public
  2023-09-04  8:03 [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (6 preceding siblings ...)
  2023-09-04  8:03 ` [PATCH 07/13] virtio-iommu: Implement set_iova_ranges() callback Eric Auger
@ 2023-09-04  8:03 ` Eric Auger
  2023-09-04  8:19   ` Philippe Mathieu-Daudé
  2023-09-04  8:03 ` [PATCH 09/13] util/reserved-region: Add new ReservedRegion helpers Eric Auger
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Eric Auger @ 2023-09-04  8:03 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd

Let's expose range_compare() in the header so that it can be
reused outside of util/range.c

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/qemu/range.h | 6 ++++++
 util/range.c         | 6 +-----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/qemu/range.h b/include/qemu/range.h
index fc1d3dabe6..52641dfb79 100644
--- a/include/qemu/range.h
+++ b/include/qemu/range.h
@@ -217,6 +217,12 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
     return !(last2 < first1 || last1 < first2);
 }
 
+/*
+ * Return -1 if @a < @b, 1 @a > @b, and 0 if they touch or overlap.
+ * Both @a and @b must not be empty.
+ */
+int range_compare(Range *a, Range *b);
+
 GList *range_list_insert(GList *list, Range *data);
 
 void range_inverse_array(uint32_t nr_ranges, Range *ranges,
diff --git a/util/range.c b/util/range.c
index 11c4ff0b78..1fc399e88f 100644
--- a/util/range.c
+++ b/util/range.c
@@ -20,11 +20,7 @@
 #include "qemu/osdep.h"
 #include "qemu/range.h"
 
-/*
- * Return -1 if @a < @b, 1 @a > @b, and 0 if they touch or overlap.
- * Both @a and @b must not be empty.
- */
-static inline int range_compare(Range *a, Range *b)
+int range_compare(Range *a, Range *b)
 {
     assert(!range_is_empty(a) && !range_is_empty(b));
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 09/13] util/reserved-region: Add new ReservedRegion helpers
  2023-09-04  8:03 [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (7 preceding siblings ...)
  2023-09-04  8:03 ` [PATCH 08/13] range: Make range_compare() public Eric Auger
@ 2023-09-04  8:03 ` Eric Auger
  2023-09-04  8:03 ` [PATCH 10/13] virtio-iommu: Consolidate host reserved regions and property set ones Eric Auger
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2023-09-04  8:03 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd

Introduce resv_region_list_insert() helper which inserts
a new ReservedRegion into a sorted list of reserved region.
In case of overlap, the new region has higher priority and
hides the existing overlapped segments. If the overlap is
partial, new regions are created for parts which are not
overlapped. The new region has higher priority independently
on the type of the regions.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/qemu/reserved-region.h | 32 ++++++++++++
 util/reserved-region.c         | 94 ++++++++++++++++++++++++++++++++++
 util/meson.build               |  1 +
 3 files changed, 127 insertions(+)
 create mode 100644 include/qemu/reserved-region.h
 create mode 100644 util/reserved-region.c

diff --git a/include/qemu/reserved-region.h b/include/qemu/reserved-region.h
new file mode 100644
index 0000000000..8e6f0a97e2
--- /dev/null
+++ b/include/qemu/reserved-region.h
@@ -0,0 +1,32 @@
+/*
+ * QEMU ReservedRegion helpers
+ *
+ * Copyright (c) 2023 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_RESERVED_REGION_H
+#define QEMU_RESERVED_REGION_H
+
+#include "exec/memory.h"
+
+/*
+ * Insert a new region into a sorted list of reserved regions. In case
+ * there is overlap with existing regions, the new added region has
+ * higher priority and replaces the overlapped segment.
+ */
+GList *resv_region_list_insert(GList *list, ReservedRegion *reg);
+
+#endif
diff --git a/util/reserved-region.c b/util/reserved-region.c
new file mode 100644
index 0000000000..bb26a6bed3
--- /dev/null
+++ b/util/reserved-region.c
@@ -0,0 +1,94 @@
+/*
+ * QEMU ReservedRegion helpers
+ *
+ * Copyright (c) 2023 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/range.h"
+#include "qemu/reserved-region.h"
+
+GList *resv_region_list_insert(GList *list, ReservedRegion *reg)
+{
+    ReservedRegion *resv_iter, *new_reg;
+    Range *r = &reg->range;
+    Range *range_iter;
+    GList *l;
+
+    for (l = list; l ; ) {
+        resv_iter = (ReservedRegion *)l->data;
+        range_iter = &resv_iter->range;
+
+        /* Skip all list elements strictly less than range to add */
+        if (range_compare(range_iter, r) < 0) {
+            l = l->next;
+        } else if (range_compare(range_iter, r) > 0) {
+            return g_list_insert_before(list, l, reg);
+        } else { /* there is an operlap */
+            if (range_contains_range(r, range_iter)) {
+                /* new range contains current item, simply remove this latter */
+                GList *prev = l->prev;
+                g_free(l->data);
+                list = g_list_delete_link(list, l);
+                if (prev) {
+                    l = prev;
+                    if (l) {
+                        l = l->next;
+                    }
+                } else {
+                    l = list;
+                }
+            } else if (range_contains_range(range_iter, r)) {
+                /* new region is included in the current region */
+                if (range_lob(range_iter) == range_lob(r)) {
+                    /* adjacent on the left side, derives into 2 regions */
+                    range_set_bounds(range_iter, range_upb(r) + 1,
+                                     range_upb(range_iter));
+                    return g_list_insert_before(list, l, reg);
+                } else if (range_upb(range_iter) == range_upb(r)) {
+                    /* adjacent on the right side, derives into 2 regions */
+                    range_set_bounds(range_iter, range_lob(range_iter),
+                                     range_lob(r) - 1);
+                    l = l->next;
+                } else {
+                    uint64_t lob = range_lob(range_iter);
+                    /*
+                     * the new range is in the middle of an existing one,
+                     * split this latter into 3 regs instead
+                     */
+                    range_set_bounds(range_iter, range_upb(r) + 1,
+                                     range_upb(range_iter));
+                    new_reg = g_new0(ReservedRegion, 1);
+                    new_reg->type = resv_iter->type;
+                    range_set_bounds(&new_reg->range,
+                                     lob, range_lob(r) - 1);
+                    list = g_list_insert_before(list, l, new_reg);
+                    return g_list_insert_before(list, l, reg);
+                }
+            } else if (range_lob(r) < range_lob(range_iter)) {
+                range_set_bounds(range_iter, range_upb(r) + 1,
+                                 range_upb(range_iter));
+                return g_list_insert_before(list, l, reg);
+            } else { /* intersection on the upper range */
+                range_set_bounds(range_iter, range_lob(range_iter),
+                                 range_lob(r) - 1);
+                l = l->next;
+            }
+        } /* overlap */
+    }
+    return g_list_append(list, reg);
+}
+
diff --git a/util/meson.build b/util/meson.build
index a375160286..7e85889606 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -52,6 +52,7 @@ util_ss.add(files('qdist.c'))
 util_ss.add(files('qht.c'))
 util_ss.add(files('qsp.c'))
 util_ss.add(files('range.c'))
+util_ss.add(files('reserved-region.c'))
 util_ss.add(files('stats64.c'))
 util_ss.add(files('systemd.c'))
 util_ss.add(files('transactions.c'))
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 10/13] virtio-iommu: Consolidate host reserved regions and property set ones
  2023-09-04  8:03 [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (8 preceding siblings ...)
  2023-09-04  8:03 ` [PATCH 09/13] util/reserved-region: Add new ReservedRegion helpers Eric Auger
@ 2023-09-04  8:03 ` Eric Auger
  2023-09-04  8:03 ` [PATCH 11/13] test: Add some tests for range and resv-mem helpers Eric Auger
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2023-09-04  8:03 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd

Up to now we were exposing to the RESV_MEM probe requests the
reserved memory regions set though the reserved-regions array property.

Combine those with the host reserved memory regions if any. Those
latter are tagged as RESERVED. We don't have more information about
them besides then cannot be mapped. Reserved regions set by
property have higher priority.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/virtio/virtio-iommu.c | 17 ++++++++++++++++-
 hw/virtio/trace-events   |  1 +
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 02f1a59d57..d260235078 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -21,6 +21,7 @@
 #include "qemu/log.h"
 #include "qemu/iov.h"
 #include "qemu/range.h"
+#include "qemu/reserved-region.h"
 #include "exec/target_page.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio.h"
@@ -630,11 +631,25 @@ static int consolidate_resv_regions(IOMMUDevice *sdev)
     VirtIOIOMMU *s = sdev->viommu;
     int i;
 
+    /* First add host reserved regions if any, all tagged as RESERVED */
+    for (i = 0; i < sdev->nr_host_resv_regions; i++) {
+        ReservedRegion *reg = g_new0(ReservedRegion, 1);
+        reg->range = sdev->host_resv_regions[i];
+        reg->type = VIRTIO_IOMMU_RESV_MEM_T_RESERVED;
+        sdev->resv_regions = resv_region_list_insert(sdev->resv_regions, reg);
+        trace_virtio_iommu_host_resv_regions(sdev->iommu_mr.parent_obj.name, i,
+                                             range_lob(&reg->range),
+                                             range_upb(&reg->range));
+    }
+    /*
+     * then add higher priority reserved regions set through properties by the
+     * machine
+     */
     for (i = 0; i < s->nr_prop_resv_regions; i++) {
         ReservedRegion *reg = g_new0(ReservedRegion, 1);
 
         *reg = s->prop_resv_regions[i];
-        sdev->resv_regions = g_list_append(sdev->resv_regions, reg);
+        sdev->resv_regions = resv_region_list_insert(sdev->resv_regions, reg);
     }
     return 0;
 }
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 7109cf1a3b..796574b0f3 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -134,6 +134,7 @@ virtio_iommu_notify_flag_add(const char *name) "add notifier to mr %s"
 virtio_iommu_notify_flag_del(const char *name) "del notifier from mr %s"
 virtio_iommu_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on) "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
 virtio_iommu_freeze_granule(uint64_t page_size_mask) "granule set to 0x%"PRIx64
+virtio_iommu_host_resv_regions(const char *name, uint32_t index, uint64_t lob, uint64_t upb) "mr=%s host-resv-reg[%d] = [0x%"PRIx64",0x%"PRIx64"]"
 
 # virtio-mem.c
 virtio_mem_send_response(uint16_t type) "type=%" PRIu16
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 11/13] test: Add some tests for range and resv-mem helpers
  2023-09-04  8:03 [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (9 preceding siblings ...)
  2023-09-04  8:03 ` [PATCH 10/13] virtio-iommu: Consolidate host reserved regions and property set ones Eric Auger
@ 2023-09-04  8:03 ` Eric Auger
  2023-09-04  8:03 ` [PATCH 12/13] virtio-iommu: Resize memory region according to the max iova info Eric Auger
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2023-09-04  8:03 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd

Add unit tests for both resv_region_list_insert() and
range_inverse_array().

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 tests/unit/test-resv-mem.c | 198 +++++++++++++++++++++++++++++++++++++
 tests/unit/meson.build     |   1 +
 2 files changed, 199 insertions(+)
 create mode 100644 tests/unit/test-resv-mem.c

diff --git a/tests/unit/test-resv-mem.c b/tests/unit/test-resv-mem.c
new file mode 100644
index 0000000000..35c9828375
--- /dev/null
+++ b/tests/unit/test-resv-mem.c
@@ -0,0 +1,198 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * reserved-region/range.c unit-tests.
+ *
+ * Copyright (C) 2023, Red Hat, Inc.
+ *
+ * Author: Eric Auger <eric.auger@redhat.com>
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/range.h"
+#include "exec/memory.h"
+#include "qemu/reserved-region.h"
+
+#define DEBUG 0
+
+static void print_rev_array(const char *prefix, uint32_t nr_rev,
+                            Range *rev, uint32_t expected_nr_rev)
+{
+    g_assert_cmpint(nr_rev, ==, expected_nr_rev);
+#if DEBUG
+    printf("%s nr_rev=%d\n", prefix, nr_rev);
+    for (int i = 0; i < nr_rev; i++) {
+        printf("%s rev[%i] = [0x%"PRIx64",0x%"PRIx64"]\n",
+               prefix, i, range_lob(&rev[i]), range_upb(&rev[i]));
+    }
+#endif
+}
+
+static void check_range_reverse_array(void)
+{
+    Range r[10];
+    Range *rev;
+    uint32_t nr_rev;
+
+    range_set_bounds(&r[0], 0x10000, UINT64_MAX);
+    range_inverse_array(1, r, &nr_rev, &rev);
+    print_rev_array("test1", nr_rev, rev, 1);
+    g_free(rev);
+
+    range_set_bounds(&r[0], 0x10000, 0xFFFFFFFFFFFF);
+    range_inverse_array(1, r, &nr_rev, &rev);
+    print_rev_array("test2", nr_rev, rev, 2);
+    g_free(rev);
+
+    range_set_bounds(&r[0], 0x0, 0xFFFF);
+    range_set_bounds(&r[1], 0x10000, 0x2FFFF);
+    range_inverse_array(2, r, &nr_rev, &rev);
+    print_rev_array("test3", nr_rev, rev, 1);
+    g_free(rev);
+
+    range_set_bounds(&r[0], 0x50000, 0x5FFFF);
+    range_set_bounds(&r[1], 0x60000, 0xFFFFFFFFFFFF);
+    range_inverse_array(2, r, &nr_rev, &rev);
+    print_rev_array("test4", nr_rev, rev, 2);
+    g_free(rev);
+
+    range_set_bounds(&r[0], 0x0, UINT64_MAX);
+    range_inverse_array(1, r, &nr_rev, &rev);
+    print_rev_array("test5", nr_rev, rev, 0);
+    g_free(rev);
+}
+
+static ReservedRegion *alloc_resv_mem(unsigned type, uint64_t lob, uint64_t upb)
+{
+    ReservedRegion *r;
+
+    r = g_new0(ReservedRegion, 1);
+    r->type = type;
+    range_set_bounds(&r->range, lob, upb);
+    return r;
+}
+
+static void print_resv_region_list(const char *prefix, GList *list,
+                                   uint32_t expected_length)
+{
+    int i = g_list_length(list);
+
+    g_assert_cmpint(i, ==, expected_length);
+#if DEBUG
+    i = 0;
+    for (GList *l = list; l; l = l->next) {
+        ReservedRegion *r = (ReservedRegion *)l->data;
+        Range *range = &r->range;
+
+        printf("%s item[%d]=[0x%x, 0x%"PRIx64", 0x%"PRIx64"]\n",
+               prefix, i++, r->type, range_lob(range), range_upb(range));
+    }
+#endif
+}
+
+static void free_resv_region(gpointer data)
+{
+    ReservedRegion *reg = (ReservedRegion *)data;
+
+    g_free(reg);
+}
+
+static void check_resv_region_list_insert(void)
+{
+    ReservedRegion *r[10];
+    GList *l = NULL;
+
+    r[0] = alloc_resv_mem(0xA, 0, 0xFFFF);
+    r[1] = alloc_resv_mem(0xA, 0x20000, 0x2FFFF);
+    l = resv_region_list_insert(l, r[0]);
+    l = resv_region_list_insert(l, r[1]);
+    print_resv_region_list("test1", l, 2);
+
+    /* adjacent on left */
+    r[2] = alloc_resv_mem(0xB, 0x0, 0xFFF);
+    l = resv_region_list_insert(l, r[2]);
+    /* adjacent on right */
+    r[3] = alloc_resv_mem(0xC, 0x21000, 0x2FFFF);
+    l = resv_region_list_insert(l, r[3]);
+    print_resv_region_list("test2", l, 4);
+
+    /* exact overlap of D into C*/
+    r[4] = alloc_resv_mem(0xD, 0x21000, 0x2FFFF);
+    l = resv_region_list_insert(l, r[4]);
+    print_resv_region_list("test3", l, 4);
+
+    /* in the middle */
+    r[5] = alloc_resv_mem(0xE, 0x22000, 0x23FFF);
+    l = resv_region_list_insert(l, r[5]);
+    print_resv_region_list("test4", l, 6);
+
+    /* overwrites several existing ones */
+    r[6] = alloc_resv_mem(0xF, 0x10000, 0x2FFFF);
+    l = resv_region_list_insert(l, r[6]);
+    print_resv_region_list("test5", l, 3);
+
+    /* contiguous at the end */
+    r[7] = alloc_resv_mem(0x0, 0x30000, 0x40000);
+    l = resv_region_list_insert(l, r[7]);
+    print_resv_region_list("test6", l, 4);
+
+    g_list_free_full(l, free_resv_region);
+    l = NULL;
+
+    r[0] = alloc_resv_mem(0x0, 0x10000, 0x1FFFF);
+    l = resv_region_list_insert(l, r[0]);
+    /* insertion before the 1st item */
+    r[1] = alloc_resv_mem(0x1, 0x0, 0xFF);
+    l = resv_region_list_insert(l, r[1]);
+    print_resv_region_list("test8", l, 2);
+
+    /* collision on the left side */
+    r[2] = alloc_resv_mem(0xA, 0x1200, 0x11FFF);
+    l = resv_region_list_insert(l, r[2]);
+    print_resv_region_list("test9", l, 3);
+
+    /* collision on the right side */
+    r[3] = alloc_resv_mem(0xA, 0x1F000, 0x2FFFF);
+    l = resv_region_list_insert(l, r[3]);
+    print_resv_region_list("test10", l, 4);
+
+    /* override everything */
+    r[4] = alloc_resv_mem(0xF, 0x0, UINT64_MAX);
+    l = resv_region_list_insert(l, r[4]);
+    print_resv_region_list("test11", l, 1);
+
+    g_list_free_full(l, free_resv_region);
+    l = NULL;
+
+    r[0] = alloc_resv_mem(0xF, 0x1000000000000, UINT64_MAX);
+    l = resv_region_list_insert(l, r[0]);
+    print_resv_region_list("test12", l, 1);
+
+    r[1] = alloc_resv_mem(0xA, 0x0, 0xFFFFFFF);
+    l = resv_region_list_insert(l, r[1]);
+    print_resv_region_list("test12", l, 2);
+
+    r[2] = alloc_resv_mem(0xB, 0x100000000, 0x1FFFFFFFF);
+    l = resv_region_list_insert(l, r[2]);
+    print_resv_region_list("test12", l, 3);
+
+    r[3] = alloc_resv_mem(0x0, 0x010000000, 0x2FFFFFFFF);
+    l = resv_region_list_insert(l, r[3]);
+    print_resv_region_list("test12", l, 3);
+
+    g_list_free_full(l, free_resv_region);
+}
+
+int main(int argc, char **argv)
+{
+    g_test_init(&argc, &argv, NULL);
+
+    g_test_add_func("/resv-mem/range_reverse_array",
+                    check_range_reverse_array);
+    g_test_add_func("/resv-mem/resv_region_list_insert",
+                    check_resv_region_list_insert);
+
+    g_test_run();
+
+    return 0;
+}
diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index 93977cc32d..765b1a8dad 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -21,6 +21,7 @@ tests = {
   'test-opts-visitor': [testqapi],
   'test-visitor-serialization': [testqapi],
   'test-bitmap': [],
+  'test-resv-mem': [],
   # all code tested by test-x86-cpuid is inside topology.h
   'test-x86-cpuid': [],
   'test-cutils': [],
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 12/13] virtio-iommu: Resize memory region according to the max iova info
  2023-09-04  8:03 [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (10 preceding siblings ...)
  2023-09-04  8:03 ` [PATCH 11/13] test: Add some tests for range and resv-mem helpers Eric Auger
@ 2023-09-04  8:03 ` Eric Auger
  2023-09-04  8:21   ` Philippe Mathieu-Daudé
  2023-09-04  8:03 ` [PATCH 13/13] vfio: Remove 64-bit IOVA address space assumption Eric Auger
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Eric Auger @ 2023-09-04  8:03 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd

By default the virtio-iommu MR has a 64b span. As we intend to
remove the VFIO assumption of 64b IOVA, let's make sure the MR
is resized according to the actual GAW of the physical IOMMU.
Otherwise we will get a failure on vfio vfio_find_hostwin().

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/virtio/virtio-iommu.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index d260235078..d877119df1 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -1180,8 +1180,14 @@ static int virtio_iommu_set_iova_ranges(IOMMUMemoryRegion *mr,
                                         Error **errp)
 {
     IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+    uint64_t max_iova;
+
     assert(nr_ranges);
 
+    max_iova = range_upb(&iova_ranges[nr_ranges - 1]);
+    if (max_iova < UINT64_MAX) {
+        memory_region_set_size(&mr->parent_obj, max_iova + 1);
+    }
 
     range_inverse_array(nr_ranges, iova_ranges,
                         &sdev->nr_host_resv_regions, &sdev->host_resv_regions);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 13/13] vfio: Remove 64-bit IOVA address space assumption
  2023-09-04  8:03 [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (11 preceding siblings ...)
  2023-09-04  8:03 ` [PATCH 12/13] virtio-iommu: Resize memory region according to the max iova info Eric Auger
@ 2023-09-04  8:03 ` Eric Auger
  2023-09-05  8:22 ` [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space YangHang Liu
  2023-09-05 17:55 ` Alex Williamson
  14 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2023-09-04  8:03 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd

Now we retrieve the usable IOVA ranges from the host,
let's remove the assumption of 64b IOVA space when calling
vfio_host_win_add().

Make sure the vfio_find_hostwin() gets called after we get
a chance to call set_iova_regions and resize the IOMMU MR
according to the actual physical GAW.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/vfio/common.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 26da38de05..40cac1ca91 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1112,13 +1112,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
 #endif
     }
 
-    hostwin = vfio_find_hostwin(container, iova, end);
-    if (!hostwin) {
-        error_setg(&err, "Container %p can't map guest IOVA region"
-                   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
-        goto fail;
-    }
-
     memory_region_ref(section->mr);
 
     if (memory_region_is_iommu(section->mr)) {
@@ -1177,6 +1170,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
         return;
     }
 
+    hostwin = vfio_find_hostwin(container, iova, end);
+    if (!hostwin) {
+        error_setg(&err, "Container %p can't map guest IOVA region"
+                   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
+        goto fail;
+    }
+
+
     /* Here we assume that memory_region_is_ram(section->mr)==true */
 
     /*
@@ -2594,12 +2595,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         vfio_get_iommu_info_migration(container, info);
         g_free(info);
 
-        /*
-         * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
-         * information to get the actual window extent rather than assume
-         * a 64-bit IOVA address space.
-         */
-        vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
+        g_assert(container->nr_iovas);
+        vfio_host_win_add(container, 0,
+                          container->iova_ranges[container->nr_iovas - 1].end,
+                          container->pgsizes);
 
         break;
     }
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/13] memory: Let ReservedRegion use Range
  2023-09-04  8:03 ` [PATCH 01/13] memory: Let ReservedRegion use Range Eric Auger
@ 2023-09-04  8:12   ` Philippe Mathieu-Daudé
  2023-09-04 14:22     ` Eric Auger
  2023-09-06  7:17   ` David Hildenbrand
  2023-09-11 18:40   ` Peter Xu
  2 siblings, 1 reply; 29+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-09-04  8:12 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david

On 4/9/23 10:03, Eric Auger wrote:
> A reserved region is a range tagged with a type. Let's directly use
> the Range type in the prospect to reuse some of the library helpers
> shipped with the Range type.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>   include/exec/memory.h            | 4 ++--
>   hw/core/qdev-properties-system.c | 9 ++++++---
>   hw/virtio/virtio-iommu.c         | 6 +++---
>   3 files changed, 11 insertions(+), 8 deletions(-)

TIL "qemu/range.h" :)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 06/13] range: Introduce range_inverse_array()
  2023-09-04  8:03 ` [PATCH 06/13] range: Introduce range_inverse_array() Eric Auger
@ 2023-09-04  8:18   ` Philippe Mathieu-Daudé
  2023-09-04 14:21     ` Eric Auger
  0 siblings, 1 reply; 29+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-09-04  8:18 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david

On 4/9/23 10:03, Eric Auger wrote:
> This helper reverses an array of regions, turning original
> regions into holes and original holes into actual regions,
> covering the whole UINT64_MAX span.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>   include/qemu/range.h |  3 +++
>   util/range.c         | 35 +++++++++++++++++++++++++++++++++++
>   2 files changed, 38 insertions(+)
> 
> diff --git a/include/qemu/range.h b/include/qemu/range.h
> index 7e2b1cc447..fc1d3dabe6 100644
> --- a/include/qemu/range.h
> +++ b/include/qemu/range.h
> @@ -219,4 +219,7 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
>   
>   GList *range_list_insert(GList *list, Range *data);
>   
> +void range_inverse_array(uint32_t nr_ranges, Range *ranges,
> +                         uint32_t *nr_inv_ranges, Range **inv_ranges);
> +
>   #endif
> diff --git a/util/range.c b/util/range.c
> index 098d9d2dc0..11c4ff0b78 100644
> --- a/util/range.c
> +++ b/util/range.c
> @@ -70,3 +70,38 @@ GList *range_list_insert(GList *list, Range *data)
>   
>       return list;
>   }
> +
> +/*
> + * Inverse an array of sorted ranges over the UINT64_MAX span, ie.
> + * original ranges becomes holes in the newly allocated inv_ranges
> + */

Most of the functions are described in the header; could you move this
description with the declaration?

> +void range_inverse_array(uint32_t nr_ranges, Range *ranges,
> +                         uint32_t *nr_inv_ranges, Range **inv_ranges)
> +{
> +    Range *resv;
> +    int i = 0, j = 0;
> +
> +    resv = g_malloc0_n(nr_ranges + 1, sizeof(Range));
> +
> +    /* first range lob is greater than 0, insert a first range */
> +    if (range_lob(&ranges[0]) > 0) {
> +        range_set_bounds(&resv[i++], 0,
> +                         range_lob(&ranges[0]) - 1);
> +    }
> +
> +    /* insert a range inbetween each original range */
> +    for (; j < nr_ranges - 1; j++) {
> +        if (range_compare(&ranges[j], &ranges[j + 1])) {
> +            range_set_bounds(&resv[i++], range_upb(&ranges[j]) + 1,
> +                             range_lob(&ranges[j + 1]) - 1);
> +        }
> +    }
> +    /* last range upb is less than UINT64_MAX, insert a last range */

In order to use this new function with variable range sizes,
can we pass UINT64_MAX as an 'inv_range_upb' argument?

> +    if (range_upb(&ranges[nr_ranges - 1]) <  UINT64_MAX) {
> +        range_set_bounds(&resv[i++],
> +                          range_upb(&ranges[nr_ranges - 1]) + 1, UINT64_MAX);
> +    }
> +    *nr_inv_ranges = i;
> +    resv = g_realloc(resv, i * sizeof(Range));
> +    *inv_ranges = resv;
> +}



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 08/13] range: Make range_compare() public
  2023-09-04  8:03 ` [PATCH 08/13] range: Make range_compare() public Eric Auger
@ 2023-09-04  8:19   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 29+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-09-04  8:19 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david

On 4/9/23 10:03, Eric Auger wrote:
> Let's expose range_compare() in the header so that it can be
> reused outside of util/range.c
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>   include/qemu/range.h | 6 ++++++
>   util/range.c         | 6 +-----
>   2 files changed, 7 insertions(+), 5 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 12/13] virtio-iommu: Resize memory region according to the max iova info
  2023-09-04  8:03 ` [PATCH 12/13] virtio-iommu: Resize memory region according to the max iova info Eric Auger
@ 2023-09-04  8:21   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 29+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-09-04  8:21 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david

On 4/9/23 10:03, Eric Auger wrote:
> By default the virtio-iommu MR has a 64b span. As we intend to
> remove the VFIO assumption of 64b IOVA, let's make sure the MR
> is resized according to the actual GAW of the physical IOMMU.
> Otherwise we will get a failure on vfio vfio_find_hostwin().
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>   hw/virtio/virtio-iommu.c | 6 ++++++
>   1 file changed, 6 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 06/13] range: Introduce range_inverse_array()
  2023-09-04  8:18   ` Philippe Mathieu-Daudé
@ 2023-09-04 14:21     ` Eric Auger
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2023-09-04 14:21 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg,
	jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david

Hi Philippe,
On 9/4/23 10:18, Philippe Mathieu-Daudé wrote:
> On 4/9/23 10:03, Eric Auger wrote:
>> This helper reverses an array of regions, turning original
>> regions into holes and original holes into actual regions,
>> covering the whole UINT64_MAX span.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>   include/qemu/range.h |  3 +++
>>   util/range.c         | 35 +++++++++++++++++++++++++++++++++++
>>   2 files changed, 38 insertions(+)
>>
>> diff --git a/include/qemu/range.h b/include/qemu/range.h
>> index 7e2b1cc447..fc1d3dabe6 100644
>> --- a/include/qemu/range.h
>> +++ b/include/qemu/range.h
>> @@ -219,4 +219,7 @@ static inline int ranges_overlap(uint64_t first1,
>> uint64_t len1,
>>     GList *range_list_insert(GList *list, Range *data);
>>   +void range_inverse_array(uint32_t nr_ranges, Range *ranges,
>> +                         uint32_t *nr_inv_ranges, Range **inv_ranges);
>> +
>>   #endif
>> diff --git a/util/range.c b/util/range.c
>> index 098d9d2dc0..11c4ff0b78 100644
>> --- a/util/range.c
>> +++ b/util/range.c
>> @@ -70,3 +70,38 @@ GList *range_list_insert(GList *list, Range *data)
>>         return list;
>>   }
>> +
>> +/*
>> + * Inverse an array of sorted ranges over the UINT64_MAX span, ie.
>> + * original ranges becomes holes in the newly allocated inv_ranges
>> + */
>
> Most of the functions are described in the header; could you move this
> description with the declaration?
this is the case for all static inline primitives but not for
range_list_insert(), hence that choice. Now I don't have a strong opinion.
>
>> +void range_inverse_array(uint32_t nr_ranges, Range *ranges,
>> +                         uint32_t *nr_inv_ranges, Range **inv_ranges)
>> +{
>> +    Range *resv;
>> +    int i = 0, j = 0;
>> +
>> +    resv = g_malloc0_n(nr_ranges + 1, sizeof(Range));
>> +
>> +    /* first range lob is greater than 0, insert a first range */
>> +    if (range_lob(&ranges[0]) > 0) {
>> +        range_set_bounds(&resv[i++], 0,
>> +                         range_lob(&ranges[0]) - 1);
>> +    }
>> +
>> +    /* insert a range inbetween each original range */
>> +    for (; j < nr_ranges - 1; j++) {
>> +        if (range_compare(&ranges[j], &ranges[j + 1])) {
>> +            range_set_bounds(&resv[i++], range_upb(&ranges[j]) + 1,
>> +                             range_lob(&ranges[j + 1]) - 1);
>> +        }
>> +    }
>> +    /* last range upb is less than UINT64_MAX, insert a last range */
>
> In order to use this new function with variable range sizes,
> can we pass UINT64_MAX as an 'inv_range_upb' argument?
Indeed I hesitated to bring enhanced comodity by letting the caller pass
the upper bound and allow values less than UINT64_MAX. But I was afraid
this would complexify the implementation, hence the current choice. I
will have a look & see.

Thanks

Eric
>
>> +    if (range_upb(&ranges[nr_ranges - 1]) <  UINT64_MAX) {
>> +        range_set_bounds(&resv[i++],
>> +                          range_upb(&ranges[nr_ranges - 1]) + 1,
>> UINT64_MAX);
>> +    }
>> +    *nr_inv_ranges = i;
>> +    resv = g_realloc(resv, i * sizeof(Range));
>> +    *inv_ranges = resv;
>> +}
>



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/13] memory: Let ReservedRegion use Range
  2023-09-04  8:12   ` Philippe Mathieu-Daudé
@ 2023-09-04 14:22     ` Eric Auger
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2023-09-04 14:22 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg,
	jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david



On 9/4/23 10:12, Philippe Mathieu-Daudé wrote:
> On 4/9/23 10:03, Eric Auger wrote:
>> A reserved region is a range tagged with a type. Let's directly use
>> the Range type in the prospect to reuse some of the library helpers
>> shipped with the Range type.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>   include/exec/memory.h            | 4 ++--
>>   hw/core/qdev-properties-system.c | 9 ++++++---
>>   hw/virtio/virtio-iommu.c         | 6 +++---
>>   3 files changed, 11 insertions(+), 8 deletions(-)
>
> TIL "qemu/range.h" :)

I did as well and on top of that I learnt "TIL" :)
>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

Thanks!

Eric



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
  2023-09-04  8:03 [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (12 preceding siblings ...)
  2023-09-04  8:03 ` [PATCH 13/13] vfio: Remove 64-bit IOVA address space assumption Eric Auger
@ 2023-09-05  8:22 ` YangHang Liu
  2023-09-05  9:15   ` Eric Auger
  2023-09-05  9:50   ` Eric Auger
  2023-09-05 17:55 ` Alex Williamson
  14 siblings, 2 replies; 29+ messages in thread
From: YangHang Liu @ 2023-09-05  8:22 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg,
	jean-philippe, mst, pbonzini, peter.maydell, peterx, david,
	philmd

I have runned the following two tests, but both tests failed:
[1] start a VM with virtio-iommu + 2 ice PFs only via qemu-kvm 8.1.5
Test result : the qemu-kvm keeps throwing the error:  VFIO_MAP_DMA
failed: File exists. vfio_dma_map(0x56443d20fbe0, 0xffffe000, 0x1000,
0x7fb545709000) = -17 (File exists)
[2] start a VM with virtio-iommu + 2 ice PFs via libvirt-9.5 + qemu-kvm 8.1.5
Test result: the qemu-kvm core dump with
ERROR:../qom/object.c:1198:object_unref: assertion failed: (obj->ref >
0). Bail out! ERROR:../qom/object.c:1198:object_unref: assertion
failed: (obj->ref > 0)

After removing the 2 PF from the VM, both tests passed.

Tested-by: Yanghang Liu <yanghliu@redhat.com>

Best Regards,
YangHang Liu


On Mon, Sep 4, 2023 at 4:08 PM Eric Auger <eric.auger@redhat.com> wrote:
>
> On x86, when assigning VFIO-PCI devices protected with virtio-iommu
> we encounter the case where the guest tries to map IOVAs beyond 48b
> whereas the physical VTD IOMMU only supports 48b. This ends up with
> VFIO_MAP_DMA failures at qemu level because at kernel level,
> vfio_iommu_iova_dma_valid() check returns false on vfio_map_do_map().
>
> This is due to the fact the virtio-iommu currently unconditionally
> exposes an IOVA range of 64b through its config input range fields.
>
> This series removes this assumption by retrieving the usable IOVA
> regions through the VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE UAPI when
> a VFIO device is attached. This info is communicated to the
> virtio-iommu memory region, transformed into the inversed info, ie.
> the host reserved IOVA regions. Then those latter are combined with the
> reserved IOVA regions set though the virtio-iommu reserved-regions
> property. That way, the guest virtio-iommu driver, unchanged, is
> able to probe the whole set of reserved regions and prevent any IOVA
> belonging to those ranges from beeing used, achieving the original goal.
>
> Best Regards
>
> Eric
>
> This series can be found at:
> https://github.com/eauger/qemu/tree/virtio-iommu_geometry_v1
>
> Eric Auger (13):
>   memory: Let ReservedRegion use Range
>   memory: Introduce memory_region_iommu_set_iova_ranges
>   vfio: Collect container iova range info
>   virtio-iommu: Rename reserved_regions into prop_resv_regions
>   virtio-iommu: Introduce per IOMMUDevice reserved regions
>   range: Introduce range_inverse_array()
>   virtio-iommu: Implement set_iova_ranges() callback
>   range: Make range_compare() public
>   util/reserved-region: Add new ReservedRegion helpers
>   virtio-iommu: Consolidate host reserved regions and property set ones
>   test: Add some tests for range and resv-mem helpers
>   virtio-iommu: Resize memory region according to the max iova info
>   vfio: Remove 64-bit IOVA address space assumption
>
>  include/exec/memory.h            |  30 ++++-
>  include/hw/vfio/vfio-common.h    |   2 +
>  include/hw/virtio/virtio-iommu.h |   7 +-
>  include/qemu/range.h             |   9 ++
>  include/qemu/reserved-region.h   |  32 +++++
>  hw/core/qdev-properties-system.c |   9 +-
>  hw/vfio/common.c                 |  70 ++++++++---
>  hw/virtio/virtio-iommu-pci.c     |   8 +-
>  hw/virtio/virtio-iommu.c         |  85 +++++++++++--
>  softmmu/memory.c                 |  15 +++
>  tests/unit/test-resv-mem.c       | 198 +++++++++++++++++++++++++++++++
>  util/range.c                     |  41 ++++++-
>  util/reserved-region.c           |  94 +++++++++++++++
>  hw/virtio/trace-events           |   1 +
>  tests/unit/meson.build           |   1 +
>  util/meson.build                 |   1 +
>  16 files changed, 562 insertions(+), 41 deletions(-)
>  create mode 100644 include/qemu/reserved-region.h
>  create mode 100644 tests/unit/test-resv-mem.c
>  create mode 100644 util/reserved-region.c
>
> --
> 2.41.0
>
>



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
  2023-09-05  8:22 ` [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space YangHang Liu
@ 2023-09-05  9:15   ` Eric Auger
  2023-09-05  9:50   ` Eric Auger
  1 sibling, 0 replies; 29+ messages in thread
From: Eric Auger @ 2023-09-05  9:15 UTC (permalink / raw)
  To: YangHang Liu
  Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg,
	jean-philippe, mst, pbonzini, peter.maydell, peterx, david,
	philmd

Hi Yanghang,

On 9/5/23 10:22, YangHang Liu wrote:
> I have runned the following two tests, but both tests failed:
> [1] start a VM with virtio-iommu + 2 ice PFs only via qemu-kvm 8.1.5
> Test result : the qemu-kvm keeps throwing the error:  VFIO_MAP_DMA
> failed: File exists. vfio_dma_map(0x56443d20fbe0, 0xffffe000, 0x1000,
> 0x7fb545709000) = -17 (File exists)
> [2] start a VM with virtio-iommu + 2 ice PFs via libvirt-9.5 + qemu-kvm 8.1.5
> Test result: the qemu-kvm core dump with
> ERROR:../qom/object.c:1198:object_unref: assertion failed: (obj->ref >
> 0). Bail out! ERROR:../qom/object.c:1198:object_unref: assertion
> failed: (obj->ref > 0)
>
> After removing the 2 PF from the VM, both tests passed.
> Tested-by: Yanghang Liu <yanghliu@redhat.com>

thank you for testing. If my understanding is correct you still
encountered some issues with/after the series. If this is correct you
shall not offer your Tested-by which means you tested the series and it
works fine for you/fixes your issue.

Coming back to the above mentionned issues:

1) the File Exists issue is known and is linked to the replay. This will
be handled separately, ie.I need to resume working on it as my first
approach was flawed: See
https://lore.kernel.org/all/20221207133646.635760-1-eric.auger@redhat.com/
This is unrelated to this series. Note this shouldn't prevent your
passthroughed device from working. Those should be just spurious
warnings that need to be removed.

2) the object_unref assertion most probaly is linked to that series and
I will to investigate asap.

Thank you again!

Eric
>
> Best Regards,
> YangHang Liu
>
>
> On Mon, Sep 4, 2023 at 4:08 PM Eric Auger <eric.auger@redhat.com> wrote:
>> On x86, when assigning VFIO-PCI devices protected with virtio-iommu
>> we encounter the case where the guest tries to map IOVAs beyond 48b
>> whereas the physical VTD IOMMU only supports 48b. This ends up with
>> VFIO_MAP_DMA failures at qemu level because at kernel level,
>> vfio_iommu_iova_dma_valid() check returns false on vfio_map_do_map().
>>
>> This is due to the fact the virtio-iommu currently unconditionally
>> exposes an IOVA range of 64b through its config input range fields.
>>
>> This series removes this assumption by retrieving the usable IOVA
>> regions through the VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE UAPI when
>> a VFIO device is attached. This info is communicated to the
>> virtio-iommu memory region, transformed into the inversed info, ie.
>> the host reserved IOVA regions. Then those latter are combined with the
>> reserved IOVA regions set though the virtio-iommu reserved-regions
>> property. That way, the guest virtio-iommu driver, unchanged, is
>> able to probe the whole set of reserved regions and prevent any IOVA
>> belonging to those ranges from beeing used, achieving the original goal.
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> https://github.com/eauger/qemu/tree/virtio-iommu_geometry_v1
>>
>> Eric Auger (13):
>>   memory: Let ReservedRegion use Range
>>   memory: Introduce memory_region_iommu_set_iova_ranges
>>   vfio: Collect container iova range info
>>   virtio-iommu: Rename reserved_regions into prop_resv_regions
>>   virtio-iommu: Introduce per IOMMUDevice reserved regions
>>   range: Introduce range_inverse_array()
>>   virtio-iommu: Implement set_iova_ranges() callback
>>   range: Make range_compare() public
>>   util/reserved-region: Add new ReservedRegion helpers
>>   virtio-iommu: Consolidate host reserved regions and property set ones
>>   test: Add some tests for range and resv-mem helpers
>>   virtio-iommu: Resize memory region according to the max iova info
>>   vfio: Remove 64-bit IOVA address space assumption
>>
>>  include/exec/memory.h            |  30 ++++-
>>  include/hw/vfio/vfio-common.h    |   2 +
>>  include/hw/virtio/virtio-iommu.h |   7 +-
>>  include/qemu/range.h             |   9 ++
>>  include/qemu/reserved-region.h   |  32 +++++
>>  hw/core/qdev-properties-system.c |   9 +-
>>  hw/vfio/common.c                 |  70 ++++++++---
>>  hw/virtio/virtio-iommu-pci.c     |   8 +-
>>  hw/virtio/virtio-iommu.c         |  85 +++++++++++--
>>  softmmu/memory.c                 |  15 +++
>>  tests/unit/test-resv-mem.c       | 198 +++++++++++++++++++++++++++++++
>>  util/range.c                     |  41 ++++++-
>>  util/reserved-region.c           |  94 +++++++++++++++
>>  hw/virtio/trace-events           |   1 +
>>  tests/unit/meson.build           |   1 +
>>  util/meson.build                 |   1 +
>>  16 files changed, 562 insertions(+), 41 deletions(-)
>>  create mode 100644 include/qemu/reserved-region.h
>>  create mode 100644 tests/unit/test-resv-mem.c
>>  create mode 100644 util/reserved-region.c
>>
>> --
>> 2.41.0
>>
>>



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
  2023-09-05  8:22 ` [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space YangHang Liu
  2023-09-05  9:15   ` Eric Auger
@ 2023-09-05  9:50   ` Eric Auger
  1 sibling, 0 replies; 29+ messages in thread
From: Eric Auger @ 2023-09-05  9:50 UTC (permalink / raw)
  To: YangHang Liu
  Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg,
	jean-philippe, mst, pbonzini, peter.maydell, peterx, david,
	philmd

Hi,

On 9/5/23 10:22, YangHang Liu wrote:
> I have runned the following two tests, but both tests failed:
> [1] start a VM with virtio-iommu + 2 ice PFs only via qemu-kvm 8.1.5
> Test result : the qemu-kvm keeps throwing the error:  VFIO_MAP_DMA
> failed: File exists. vfio_dma_map(0x56443d20fbe0, 0xffffe000, 0x1000,
> 0x7fb545709000) = -17 (File exists)
> [2] start a VM with virtio-iommu + 2 ice PFs via libvirt-9.5 + qemu-kvm 8.1.5
> Test result: the qemu-kvm core dump with
> ERROR:../qom/object.c:1198:object_unref: assertion failed: (obj->ref >
> 0). Bail out! ERROR:../qom/object.c:1198:object_unref: assertion
> failed: (obj->ref > 0)
This latter issue is introduced by patch
[PATCH 12/13] virtio-iommu: Resize memory region according to the max
iova info
and especially the call to

memory_region_set_size(&mr->parent_obj, max_iova + 1);

I don't really get why at the moment but I will investigate ... Eric

>
> After removing the 2 PF from the VM, both tests passed.
>
> Tested-by: Yanghang Liu <yanghliu@redhat.com>
>
> Best Regards,
> YangHang Liu
>
>
> On Mon, Sep 4, 2023 at 4:08 PM Eric Auger <eric.auger@redhat.com> wrote:
>> On x86, when assigning VFIO-PCI devices protected with virtio-iommu
>> we encounter the case where the guest tries to map IOVAs beyond 48b
>> whereas the physical VTD IOMMU only supports 48b. This ends up with
>> VFIO_MAP_DMA failures at qemu level because at kernel level,
>> vfio_iommu_iova_dma_valid() check returns false on vfio_map_do_map().
>>
>> This is due to the fact the virtio-iommu currently unconditionally
>> exposes an IOVA range of 64b through its config input range fields.
>>
>> This series removes this assumption by retrieving the usable IOVA
>> regions through the VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE UAPI when
>> a VFIO device is attached. This info is communicated to the
>> virtio-iommu memory region, transformed into the inversed info, ie.
>> the host reserved IOVA regions. Then those latter are combined with the
>> reserved IOVA regions set though the virtio-iommu reserved-regions
>> property. That way, the guest virtio-iommu driver, unchanged, is
>> able to probe the whole set of reserved regions and prevent any IOVA
>> belonging to those ranges from beeing used, achieving the original goal.
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> https://github.com/eauger/qemu/tree/virtio-iommu_geometry_v1
>>
>> Eric Auger (13):
>>   memory: Let ReservedRegion use Range
>>   memory: Introduce memory_region_iommu_set_iova_ranges
>>   vfio: Collect container iova range info
>>   virtio-iommu: Rename reserved_regions into prop_resv_regions
>>   virtio-iommu: Introduce per IOMMUDevice reserved regions
>>   range: Introduce range_inverse_array()
>>   virtio-iommu: Implement set_iova_ranges() callback
>>   range: Make range_compare() public
>>   util/reserved-region: Add new ReservedRegion helpers
>>   virtio-iommu: Consolidate host reserved regions and property set ones
>>   test: Add some tests for range and resv-mem helpers
>>   virtio-iommu: Resize memory region according to the max iova info
>>   vfio: Remove 64-bit IOVA address space assumption
>>
>>  include/exec/memory.h            |  30 ++++-
>>  include/hw/vfio/vfio-common.h    |   2 +
>>  include/hw/virtio/virtio-iommu.h |   7 +-
>>  include/qemu/range.h             |   9 ++
>>  include/qemu/reserved-region.h   |  32 +++++
>>  hw/core/qdev-properties-system.c |   9 +-
>>  hw/vfio/common.c                 |  70 ++++++++---
>>  hw/virtio/virtio-iommu-pci.c     |   8 +-
>>  hw/virtio/virtio-iommu.c         |  85 +++++++++++--
>>  softmmu/memory.c                 |  15 +++
>>  tests/unit/test-resv-mem.c       | 198 +++++++++++++++++++++++++++++++
>>  util/range.c                     |  41 ++++++-
>>  util/reserved-region.c           |  94 +++++++++++++++
>>  hw/virtio/trace-events           |   1 +
>>  tests/unit/meson.build           |   1 +
>>  util/meson.build                 |   1 +
>>  16 files changed, 562 insertions(+), 41 deletions(-)
>>  create mode 100644 include/qemu/reserved-region.h
>>  create mode 100644 tests/unit/test-resv-mem.c
>>  create mode 100644 util/reserved-region.c
>>
>> --
>> 2.41.0
>>
>>



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
  2023-09-04  8:03 [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (13 preceding siblings ...)
  2023-09-05  8:22 ` [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space YangHang Liu
@ 2023-09-05 17:55 ` Alex Williamson
  2023-09-06  6:40   ` Eric Auger
  14 siblings, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2023-09-05 17:55 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
	pbonzini, peter.maydell, peterx, david, philmd

On Mon,  4 Sep 2023 10:03:43 +0200
Eric Auger <eric.auger@redhat.com> wrote:

> On x86, when assigning VFIO-PCI devices protected with virtio-iommu
> we encounter the case where the guest tries to map IOVAs beyond 48b
> whereas the physical VTD IOMMU only supports 48b. This ends up with
> VFIO_MAP_DMA failures at qemu level because at kernel level,
> vfio_iommu_iova_dma_valid() check returns false on vfio_map_do_map().
> 
> This is due to the fact the virtio-iommu currently unconditionally
> exposes an IOVA range of 64b through its config input range fields.
> 
> This series removes this assumption by retrieving the usable IOVA
> regions through the VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE UAPI when
> a VFIO device is attached. This info is communicated to the
> virtio-iommu memory region, transformed into the inversed info, ie.
> the host reserved IOVA regions. Then those latter are combined with the
> reserved IOVA regions set though the virtio-iommu reserved-regions
> property. That way, the guest virtio-iommu driver, unchanged, is
> able to probe the whole set of reserved regions and prevent any IOVA
> belonging to those ranges from beeing used, achieving the original goal.

Hi Eric,

I don't quite follow this relative to device hotplug.  Are we
manipulating a per-device memory region which is created at device add
time?  Is that memory region actually shared in some cases, for instance
if we have a PCIe-to-PCI bridge aliasing devices on the conventional
side?  Thanks,

Alex

> This series can be found at:
> https://github.com/eauger/qemu/tree/virtio-iommu_geometry_v1
> 
> Eric Auger (13):
>   memory: Let ReservedRegion use Range
>   memory: Introduce memory_region_iommu_set_iova_ranges
>   vfio: Collect container iova range info
>   virtio-iommu: Rename reserved_regions into prop_resv_regions
>   virtio-iommu: Introduce per IOMMUDevice reserved regions
>   range: Introduce range_inverse_array()
>   virtio-iommu: Implement set_iova_ranges() callback
>   range: Make range_compare() public
>   util/reserved-region: Add new ReservedRegion helpers
>   virtio-iommu: Consolidate host reserved regions and property set ones
>   test: Add some tests for range and resv-mem helpers
>   virtio-iommu: Resize memory region according to the max iova info
>   vfio: Remove 64-bit IOVA address space assumption
> 
>  include/exec/memory.h            |  30 ++++-
>  include/hw/vfio/vfio-common.h    |   2 +
>  include/hw/virtio/virtio-iommu.h |   7 +-
>  include/qemu/range.h             |   9 ++
>  include/qemu/reserved-region.h   |  32 +++++
>  hw/core/qdev-properties-system.c |   9 +-
>  hw/vfio/common.c                 |  70 ++++++++---
>  hw/virtio/virtio-iommu-pci.c     |   8 +-
>  hw/virtio/virtio-iommu.c         |  85 +++++++++++--
>  softmmu/memory.c                 |  15 +++
>  tests/unit/test-resv-mem.c       | 198 +++++++++++++++++++++++++++++++
>  util/range.c                     |  41 ++++++-
>  util/reserved-region.c           |  94 +++++++++++++++
>  hw/virtio/trace-events           |   1 +
>  tests/unit/meson.build           |   1 +
>  util/meson.build                 |   1 +
>  16 files changed, 562 insertions(+), 41 deletions(-)
>  create mode 100644 include/qemu/reserved-region.h
>  create mode 100644 tests/unit/test-resv-mem.c
>  create mode 100644 util/reserved-region.c
> 



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
  2023-09-05 17:55 ` Alex Williamson
@ 2023-09-06  6:40   ` Eric Auger
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Auger @ 2023-09-06  6:40 UTC (permalink / raw)
  To: Alex Williamson
  Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
	pbonzini, peter.maydell, peterx, david, philmd

Hi Alex,

On 9/5/23 19:55, Alex Williamson wrote:
> On Mon,  4 Sep 2023 10:03:43 +0200
> Eric Auger <eric.auger@redhat.com> wrote:
>
>> On x86, when assigning VFIO-PCI devices protected with virtio-iommu
>> we encounter the case where the guest tries to map IOVAs beyond 48b
>> whereas the physical VTD IOMMU only supports 48b. This ends up with
>> VFIO_MAP_DMA failures at qemu level because at kernel level,
>> vfio_iommu_iova_dma_valid() check returns false on vfio_map_do_map().
>>
>> This is due to the fact the virtio-iommu currently unconditionally
>> exposes an IOVA range of 64b through its config input range fields.
>>
>> This series removes this assumption by retrieving the usable IOVA
>> regions through the VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE UAPI when
>> a VFIO device is attached. This info is communicated to the
>> virtio-iommu memory region, transformed into the inversed info, ie.
>> the host reserved IOVA regions. Then those latter are combined with the
>> reserved IOVA regions set though the virtio-iommu reserved-regions
>> property. That way, the guest virtio-iommu driver, unchanged, is
>> able to probe the whole set of reserved regions and prevent any IOVA
>> belonging to those ranges from beeing used, achieving the original goal.
> Hi Eric,
>
> I don't quite follow this relative to device hotplug.  Are we
> manipulating a per-device memory region which is created at device add
> time?  Is that memory region actually shared in some cases, for instance
> if we have a PCIe-to-PCI bridge aliasing devices on the conventional
> side?  Thanks,
I agree this deserves more attention and testing in the case of hotplug
and aliasing. Wrt PCIe to PCI bridge, virtio-iommu and smmu are known to
be broken with this latter due to lack of kernel support (issue with
group probing, but this might change in the future) so this is not a
currently supported feature, as opposed to virtual intel iommu. Here I
was mostly assuming one device per container and per IOMMU MR but maybe
I have to detect & forbid more complex scenari.

Thanks

Eric
> Alex
>
>> This series can be found at:
>> https://github.com/eauger/qemu/tree/virtio-iommu_geometry_v1
>>
>> Eric Auger (13):
>>   memory: Let ReservedRegion use Range
>>   memory: Introduce memory_region_iommu_set_iova_ranges
>>   vfio: Collect container iova range info
>>   virtio-iommu: Rename reserved_regions into prop_resv_regions
>>   virtio-iommu: Introduce per IOMMUDevice reserved regions
>>   range: Introduce range_inverse_array()
>>   virtio-iommu: Implement set_iova_ranges() callback
>>   range: Make range_compare() public
>>   util/reserved-region: Add new ReservedRegion helpers
>>   virtio-iommu: Consolidate host reserved regions and property set ones
>>   test: Add some tests for range and resv-mem helpers
>>   virtio-iommu: Resize memory region according to the max iova info
>>   vfio: Remove 64-bit IOVA address space assumption
>>
>>  include/exec/memory.h            |  30 ++++-
>>  include/hw/vfio/vfio-common.h    |   2 +
>>  include/hw/virtio/virtio-iommu.h |   7 +-
>>  include/qemu/range.h             |   9 ++
>>  include/qemu/reserved-region.h   |  32 +++++
>>  hw/core/qdev-properties-system.c |   9 +-
>>  hw/vfio/common.c                 |  70 ++++++++---
>>  hw/virtio/virtio-iommu-pci.c     |   8 +-
>>  hw/virtio/virtio-iommu.c         |  85 +++++++++++--
>>  softmmu/memory.c                 |  15 +++
>>  tests/unit/test-resv-mem.c       | 198 +++++++++++++++++++++++++++++++
>>  util/range.c                     |  41 ++++++-
>>  util/reserved-region.c           |  94 +++++++++++++++
>>  hw/virtio/trace-events           |   1 +
>>  tests/unit/meson.build           |   1 +
>>  util/meson.build                 |   1 +
>>  16 files changed, 562 insertions(+), 41 deletions(-)
>>  create mode 100644 include/qemu/reserved-region.h
>>  create mode 100644 tests/unit/test-resv-mem.c
>>  create mode 100644 util/reserved-region.c
>>



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/13] memory: Let ReservedRegion use Range
  2023-09-04  8:03 ` [PATCH 01/13] memory: Let ReservedRegion use Range Eric Auger
  2023-09-04  8:12   ` Philippe Mathieu-Daudé
@ 2023-09-06  7:17   ` David Hildenbrand
  2023-09-11 18:40   ` Peter Xu
  2 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2023-09-06  7:17 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, philmd

On 04.09.23 10:03, Eric Auger wrote:
> A reserved region is a range tagged with a type. Let's directly use
> the Range type in the prospect to reuse some of the library helpers
> shipped with the Range type.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>   include/exec/memory.h            | 4 ++--
>   hw/core/qdev-properties-system.c | 9 ++++++---
>   hw/virtio/virtio-iommu.c         | 6 +++---
>   3 files changed, 11 insertions(+), 8 deletions(-)
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/13] memory: Let ReservedRegion use Range
  2023-09-04  8:03 ` [PATCH 01/13] memory: Let ReservedRegion use Range Eric Auger
  2023-09-04  8:12   ` Philippe Mathieu-Daudé
  2023-09-06  7:17   ` David Hildenbrand
@ 2023-09-11 18:40   ` Peter Xu
  2 siblings, 0 replies; 29+ messages in thread
From: Peter Xu @ 2023-09-11 18:40 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg,
	jean-philippe, mst, pbonzini, peter.maydell, david, philmd

On Mon, Sep 04, 2023 at 10:03:44AM +0200, Eric Auger wrote:
> A reserved region is a range tagged with a type. Let's directly use
> the Range type in the prospect to reuse some of the library helpers
> shipped with the Range type.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 02/13] memory: Introduce memory_region_iommu_set_iova_ranges
  2023-09-04  8:03 ` [PATCH 02/13] memory: Introduce memory_region_iommu_set_iova_ranges Eric Auger
@ 2023-09-11 18:41   ` Peter Xu
  2023-09-14 13:38   ` David Hildenbrand
  1 sibling, 0 replies; 29+ messages in thread
From: Peter Xu @ 2023-09-11 18:41 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg,
	jean-philippe, mst, pbonzini, peter.maydell, david, philmd

On Mon, Sep 04, 2023 at 10:03:45AM +0200, Eric Auger wrote:
> This helper will allow to convey information about valid
> IOVA ranges to virtual IOMMUS.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 02/13] memory: Introduce memory_region_iommu_set_iova_ranges
  2023-09-04  8:03 ` [PATCH 02/13] memory: Introduce memory_region_iommu_set_iova_ranges Eric Auger
  2023-09-11 18:41   ` Peter Xu
@ 2023-09-14 13:38   ` David Hildenbrand
  1 sibling, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2023-09-14 13:38 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm,
	alex.williamson, clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, philmd

On 04.09.23 10:03, Eric Auger wrote:
> This helper will allow to convey information about valid
> IOVA ranges to virtual IOMMUS.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>   include/exec/memory.h | 26 ++++++++++++++++++++++++++
>   softmmu/memory.c      | 15 +++++++++++++++
>   2 files changed, 41 insertions(+)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 184cb3a01b..f6fb99dd3f 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -519,6 +519,27 @@ struct IOMMUMemoryRegionClass {
>        int (*iommu_set_page_size_mask)(IOMMUMemoryRegion *iommu,
>                                        uint64_t page_size_mask,
>                                        Error **errp);
> +    /**
> +     * @iommu_set_iova_ranges:
> +     *
> +     * Propagate information about the usable IOVA ranges for a given IOMMU
> +     * memory region. Used for example to propagate host physical device
> +     * reserved memory region constraints to the virtual IOMMU.
> +     *
> +     * Optional method: if this method is not provided, then the default IOVA
> +     * aperture is used.
> +     *
> +     * @nr_ranges: number of IOVA ranges
> +     *
> +     * @iova_ranges: an array of @nr_ranges usable IOVA ranges
> +     *
> +     * Returns 0 on success, or a negative error. In case of failure, the error
> +     * object must be created.
> +     */
> +     int (*iommu_set_iova_ranges)(IOMMUMemoryRegion *iommu,
> +                                  uint32_t nr_ranges,
> +                                  struct Range *iova_ranges,
> +                                  Error **errp);
>   };
>   
>   typedef struct RamDiscardListener RamDiscardListener;
> @@ -1845,6 +1866,11 @@ int memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
>                                              uint64_t page_size_mask,
>                                              Error **errp);
>   
> +int memory_region_iommu_set_iova_ranges(IOMMUMemoryRegion *iommu,
> +                                        uint32_t nr_ranges,
> +                                        struct Range *iova_ranges,
> +                                        Error **errp);
> +
>   /**
>    * memory_region_name: get a memory region's name
>    *
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 7d9494ce70..07499457aa 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -1905,6 +1905,21 @@ int memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
>       return ret;
>   }
>   
> +int memory_region_iommu_set_iova_ranges(IOMMUMemoryRegion *iommu_mr,
> +                                        uint32_t nr_ranges,
> +                                        struct Range *iova_ranges,
> +                                        Error **errp)
> +{
> +    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
> +    int ret = 0;
> +
> +    if (imrc->iommu_set_iova_ranges) {
> +        ret = imrc->iommu_set_iova_ranges(iommu_mr, nr_ranges,
> +                                          iova_ranges, errp);
> +    }
> +    return ret;
> +}
> +
>   int memory_region_register_iommu_notifier(MemoryRegion *mr,
>                                             IOMMUNotifier *n, Error **errp)
>   {

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-09-14 13:39 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-04  8:03 [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
2023-09-04  8:03 ` [PATCH 01/13] memory: Let ReservedRegion use Range Eric Auger
2023-09-04  8:12   ` Philippe Mathieu-Daudé
2023-09-04 14:22     ` Eric Auger
2023-09-06  7:17   ` David Hildenbrand
2023-09-11 18:40   ` Peter Xu
2023-09-04  8:03 ` [PATCH 02/13] memory: Introduce memory_region_iommu_set_iova_ranges Eric Auger
2023-09-11 18:41   ` Peter Xu
2023-09-14 13:38   ` David Hildenbrand
2023-09-04  8:03 ` [PATCH 03/13] vfio: Collect container iova range info Eric Auger
2023-09-04  8:03 ` [PATCH 04/13] virtio-iommu: Rename reserved_regions into prop_resv_regions Eric Auger
2023-09-04  8:03 ` [PATCH 05/13] virtio-iommu: Introduce per IOMMUDevice reserved regions Eric Auger
2023-09-04  8:03 ` [PATCH 06/13] range: Introduce range_inverse_array() Eric Auger
2023-09-04  8:18   ` Philippe Mathieu-Daudé
2023-09-04 14:21     ` Eric Auger
2023-09-04  8:03 ` [PATCH 07/13] virtio-iommu: Implement set_iova_ranges() callback Eric Auger
2023-09-04  8:03 ` [PATCH 08/13] range: Make range_compare() public Eric Auger
2023-09-04  8:19   ` Philippe Mathieu-Daudé
2023-09-04  8:03 ` [PATCH 09/13] util/reserved-region: Add new ReservedRegion helpers Eric Auger
2023-09-04  8:03 ` [PATCH 10/13] virtio-iommu: Consolidate host reserved regions and property set ones Eric Auger
2023-09-04  8:03 ` [PATCH 11/13] test: Add some tests for range and resv-mem helpers Eric Auger
2023-09-04  8:03 ` [PATCH 12/13] virtio-iommu: Resize memory region according to the max iova info Eric Auger
2023-09-04  8:21   ` Philippe Mathieu-Daudé
2023-09-04  8:03 ` [PATCH 13/13] vfio: Remove 64-bit IOVA address space assumption Eric Auger
2023-09-05  8:22 ` [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space YangHang Liu
2023-09-05  9:15   ` Eric Auger
2023-09-05  9:50   ` Eric Auger
2023-09-05 17:55 ` Alex Williamson
2023-09-06  6:40   ` Eric Auger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.