* [Qemu-devel] [PATCH 0/7] VFIO extensions to allow VFIO devices on spapr-pci-host-bridge
@ 2015-09-24  4:33 David Gibson
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 1/7] vfio: Remove unneeded union from VFIOContainer David Gibson
                   ` (6 more replies)
  0 siblings, 7 replies; 28+ messages in thread
From: David Gibson @ 2015-09-24  4:33 UTC (permalink / raw)
  To: alex.williamson, pbonzini
  Cc: lvivier, thuth, abologna, qemu-devel, qemu-ppc, David Gibson

Hi Alex,

Here are the parts of my recent series to allow VFIO devices on the
spapr-pci-host-bridge device which affect the core VFIO code.  They've
been revised according to the comments from yourself and others.

There's also one patch for the memory subsystem.  Paolo, can you let me
know if this needs to be sent separately?

Note that while these are motivated by the needs of the sPAPR code, the
changes should all be generally correct, and will allow safer and
more flexible use of VFIO devices in other situations as
well.

Please apply.

David Gibson (7):
  vfio: Remove unneeded union from VFIOContainer
  vfio: Generalize vfio_listener_region_add failure path
  vfio: Check guest IOVA ranges against host IOMMU capabilities
  vfio: Record host IOMMU's available IO page sizes
  memory: Allow replay of IOMMU mapping notifications
  vfio: Allow hotplug of containers onto existing guest IOMMU mappings
  vfio: Expose a VFIO PCI device's group for EEH

 hw/vfio/common.c              | 140 +++++++++++++++++++++++++-----------------
 hw/vfio/pci.c                 |  14 +++++
 include/exec/memory.h         |  17 +++++
 include/hw/vfio/vfio-common.h |  23 +++----
 include/hw/vfio/vfio-pci.h    |  11 ++++
 memory.c                      |  18 ++++++
 6 files changed, 155 insertions(+), 68 deletions(-)
 create mode 100644 include/hw/vfio/vfio-pci.h

-- 
2.4.3

* [Qemu-devel] [PATCH 1/7] vfio: Remove unneeded union from VFIOContainer
  2015-09-24  4:33 [Qemu-devel] [PATCH 0/7] VFIO extensions to allow VFIO devices on spapr-pci-host-bridge David Gibson
@ 2015-09-24  4:33 ` David Gibson
  2015-09-24 16:01   ` Alex Williamson
  2015-09-24 16:10   ` Thomas Huth
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 2/7] vfio: Generalize vfio_listener_region_add failure path David Gibson
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 28+ messages in thread
From: David Gibson @ 2015-09-24  4:33 UTC (permalink / raw)
  To: alex.williamson, pbonzini
  Cc: lvivier, thuth, abologna, qemu-devel, qemu-ppc, David Gibson

Currently the VFIOContainer iommu_data field contains a union with
different information for different host iommu types.  However:
   * It only actually contains information for the x86-like "Type1" iommu
   * Because we have a common listener the Type1 fields are actually used
on all IOMMU types, including the SPAPR TCE type as well
   * There's no tag in the VFIOContainer to tell you which union member is
valid anyway.

In fact we now have a general structure for the listener which is unlikely
to ever need per-iommu-type information, so this patch removes the union.

In a similar way we can unify the setup of the vfio memory listener in
vfio_connect_container() that is currently split across a switch on iommu
type, but is effectively the same in both cases.

The iommu_data.release pointer was only needed as a cleanup function
which would handle potentially different data in the union.  With the
union gone, it too can be removed.
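
To illustrate the pattern this relies on - not QEMU code, just a
self-contained sketch with stand-in type names - an embedded listener
lets any callback recover its enclosing container via container_of(),
which is why no per-iommu-type union member is needed:

    #include <stddef.h>
    #include <stdio.h>

    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    typedef struct Listener { int dummy; } Listener;

    typedef struct Container {
        int fd;
        Listener listener;   /* embedded directly, as in the
                              * flattened VFIOContainer below */
        int error;
    } Container;

    static void callback(Listener *l)
    {
        /* Recover the enclosing Container from the embedded member,
         * the same way vfio_listener_region_add() now does. */
        Container *c = container_of(l, Container, listener);
        printf("fd = %d\n", c->fd);
    }

    int main(void)
    {
        Container c = { .fd = 7 };
        callback(&c.listener);
        return 0;
    }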

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/vfio/common.c              | 52 ++++++++++++++++---------------------------
 include/hw/vfio/vfio-common.h | 16 +++----------
 2 files changed, 22 insertions(+), 46 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 0d341a3..1545f62 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -315,8 +315,7 @@ out:
 static void vfio_listener_region_add(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
-    VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            iommu_data.type1.listener);
+    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
     hwaddr iova, end;
     Int128 llend;
     void *vaddr;
@@ -406,9 +405,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
          * can gracefully fail.  Runtime, there's not much we can do other
          * than throw a hardware error.
          */
-        if (!container->iommu_data.type1.initialized) {
-            if (!container->iommu_data.type1.error) {
-                container->iommu_data.type1.error = ret;
+        if (!container->initialized) {
+            if (!container->error) {
+                container->error = ret;
             }
         } else {
             hw_error("vfio: DMA mapping failed, unable to continue");
@@ -419,8 +418,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
 static void vfio_listener_region_del(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
-    VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            iommu_data.type1.listener);
+    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
     hwaddr iova, end;
     int ret;
 
@@ -485,7 +483,7 @@ static const MemoryListener vfio_memory_listener = {
 
 static void vfio_listener_release(VFIOContainer *container)
 {
-    memory_listener_unregister(&container->iommu_data.type1.listener);
+    memory_listener_unregister(&container->listener);
 }
 
 int vfio_mmap_region(Object *obj, VFIORegion *region,
@@ -683,21 +681,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
             ret = -errno;
             goto free_container_exit;
         }
-
-        container->iommu_data.type1.listener = vfio_memory_listener;
-        container->iommu_data.release = vfio_listener_release;
-
-        memory_listener_register(&container->iommu_data.type1.listener,
-                                 container->space->as);
-
-        if (container->iommu_data.type1.error) {
-            ret = container->iommu_data.type1.error;
-            error_report("vfio: memory listener initialization failed for container");
-            goto listener_release_exit;
-        }
-
-        container->iommu_data.type1.initialized = true;
-
     } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
         ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
         if (ret) {
@@ -723,19 +706,24 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
             ret = -errno;
             goto free_container_exit;
         }
-
-        container->iommu_data.type1.listener = vfio_memory_listener;
-        container->iommu_data.release = vfio_listener_release;
-
-        memory_listener_register(&container->iommu_data.type1.listener,
-                                 container->space->as);
-
     } else {
         error_report("vfio: No available IOMMU models");
         ret = -EINVAL;
         goto free_container_exit;
     }
 
+    container->listener = vfio_memory_listener;
+
+    memory_listener_register(&container->listener, container->space->as);
+
+    if (container->error) {
+        ret = container->error;
+        error_report("vfio: memory listener initialization failed for container");
+        goto listener_release_exit;
+    }
+
+    container->initialized = true;
+
     QLIST_INIT(&container->group_list);
     QLIST_INSERT_HEAD(&space->containers, container, next);
 
@@ -774,9 +762,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
         VFIOAddressSpace *space = container->space;
         VFIOGuestIOMMU *giommu, *tmp;
 
-        if (container->iommu_data.release) {
-            container->iommu_data.release(container);
-        }
+        vfio_listener_release(container);
         QLIST_REMOVE(container, next);
 
         QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 9b9901f..fbbe6de 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -59,22 +59,12 @@ typedef struct VFIOAddressSpace {
 
 struct VFIOGroup;
 
-typedef struct VFIOType1 {
-    MemoryListener listener;
-    int error;
-    bool initialized;
-} VFIOType1;
-
 typedef struct VFIOContainer {
     VFIOAddressSpace *space;
     int fd; /* /dev/vfio/vfio, empowered by the attached groups */
-    struct {
-        /* enable abstraction to support various iommu backends */
-        union {
-            VFIOType1 type1;
-        };
-        void (*release)(struct VFIOContainer *);
-    } iommu_data;
+    MemoryListener listener;
+    int error;
+    bool initialized;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOGroup) group_list;
     QLIST_ENTRY(VFIOContainer) next;
-- 
2.4.3

* [Qemu-devel] [PATCH 2/7] vfio: Generalize vfio_listener_region_add failure path
  2015-09-24  4:33 [Qemu-devel] [PATCH 0/7] VFIO extensions to allow VFIO devices on spapr-pci-host-bridge David Gibson
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 1/7] vfio: Remove unneeded union from VFIOContainer David Gibson
@ 2015-09-24  4:33 ` David Gibson
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 3/7] vfio: Check guest IOVA ranges against host IOMMU capabilities David Gibson
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2015-09-24  4:33 UTC (permalink / raw)
  To: alex.williamson, pbonzini
  Cc: lvivier, thuth, abologna, qemu-devel, qemu-ppc, David Gibson

If a DMA mapping operation fails in vfio_listener_region_add(), it
checks whether we've already completed initial setup of the
container.  If not, it records the error so the setup code can fail
gracefully; otherwise it throws a hw_error().

There are other potential failure cases in vfio_listener_region_add()
which could benefit from the same logic, so move it to its own
fail: block.  Later patches can route other failure cases through it
so they fail as gracefully as possible under the circumstances.
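
As a self-contained sketch of the shared failure path (stand-in names,
with abort() in place of hw_error()):

    #include <errno.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct container {
        bool initialized;
        int error;
    };

    static void region_add(struct container *c, bool mapping_ok)
    {
        int ret = 0;

        if (!mapping_ok) {
            ret = -EFAULT;
            goto fail;           /* every failure case funnels here */
        }
        return;

    fail:
        if (!c->initialized) {
            /* initfn path: record the first error so setup can
             * fail gracefully */
            if (!c->error) {
                c->error = ret;
            }
        } else {
            /* runtime: nothing sensible we can do */
            fprintf(stderr, "vfio: mapping failed, unable to continue\n");
            abort();
        }
    }

    int main(void)
    {
        struct container c = { .initialized = false, .error = 0 };
        region_add(&c, false);
        printf("recorded error: %d\n", c.error);
        return 0;
    }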

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/vfio/common.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 1545f62..95a4850 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -399,19 +399,23 @@ static void vfio_listener_region_add(MemoryListener *listener,
         error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
                      "0x%"HWADDR_PRIx", %p) = %d (%m)",
                      container, iova, end - iova, vaddr, ret);
+        goto fail;
+    }
 
-        /*
-         * On the initfn path, store the first error in the container so we
-         * can gracefully fail.  Runtime, there's not much we can do other
-         * than throw a hardware error.
-         */
-        if (!container->initialized) {
-            if (!container->error) {
-                container->error = ret;
-            }
-        } else {
-            hw_error("vfio: DMA mapping failed, unable to continue");
+    return;
+
+fail:
+    /*
+     * On the initfn path, store the first error in the container so we
+     * can gracefully fail.  Runtime, there's not much we can do other
+     * than throw a hardware error.
+     */
+    if (!container->initialized) {
+        if (!container->error) {
+            container->error = ret;
         }
+    } else {
+        hw_error("vfio: DMA mapping failed, unable to continue");
     }
 }
 
-- 
2.4.3

* [Qemu-devel] [PATCH 3/7] vfio: Check guest IOVA ranges against host IOMMU capabilities
  2015-09-24  4:33 [Qemu-devel] [PATCH 0/7] VFIO extensions to allow VFIO devices on spapr-pci-host-bridge David Gibson
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 1/7] vfio: Remove unneeded union from VFIOContainer David Gibson
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 2/7] vfio: Generalize vfio_listener_region_add failure path David Gibson
@ 2015-09-24  4:33 ` David Gibson
  2015-09-24 17:32   ` Alex Williamson
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 4/7] vfio: Record host IOMMU's available IO page sizes David Gibson
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: David Gibson @ 2015-09-24  4:33 UTC (permalink / raw)
  To: alex.williamson, pbonzini
  Cc: lvivier, thuth, abologna, qemu-devel, qemu-ppc, David Gibson

The current vfio core code assumes that the host IOMMU is capable of
mapping any IOVA the guest wants to use to wherever we need it mapped.
However, real IOMMUs generally only support translating a certain range
of IOVAs (the "DMA window"), not a full 64-bit address space.

The common x86 IOMMUs support a wide enough range that guests are very
unlikely to go beyond it in practice; however, the IOMMU used on IBM Power
machines - in the default configuration - supports only a much more limited
IOVA range, usually 0..2GiB.

If the guest attempts to set up an IOVA range that the host IOMMU can't
map, qemu won't report an error until it actually attempts to map a bad
IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e. no guest
visible IOMMU) then this will show up very quickly.  If there is a guest
visible IOMMU, however, the problem might not show up until much later when
the guest actually attempts to DMA with an IOVA the host can't handle.

This patch adds a test so that we will detect earlier if the guest is
attempting to use IOVA ranges that the host IOMMU won't be able to deal
with.

For now, we assume that "Type1" (x86) IOMMUs can support any IOVA.  This is
incorrect, but no worse than what we have already.  We can't do better for
now because the Type1 kernel interface doesn't tell us what IOVA range the
IOMMU actually supports.

For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
IOVA range and validate guest IOVA ranges against it, and this patch does
so.
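
The check itself reduces to a window containment test; a minimal
sketch, with illustrative names:

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t hwaddr;

    /* A section covering [iova, end) is only mappable if it sits
     * entirely inside the container's single contiguous DMA window
     * [min_iova, max_iova]. */
    static bool iova_range_ok(hwaddr iova, hwaddr end,
                              hwaddr min_iova, hwaddr max_iova)
    {
        return iova >= min_iova && end - 1 <= max_iova;
    }

With the Type1 window set to 0..(hwaddr)-1 as above, the test passes for
any IOVA, which matches the "no worse than before" behaviour described.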

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/vfio/common.c              | 40 +++++++++++++++++++++++++++++++++++++---
 include/hw/vfio/vfio-common.h |  6 ++++++
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 95a4850..f90cc75 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -343,14 +343,22 @@ static void vfio_listener_region_add(MemoryListener *listener,
     if (int128_ge(int128_make64(iova), llend)) {
         return;
     }
+    end = int128_get64(llend);
+
+    if ((iova < container->min_iova) || ((end - 1) > container->max_iova)) {
+        error_report("vfio: IOMMU container %p can't map guest IOVA region"
+                     " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
+                     container, iova, end - 1);
+        ret = -EFAULT; /* FIXME: better choice here? */
+        goto fail;
+    }
 
     memory_region_ref(section->mr);
 
     if (memory_region_is_iommu(section->mr)) {
         VFIOGuestIOMMU *giommu;
 
-        trace_vfio_listener_region_add_iommu(iova,
-                    int128_get64(int128_sub(llend, int128_one())));
+        trace_vfio_listener_region_add_iommu(iova, end - 1);
         /*
          * FIXME: We should do some checking to see if the
          * capabilities of the host VFIO IOMMU are adequate to model
@@ -387,7 +395,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
 
     /* Here we assume that memory_region_is_ram(section->mr)==true */
 
-    end = int128_get64(llend);
     vaddr = memory_region_get_ram_ptr(section->mr) +
             section->offset_within_region +
             (iova - section->offset_within_address_space);
@@ -685,7 +692,19 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
             ret = -errno;
             goto free_container_exit;
         }
+
+        /*
+         * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
+         * IOVA whatsoever.  That's not actually true, but the current
+         * kernel interface doesn't tell us what it can map, and the
+         * existing Type1 IOMMUs generally support any IOVA we're
+         * going to actually try in practice.
+         */
+        container->min_iova = 0;
+        container->max_iova = (hwaddr)-1;
     } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
+        struct vfio_iommu_spapr_tce_info info;
+
         ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
         if (ret) {
             error_report("vfio: failed to set group container: %m");
@@ -710,6 +729,21 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
             ret = -errno;
             goto free_container_exit;
         }
+
+        /*
+         * FIXME: This only considers the host IOMMU' 32-bit window.
+         * At some point we need to add support for the optional
+         * 64-bit window and dynamic windows
+         */
+        info.argsz = sizeof(info);
+        ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
+        if (ret) {
+            error_report("vfio: VFIO_IOMMU_SPAPR_TCE_GET_INFO failed: %m");
+            ret = -errno;
+            goto free_container_exit;
+        }
+        container->min_iova = info.dma32_window_start;
+        container->max_iova = container->min_iova + info.dma32_window_size - 1;
     } else {
         error_report("vfio: No available IOMMU models");
         ret = -EINVAL;
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index fbbe6de..859dbec 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -65,6 +65,12 @@ typedef struct VFIOContainer {
     MemoryListener listener;
     int error;
     bool initialized;
+    /*
+     * FIXME: This assumes the host IOMMU can support only a
+     * single contiguous IOVA window.  We may need to generalize
+     * that in future
+     */
+    hwaddr min_iova, max_iova;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOGroup) group_list;
     QLIST_ENTRY(VFIOContainer) next;
-- 
2.4.3

* [Qemu-devel] [PATCH 4/7] vfio: Record host IOMMU's available IO page sizes
  2015-09-24  4:33 [Qemu-devel] [PATCH 0/7] VFIO extensions to allow VFIO devices on spapr-pci-host-bridge David Gibson
                   ` (2 preceding siblings ...)
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 3/7] vfio: Check guest IOVA ranges against host IOMMU capabilities David Gibson
@ 2015-09-24  4:33 ` David Gibson
  2015-09-24 17:32   ` Alex Williamson
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications David Gibson
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: David Gibson @ 2015-09-24  4:33 UTC (permalink / raw)
  To: alex.williamson, pbonzini
  Cc: lvivier, thuth, abologna, qemu-devel, qemu-ppc, David Gibson

Depending on the host IOMMU type we determine and record the available page
sizes for IOMMU translation.  We'll need this for other validation in
future patches.
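
As the use of ctz64() in patch 6 implies, iova_pgsizes is a bitmap in
which a set bit N means 2^N-byte IO pages are supported.  A
self-contained sketch (stand-in names) of pulling the smallest
supported page size out of it:

    #include <stdint.h>
    #include <stdio.h>

    static uint64_t smallest_pgsize(uint64_t iova_pgsizes)
    {
        /* isolate the lowest set bit: the smallest supported page */
        return iova_pgsizes & -iova_pgsizes;
    }

    int main(void)
    {
        uint64_t pgsizes = 0x1000 | (1ULL << 21);   /* 4KiB and 2MiB */
        printf("0x%llx\n", (unsigned long long)smallest_pgsize(pgsizes));
        return 0;
    }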

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/vfio/common.c              | 13 +++++++++++++
 include/hw/vfio/vfio-common.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f90cc75..db8dff3 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -677,6 +677,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
     if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
         ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
         bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU);
+        struct vfio_iommu_type1_info info;
 
         ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
         if (ret) {
@@ -702,6 +703,15 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
          */
         container->min_iova = 0;
         container->max_iova = (hwaddr)-1;
+
+        /* Assume just 4K IOVA page size */
+        container->iova_pgsizes = 0x1000;
+        info.argsz = sizeof(info);
+        ret = ioctl(fd, VFIO_IOMMU_GET_INFO, &info);
+        /* Ignore errors */
+        if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
+            container->iova_pgsizes = info.iova_pgsizes;
+        }
     } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
         struct vfio_iommu_spapr_tce_info info;
 
@@ -744,6 +754,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
         }
         container->min_iova = info.dma32_window_start;
         container->max_iova = container->min_iova + info.dma32_window_size - 1;
+
+        /* Assume just 4K IOVA pages for now */
+        container->iova_pgsizes = 0x1000;
     } else {
         error_report("vfio: No available IOMMU models");
         ret = -EINVAL;
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 859dbec..b8a76e1 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -71,6 +71,7 @@ typedef struct VFIOContainer {
      * that in future
      */
     hwaddr min_iova, max_iova;
+    uint64_t iova_pgsizes;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOGroup) group_list;
     QLIST_ENTRY(VFIOContainer) next;
-- 
2.4.3

* [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications
  2015-09-24  4:33 [Qemu-devel] [PATCH 0/7] VFIO extensions to allow VFIO devices on spapr-pci-host-bridge David Gibson
                   ` (3 preceding siblings ...)
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 4/7] vfio: Record host IOMMU's available IO page sizes David Gibson
@ 2015-09-24  4:33 ` David Gibson
  2015-09-24 16:08   ` Laurent Vivier
                     ` (2 more replies)
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 6/7] vfio: Allow hotplug of containers onto existing guest IOMMU mappings David Gibson
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 7/7] vfio: Expose a VFIO PCI device's group for EEH David Gibson
  6 siblings, 3 replies; 28+ messages in thread
From: David Gibson @ 2015-09-24  4:33 UTC (permalink / raw)
  To: alex.williamson, pbonzini
  Cc: lvivier, thuth, abologna, qemu-devel, qemu-ppc, David Gibson

When we have guest visible IOMMUs, we allow notifiers to be registered
which will be informed of all changes to IOMMU mappings.  This is used by
vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.

However, unlike with a memory region listener, an iommu notifier won't be
told about any mappings which already exist in the (guest) IOMMU at the
time it is registered.  This can cause problems if hotplugging a VFIO
device onto a guest bus which had existing guest IOMMU mappings, but didn't
previously have any VFIO devices (and hence had no host IOMMU mappings).

This adds a memory_region_register_iommu_notifier_replay() function to
handle this case.  As well as registering the new notifier, it replays
existing mappings.  Because the IOMMU memory region doesn't internally
remember the granularity of the guest IOMMU, there is a small hack: the
caller must specify a granularity at which to replay mappings.

If there are finer mappings in the guest IOMMU, these will be reported in
the iotlb structures passed to the notifier, which must handle them
(probably by flagging an error).  This isn't new - the VFIO iommu notifier
must already handle notifications about guest IOMMU mappings too small
for it to represent in the host IOMMU.
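
A toy model of the replay (illustrative stand-ins only, not the QEMU
API): registration walks the region at the caller-supplied granularity,
translating each address and notifying for every mapping that already
exists:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t hwaddr;

    typedef struct Entry { hwaddr iova; bool valid; } Entry;

    /* stand-in for mr->iommu_ops->translate() */
    static Entry toy_translate(hwaddr addr)
    {
        Entry e = { addr, addr < 0x2000 };   /* pretend 0..8KiB is mapped */
        return e;
    }

    static void toy_notify(Entry *e)
    {
        printf("replayed mapping at 0x%llx\n", (unsigned long long)e->iova);
    }

    int main(void)
    {
        hwaddr size = 0x4000, granularity = 0x1000, addr;

        for (addr = 0; addr < size; addr += granularity) {
            Entry e = toy_translate(addr);
            if (e.valid) {
                toy_notify(&e);
            }
        }
        return 0;
    }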

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/exec/memory.h | 17 +++++++++++++++++
 memory.c              | 18 ++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5baaf48..304f985 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -583,6 +583,23 @@ void memory_region_notify_iommu(MemoryRegion *mr,
 void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
 
 /**
+ * memory_region_register_iommu_notifier_replay: register a notifier
+ * for changes to IOMMU translation entries, and replay existing IOMMU
+ * translations to the new notifier.
+ *
+ * @mr: the memory region to observe
+ * @n: the notifier to be added; the notifier receives a pointer to an
+ *     #IOMMUTLBEntry as the opaque value; the pointer ceases to be
+ *     valid on exit from the notifier.
+ * @granularity: Minimum page granularity to replay notifications for
+ * @is_write: Whether to treat the replay as a translate "write"
+ *     through the iommu
+ */
+void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
+                                                  hwaddr granularity,
+                                                  bool is_write);
+
+/**
  * memory_region_unregister_iommu_notifier: unregister a notifier for
  * changes to IOMMU translation entries.
  *
diff --git a/memory.c b/memory.c
index ef87363..b4b6861 100644
--- a/memory.c
+++ b/memory.c
@@ -1403,6 +1403,24 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
     notifier_list_add(&mr->iommu_notify, n);
 }
 
+void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
+                                                  hwaddr granularity,
+                                                  bool is_write)
+{
+    hwaddr addr;
+    IOMMUTLBEntry iotlb;
+
+    memory_region_register_iommu_notifier(mr, n);
+
+    for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
+
+        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+        if (iotlb.perm != IOMMU_NONE) {
+            n->notify(n, &iotlb);
+        }
+    }
+}
+
 void memory_region_unregister_iommu_notifier(Notifier *n)
 {
     notifier_remove(n);
-- 
2.4.3

* [Qemu-devel] [PATCH 6/7] vfio: Allow hotplug of containers onto existing guest IOMMU mappings
  2015-09-24  4:33 [Qemu-devel] [PATCH 0/7] VFIO extensions to allow VFIO devices on spapr-pci-host-bridge David Gibson
                   ` (4 preceding siblings ...)
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications David Gibson
@ 2015-09-24  4:33 ` David Gibson
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 7/7] vfio: Expose a VFIO PCI device's group for EEH David Gibson
  6 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2015-09-24  4:33 UTC (permalink / raw)
  To: alex.williamson, pbonzini
  Cc: lvivier, thuth, abologna, qemu-devel, qemu-ppc, David Gibson

At present the memory listener used by vfio to keep host IOMMU mappings
in sync with the guest memory image assumes that if a guest IOMMU
appears, then it has no existing mappings.

This may not be true if a VFIO device is hotplugged onto a guest bus
which didn't previously include a VFIO device, and which has existing
guest IOMMU mappings.

Fix this by using the memory_region_register_iommu_notifier_replay()
function, which replays existing guest IOMMU mappings as the notifier
is registered, bringing the host IOMMU into sync with the guest IOMMU.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/vfio/common.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index db8dff3..8a4fba9 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -312,6 +312,11 @@ out:
     rcu_read_unlock();
 }
 
+static hwaddr vfio_container_granularity(VFIOContainer *container)
+{
+    return (hwaddr)1 << ctz64(container->iova_pgsizes);
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
@@ -369,26 +374,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
          * would be the right place to wire that up (tell the KVM
          * device emulation the VFIO iommu handles to use).
          */
-        /*
-         * This assumes that the guest IOMMU is empty of
-         * mappings at this point.
-         *
-         * One way of doing this is:
-         * 1. Avoid sharing IOMMUs between emulated devices or different
-         * IOMMU groups.
-         * 2. Implement VFIO_IOMMU_ENABLE in the host kernel to fail if
-         * there are some mappings in IOMMU.
-         *
-         * VFIO on SPAPR does that. Other IOMMU models may do that different,
-         * they must make sure there are no existing mappings or
-         * loop through existing mappings to map them into VFIO.
-         */
         giommu = g_malloc0(sizeof(*giommu));
         giommu->iommu = section->mr;
         giommu->container = container;
         giommu->n.notify = vfio_iommu_map_notify;
         QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
-        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
+
+        memory_region_register_iommu_notifier_replay(giommu->iommu, &giommu->n,
+            vfio_container_granularity(container), false);
 
         return;
     }
-- 
2.4.3

* [Qemu-devel] [PATCH 7/7] vfio: Expose a VFIO PCI device's group for EEH
  2015-09-24  4:33 [Qemu-devel] [PATCH 0/7] VFIO extensions to allow VFIO devices on spapr-pci-host-bridge David Gibson
                   ` (5 preceding siblings ...)
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 6/7] vfio: Allow hotplug of containers onto existing guest IOMMU mappings David Gibson
@ 2015-09-24  4:33 ` David Gibson
  6 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2015-09-24  4:33 UTC (permalink / raw)
  To: alex.williamson, pbonzini
  Cc: lvivier, thuth, abologna, qemu-devel, qemu-ppc, David Gibson

The Enhanced Error Handling (EEH) interface in PAPR operates on units of a
Partitionable Endpoint (PE).  For VFIO devices, the PE boundaries the guest
sees must match the PE (i.e. IOMMU group) boundaries on the host.  To
implement this, the EEH code will need to discover from VFIO which group a
given device belongs to.

This exposes a new vfio_pci_device_group() function for this purpose.
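
The header relies on the classic opaque-handle pattern: consumers hold a
forward-declared VFIOGroup pointer without seeing its internals, which
keeps hw/vfio's implementation private while still letting the sPAPR EEH
code map a device to its PE.  A self-contained sketch of the same
pattern, with stand-in names:

    #include <stdio.h>

    /* "header": only a forward declaration is exposed */
    typedef struct Group Group;
    Group *device_group(int is_vfio);

    /* "implementation": the full type stays private here */
    struct Group { int id; };
    static Group the_group = { 42 };

    Group *device_group(int is_vfio)
    {
        return is_vfio ? &the_group : NULL;   /* NULL: not a vfio device */
    }

    int main(void)
    {
        Group *g = device_group(1);
        printf("group %s\n", g ? "found" : "missing");
        return 0;
    }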

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/vfio/pci.c              | 14 ++++++++++++++
 include/hw/vfio/vfio-pci.h | 11 +++++++++++
 2 files changed, 25 insertions(+)
 create mode 100644 include/hw/vfio/vfio-pci.h

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index dcabb6d..49ae834 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -35,6 +35,8 @@
 #include "pci.h"
 #include "trace.h"
 
+#include "hw/vfio/vfio-pci.h"
+
 #define MSIX_CAP_LENGTH 12
 
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
@@ -2312,6 +2314,18 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     vdev->req_enabled = false;
 }
 
+VFIOGroup *vfio_pci_device_group(PCIDevice *pdev)
+{
+    VFIOPCIDevice *vdev;
+
+    if (!object_dynamic_cast(OBJECT(pdev), "vfio-pci")) {
+        return NULL;
+    }
+
+    vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    return vdev->vbasedev.group;
+}
+
 static int vfio_initfn(PCIDevice *pdev)
 {
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
diff --git a/include/hw/vfio/vfio-pci.h b/include/hw/vfio/vfio-pci.h
new file mode 100644
index 0000000..32105f7
--- /dev/null
+++ b/include/hw/vfio/vfio-pci.h
@@ -0,0 +1,11 @@
+#ifndef VFIO_PCI_H
+#define VFIO_PCI_H
+
+#include "qemu/typedefs.h"
+
+/* We expose the concept of a VFIOGroup, though not its internals */
+typedef struct VFIOGroup VFIOGroup;
+
+extern VFIOGroup *vfio_pci_device_group(PCIDevice *pdev);
+
+#endif /* VFIO_PCI_H */
-- 
2.4.3

* Re: [Qemu-devel] [PATCH 1/7] vfio: Remove unneeded union from VFIOContainer
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 1/7] vfio: Remove unneeded union from VFIOContainer David Gibson
@ 2015-09-24 16:01   ` Alex Williamson
  2015-09-25  5:14     ` David Gibson
  2015-09-24 16:10   ` Thomas Huth
  1 sibling, 1 reply; 28+ messages in thread
From: Alex Williamson @ 2015-09-24 16:01 UTC (permalink / raw)
  To: David Gibson; +Cc: lvivier, thuth, abologna, qemu-devel, qemu-ppc, pbonzini

On Thu, 2015-09-24 at 14:33 +1000, David Gibson wrote:
> Currently the VFIOContainer iommu_data field contains a union with
> different information for different host iommu types.  However:
>    * It only actually contains information for the x86-like "Type1" iommu
>    * Because we have a common listener the Type1 fields are actually used
> on all IOMMU types, including the SPAPR TCE type as well
>    * There's no tag in the VFIOContainer to tell you which union member is
> valid anyway.

FWIW, this last point isn't valid.  The IOMMU setup determines which
union member is active and the listener and release functions are
specific to the union member.  There's no need whatsoever for a tag to
keep track of the union member in use.  The only problem is that the
union solved a problem that never really came to exist, so we can now
remove it and simplify things.

I'll remove this last bullet point unless there's some objection.
Thanks,

Alex


> In fact we now have a general structure for the listener which is unlikely
> to ever need per-iommu-type information, so this patch removes the union.
> 
> In a similar way we can unify the setup of the vfio memory listener in
> vfio_connect_container() that is currently split across a switch on iommu
> type, but is effectively the same in both cases.
> 
> The iommu_data.release pointer was only needed as a cleanup function
> which would handle potentially different data in the union.  With the
> union gone, it too can be removed.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/vfio/common.c              | 52 ++++++++++++++++---------------------------
>  include/hw/vfio/vfio-common.h | 16 +++----------
>  2 files changed, 22 insertions(+), 46 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 0d341a3..1545f62 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -315,8 +315,7 @@ out:
>  static void vfio_listener_region_add(MemoryListener *listener,
>                                       MemoryRegionSection *section)
>  {
> -    VFIOContainer *container = container_of(listener, VFIOContainer,
> -                                            iommu_data.type1.listener);
> +    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>      hwaddr iova, end;
>      Int128 llend;
>      void *vaddr;
> @@ -406,9 +405,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
>           * can gracefully fail.  Runtime, there's not much we can do other
>           * than throw a hardware error.
>           */
> -        if (!container->iommu_data.type1.initialized) {
> -            if (!container->iommu_data.type1.error) {
> -                container->iommu_data.type1.error = ret;
> +        if (!container->initialized) {
> +            if (!container->error) {
> +                container->error = ret;
>              }
>          } else {
>              hw_error("vfio: DMA mapping failed, unable to continue");
> @@ -419,8 +418,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>  static void vfio_listener_region_del(MemoryListener *listener,
>                                       MemoryRegionSection *section)
>  {
> -    VFIOContainer *container = container_of(listener, VFIOContainer,
> -                                            iommu_data.type1.listener);
> +    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>      hwaddr iova, end;
>      int ret;
>  
> @@ -485,7 +483,7 @@ static const MemoryListener vfio_memory_listener = {
>  
>  static void vfio_listener_release(VFIOContainer *container)
>  {
> -    memory_listener_unregister(&container->iommu_data.type1.listener);
> +    memory_listener_unregister(&container->listener);
>  }
>  
>  int vfio_mmap_region(Object *obj, VFIORegion *region,
> @@ -683,21 +681,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>              ret = -errno;
>              goto free_container_exit;
>          }
> -
> -        container->iommu_data.type1.listener = vfio_memory_listener;
> -        container->iommu_data.release = vfio_listener_release;
> -
> -        memory_listener_register(&container->iommu_data.type1.listener,
> -                                 container->space->as);
> -
> -        if (container->iommu_data.type1.error) {
> -            ret = container->iommu_data.type1.error;
> -            error_report("vfio: memory listener initialization failed for container");
> -            goto listener_release_exit;
> -        }
> -
> -        container->iommu_data.type1.initialized = true;
> -
>      } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
>          ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
>          if (ret) {
> @@ -723,19 +706,24 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>              ret = -errno;
>              goto free_container_exit;
>          }
> -
> -        container->iommu_data.type1.listener = vfio_memory_listener;
> -        container->iommu_data.release = vfio_listener_release;
> -
> -        memory_listener_register(&container->iommu_data.type1.listener,
> -                                 container->space->as);
> -
>      } else {
>          error_report("vfio: No available IOMMU models");
>          ret = -EINVAL;
>          goto free_container_exit;
>      }
>  
> +    container->listener = vfio_memory_listener;
> +
> +    memory_listener_register(&container->listener, container->space->as);
> +
> +    if (container->error) {
> +        ret = container->error;
> +        error_report("vfio: memory listener initialization failed for container");
> +        goto listener_release_exit;
> +    }
> +
> +    container->initialized = true;
> +
>      QLIST_INIT(&container->group_list);
>      QLIST_INSERT_HEAD(&space->containers, container, next);
>  
> @@ -774,9 +762,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>          VFIOAddressSpace *space = container->space;
>          VFIOGuestIOMMU *giommu, *tmp;
>  
> -        if (container->iommu_data.release) {
> -            container->iommu_data.release(container);
> -        }
> +        vfio_listener_release(container);
>          QLIST_REMOVE(container, next);
>  
>          QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 9b9901f..fbbe6de 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -59,22 +59,12 @@ typedef struct VFIOAddressSpace {
>  
>  struct VFIOGroup;
>  
> -typedef struct VFIOType1 {
> -    MemoryListener listener;
> -    int error;
> -    bool initialized;
> -} VFIOType1;
> -
>  typedef struct VFIOContainer {
>      VFIOAddressSpace *space;
>      int fd; /* /dev/vfio/vfio, empowered by the attached groups */
> -    struct {
> -        /* enable abstraction to support various iommu backends */
> -        union {
> -            VFIOType1 type1;
> -        };
> -        void (*release)(struct VFIOContainer *);
> -    } iommu_data;
> +    MemoryListener listener;
> +    int error;
> +    bool initialized;
>      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>      QLIST_HEAD(, VFIOGroup) group_list;
>      QLIST_ENTRY(VFIOContainer) next;

* Re: [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications David Gibson
@ 2015-09-24 16:08   ` Laurent Vivier
  2015-09-25  5:39     ` David Gibson
  2015-09-24 17:32   ` Alex Williamson
  2015-09-25 11:20   ` Paolo Bonzini
  2 siblings, 1 reply; 28+ messages in thread
From: Laurent Vivier @ 2015-09-24 16:08 UTC (permalink / raw)
  To: David Gibson, alex.williamson, pbonzini
  Cc: thuth, qemu-ppc, abologna, qemu-devel



On 24/09/2015 06:33, David Gibson wrote:
> When we have guest visible IOMMUs, we allow notifiers to be registered
> which will be informed of all changes to IOMMU mappings.  This is used by
> vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.
> 
> However, unlike with a memory region listener, an iommu notifier won't be
> told about any mappings which already exist in the (guest) IOMMU at the
> time it is registered.  This can cause problems if hotplugging a VFIO
> device onto a guest bus which had existing guest IOMMU mappings, but didn't
> previously have any VFIO devices (and hence had no host IOMMU mappings).
> 
> This adds a memory_region_register_iommu_notifier_replay() function to
> handle this case.  As well as registering the new notifier, it replays
> existing mappings.  Because the IOMMU memory region doesn't internally
> remember the granularity of the guest IOMMU, there is a small hack: the
> caller must specify a granularity at which to replay mappings.
> 
> If there are finer mappings in the guest IOMMU, these will be reported in
> the iotlb structures passed to the notifier, which must handle them
> (probably by flagging an error).  This isn't new - the VFIO iommu notifier
> must already handle notifications about guest IOMMU mappings too small
> for it to represent in the host IOMMU.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  include/exec/memory.h | 17 +++++++++++++++++
>  memory.c              | 18 ++++++++++++++++++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 5baaf48..304f985 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -583,6 +583,23 @@ void memory_region_notify_iommu(MemoryRegion *mr,
>  void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
>  
>  /**
> + * memory_region_register_iommu_notifier_replay: register a notifier
> + * for changes to IOMMU translation entries, and replay existing IOMMU
> + * translations to the new notifier.
> + *
> + * @mr: the memory region to observe
> + * @n: the notifier to be added; the notifier receives a pointer to an
> + *     #IOMMUTLBEntry as the opaque value; the pointer ceases to be
> + *     valid on exit from the notifier.
> + * @granularity: Minimum page granularity to replay notifications for
> + * @is_write: Whether to treat the replay as a translate "write"
> + *     through the iommu
> + */
> +void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
> +                                                  hwaddr granularity,
> +                                                  bool is_write);
> +
> +/**
>   * memory_region_unregister_iommu_notifier: unregister a notifier for
>   * changes to IOMMU translation entries.
>   *
> diff --git a/memory.c b/memory.c
> index ef87363..b4b6861 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1403,6 +1403,24 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
>      notifier_list_add(&mr->iommu_notify, n);
>  }
>  
> +void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
> +                                                  hwaddr granularity,
> +                                                  bool is_write)
> +{
> +    hwaddr addr;
> +    IOMMUTLBEntry iotlb;
> +
> +    memory_region_register_iommu_notifier(mr, n);
> +
> +    for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> +
> +        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +        if (iotlb.perm != IOMMU_NONE) {
> +            n->notify(n, &iotlb);
> +        }
> +    }
> +}

If mr->size > (UINT64_MAX + 1 - granularity), you run into an infinite
loop, because hwaddr is a 64-bit value and the stop condition is beyond
its max value.  You can avoid this by iterating over the page index,
using the power of 2 of the granularity instead of the granularity itself:

int shift = ctz64(granularity);
hwaddr size = memory_region_size(mr) >> shift;
for (addr = 0; addr < size; addr++)
{
    iotlb = mr->iommu_ops->translate(mr, addr << shift, is_write);
...

so in patch 6, you should pass the power of 2 instead of the value of
the granularity.

Of course, it works if granularity is at least 2....
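
Spelling the suggestion out as a self-contained sketch (ctz64() is
modelled here with the GCC/Clang builtin, and a deliberately small
region size is used so the demo actually finishes):

    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t hwaddr;

    int main(void)
    {
        hwaddr size = 0x4000;                /* small for the demo */
        hwaddr granularity = 0x1000;
        int shift = __builtin_ctzll(granularity);

        /* Unlike "addr += granularity", the page index cannot wrap
         * even when size is close to UINT64_MAX. */
        hwaddr npages = size >> shift;
        hwaddr idx;

        for (idx = 0; idx < npages; idx++) {
            hwaddr addr = idx << shift;
            /* translate(addr) and notify would go here */
            (void)addr;
        }
        printf("walked %llu pages\n", (unsigned long long)npages);
        return 0;
    }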

>  void memory_region_unregister_iommu_notifier(Notifier *n)
>  {
>      notifier_remove(n);
> 

* Re: [Qemu-devel] [PATCH 1/7] vfio: Remove unneeded union from VFIOContainer
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 1/7] vfio: Remove unneeded union from VFIOContainer David Gibson
  2015-09-24 16:01   ` Alex Williamson
@ 2015-09-24 16:10   ` Thomas Huth
  1 sibling, 0 replies; 28+ messages in thread
From: Thomas Huth @ 2015-09-24 16:10 UTC (permalink / raw)
  To: David Gibson, alex.williamson, pbonzini
  Cc: lvivier, qemu-ppc, abologna, qemu-devel

On 24/09/15 06:33, David Gibson wrote:
> Currently the VFIOContainer iommu_data field contains a union with
> different information for different host iommu types.  However:
>    * It only actually contains information for the x86-like "Type1" iommu
>    * Because we have a common listener the Type1 fields are actually used
> on all IOMMU types, including the SPAPR TCE type as well
>    * There's no tag in the VFIOContainer to tell you which union member is
> valid anyway.
> 
> In fact we now have a general structure for the listener which is unlikely
> to ever need per-iommu-type information, so this patch removes the union.
> 
> In a similar way we can unify the setup of the vfio memory listener in
> vfio_connect_container() that is currently split across a switch on iommu
> type, but is effectively the same in both cases.
> 
> The iommu_data.release pointer was only needed as a cleanup function
> which would handle potentially different data in the union.  With the
> union gone, it too can be removed.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
[...]
>          QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 9b9901f..fbbe6de 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -59,22 +59,12 @@ typedef struct VFIOAddressSpace {
>  
>  struct VFIOGroup;
>  
> -typedef struct VFIOType1 {
> -    MemoryListener listener;
> -    int error;
> -    bool initialized;
> -} VFIOType1;
> -
>  typedef struct VFIOContainer {
>      VFIOAddressSpace *space;
>      int fd; /* /dev/vfio/vfio, empowered by the attached groups */
> -    struct {
> -        /* enable abstraction to support various iommu backends */
> -        union {
> -            VFIOType1 type1;
> -        };
> -        void (*release)(struct VFIOContainer *);
> -    } iommu_data;
> +    MemoryListener listener;
> +    int error;
> +    bool initialized;

Hmmm, maybe it's just bikeshed painting, but would it make sense to
rename the field with an "iommu_" prefix now, e.g. "iommu_error" instead
of "error", so that it is more clear that "error" refers to the IOMMU
stuff? (sorry for coming up with this now, after suggesting to remove the
"iommu_data" container which made this clear ... but sometimes you have
to see the code first...)

 Thomas


>      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>      QLIST_HEAD(, VFIOGroup) group_list;
>      QLIST_ENTRY(VFIOContainer) next;
> 

* Re: [Qemu-devel] [PATCH 3/7] vfio: Check guest IOVA ranges against host IOMMU capabilities
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 3/7] vfio: Check guest IOVA ranges against host IOMMU capabilities David Gibson
@ 2015-09-24 17:32   ` Alex Williamson
  2015-09-25  5:20     ` David Gibson
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Williamson @ 2015-09-24 17:32 UTC (permalink / raw)
  To: David Gibson; +Cc: lvivier, thuth, abologna, qemu-devel, qemu-ppc, pbonzini

On Thu, 2015-09-24 at 14:33 +1000, David Gibson wrote:
> The current vfio core code assumes that the host IOMMU is capable of
> mapping any IOVA the guest wants to use to wherever we need it mapped.
> However, real IOMMUs generally only support translating a certain range
> of IOVAs (the "DMA window"), not a full 64-bit address space.
> 
> The common x86 IOMMUs support a wide enough range that guests are very
> unlikely to go beyond it in practice; however, the IOMMU used on IBM Power
> machines - in the default configuration - supports only a much more limited
> IOVA range, usually 0..2GiB.
> 
> If the guest attempts to set up an IOVA range that the host IOMMU can't
> map, qemu won't report an error until it actually attempts to map a bad
> IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e. no guest
> visible IOMMU) then this will show up very quickly.  If there is a guest
> visible IOMMU, however, the problem might not show up until much later when
> the guest actually attempts to DMA with an IOVA the host can't handle.
> 
> This patch adds a test so that we will detect earlier if the guest is
> attempting to use IOVA ranges that the host IOMMU won't be able to deal
> with.
> 
> For now, we assume that "Type1" (x86) IOMMUs can support any IOVA.  This is
> incorrect, but no worse than what we have already.  We can't do better for
> now because the Type1 kernel interface doesn't tell us what IOVA range the
> IOMMU actually supports.
> 
> For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
> IOVA range and validate guest IOVA ranges against it, and this patch does
> so.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
> ---
>  hw/vfio/common.c              | 40 +++++++++++++++++++++++++++++++++++++---
>  include/hw/vfio/vfio-common.h |  6 ++++++
>  2 files changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 95a4850..f90cc75 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -343,14 +343,22 @@ static void vfio_listener_region_add(MemoryListener *listener,
>      if (int128_ge(int128_make64(iova), llend)) {
>          return;
>      }
> +    end = int128_get64(llend);
> +
> +    if ((iova < container->min_iova) || ((end - 1) > container->max_iova)) {
> +        error_report("vfio: IOMMU container %p can't map guest IOVA region"
> +                     " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
> +                     container, iova, end - 1);
> +        ret = -EFAULT; /* FIXME: better choice here? */

"Bad address" makes sense to me.  This looks like an RFC comment, can we
remove it?

> +        goto fail;
> +    }
>  
>      memory_region_ref(section->mr);
>  
>      if (memory_region_is_iommu(section->mr)) {
>          VFIOGuestIOMMU *giommu;
>  
> -        trace_vfio_listener_region_add_iommu(iova,
> -                    int128_get64(int128_sub(llend, int128_one())));
> +        trace_vfio_listener_region_add_iommu(iova, end - 1);
>          /*
>           * FIXME: We should do some checking to see if the
>           * capabilities of the host VFIO IOMMU are adequate to model
> @@ -387,7 +395,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>  
>      /* Here we assume that memory_region_is_ram(section->mr)==true */
>  
> -    end = int128_get64(llend);
>      vaddr = memory_region_get_ram_ptr(section->mr) +
>              section->offset_within_region +
>              (iova - section->offset_within_address_space);
> @@ -685,7 +692,19 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>              ret = -errno;
>              goto free_container_exit;
>          }
> +
> +        /*
> +         * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
> +         * IOVA whatsoever.  That's not actually true, but the current
> +         * kernel interface doesn't tell us what it can map, and the
> +         * existing Type1 IOMMUs generally support any IOVA we're
> +         * going to actually try in practice.
> +         */
> +        container->min_iova = 0;
> +        container->max_iova = (hwaddr)-1;
>      } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
> +        struct vfio_iommu_spapr_tce_info info;
> +
>          ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
>          if (ret) {
>              error_report("vfio: failed to set group container: %m");
> @@ -710,6 +729,21 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>              ret = -errno;
>              goto free_container_exit;
>          }
> +
> +        /*
> +         * FIXME: This only considers the host IOMMU' 32-bit window.

IOMMU's?

> +         * At some point we need to add support for the optional
> +         * 64-bit window and dynamic windows
> +         */
> +        info.argsz = sizeof(info);
> +        ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> +        if (ret) {
> +            error_report("vfio: VFIO_IOMMU_SPAPR_TCE_GET_INFO failed: %m");
> +            ret = -errno;
> +            goto free_container_exit;
> +        }
> +        container->min_iova = info.dma32_window_start;
> +        container->max_iova = container->min_iova + info.dma32_window_size - 1;
>      } else {
>          error_report("vfio: No available IOMMU models");
>          ret = -EINVAL;
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index fbbe6de..859dbec 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -65,6 +65,12 @@ typedef struct VFIOContainer {
>      MemoryListener listener;
>      int error;
>      bool initialized;
> +    /*
> +     * FIXME: This assumes the host IOMMU can support only a
> +     * single contiguous IOVA window.  We may need to generalize
> +     * that in future
> +     */

There sure are a lot of FIXMEs here.  This just seems to be an
implementation note.  I certainly encourage comments, but they don't all
need to start with FIXME unless it's something we really should fix.
"... may need to generalize..." does not sound like such a case.

> +    hwaddr min_iova, max_iova;
>      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>      QLIST_HEAD(, VFIOGroup) group_list;
>      QLIST_ENTRY(VFIOContainer) next;

* Re: [Qemu-devel] [PATCH 4/7] vfio: Record host IOMMU's available IO page sizes
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 4/7] vfio: Record host IOMMU's available IO page sizes David Gibson
@ 2015-09-24 17:32   ` Alex Williamson
  2015-09-25  5:21     ` David Gibson
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Williamson @ 2015-09-24 17:32 UTC (permalink / raw)
  To: David Gibson; +Cc: lvivier, thuth, abologna, qemu-devel, qemu-ppc, pbonzini

On Thu, 2015-09-24 at 14:33 +1000, David Gibson wrote:
> Depending on the host IOMMU type we determine and record the available page
> sizes for IOMMU translation.  We'll need this for other validation in
> future patches.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Reviewed-by: Thomas Huth <thuth@redhat.com>
> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
> ---
>  hw/vfio/common.c              | 13 +++++++++++++
>  include/hw/vfio/vfio-common.h |  1 +
>  2 files changed, 14 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index f90cc75..db8dff3 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -677,6 +677,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>      if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
>          ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
>          bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU);
> +        struct vfio_iommu_type1_info info;
>  
>          ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
>          if (ret) {
> @@ -702,6 +703,15 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>           */
>          container->min_iova = 0;
>          container->max_iova = (hwaddr)-1;
> +
> +        /* Assume just 4K IOVA page size */
> +        container->iova_pgsizes = 0x1000;
> +        info.argsz = sizeof(info);
> +        ret = ioctl(fd, VFIO_IOMMU_GET_INFO, &info);
> +        /* Ignore errors */
> +        if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
> +            container->iova_pgsizes = info.iova_pgsizes;
> +        }
>      } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
>          struct vfio_iommu_spapr_tce_info info;
>  
> @@ -744,6 +754,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>          }
>          container->min_iova = info.dma32_window_start;
>          container->max_iova = container->min_iova + info.dma32_window_size - 1;
> +
> +        /* Assume just 4K IOVA pages for now */


Ironically, no FIXME ;)

> +        container->iova_pgsizes = 0x1000;
>      } else {
>          error_report("vfio: No available IOMMU models");
>          ret = -EINVAL;
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 859dbec..b8a76e1 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -71,6 +71,7 @@ typedef struct VFIOContainer {
>       * that in future
>       */
>      hwaddr min_iova, max_iova;
> +    uint64_t iova_pgsizes;
>      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>      QLIST_HEAD(, VFIOGroup) group_list;
>      QLIST_ENTRY(VFIOContainer) next;
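
Since the validation itself is deferred to later patches, here is one
hedged sketch of how the recorded bitmap might eventually be consumed;
vfio_pgsizes_compatible is a hypothetical helper name, not something
this series adds:

    /* Hypothetical: can a mapping at this IOVA and size be built from
     * one of the host IOMMU page sizes recorded in the bitmap? */
    static bool vfio_pgsizes_compatible(uint64_t iova_pgsizes,
                                        hwaddr iova, hwaddr size)
    {
        uint64_t pgsize;

        /* prefer the largest supported page size */
        for (pgsize = 1ULL << 63; pgsize; pgsize >>= 1) {
            if (!(iova_pgsizes & pgsize)) {
                continue;
            }
            if (!(iova & (pgsize - 1)) && !(size & (pgsize - 1))) {
                return true;
            }
        }
        return false;
    }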

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications David Gibson
  2015-09-24 16:08   ` Laurent Vivier
@ 2015-09-24 17:32   ` Alex Williamson
  2015-09-25  5:24     ` David Gibson
  2015-09-25 11:20   ` Paolo Bonzini
  2 siblings, 1 reply; 28+ messages in thread
From: Alex Williamson @ 2015-09-24 17:32 UTC (permalink / raw)
  To: David Gibson; +Cc: lvivier, thuth, abologna, qemu-devel, qemu-ppc, pbonzini

On Thu, 2015-09-24 at 14:33 +1000, David Gibson wrote:
> When we have guest visible IOMMUs, we allow notifiers to be registered
> which will be informed of all changes to IOMMU mappings.  This is used by
> vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.
> 
> However, unlike with a memory region listener, an iommu notifier won't be
> told about any mappings which already exist in the (guest) IOMMU at the
> time it is registered.  This can cause problems if hotplugging a VFIO
> device onto a guest bus which had existing guest IOMMU mappings, but didn't

> previously have any VFIO devices (and hence no host IOMMU mappings).
> 
> This adds a memory_region_register_iommu_notifier_replay() function to
> handle this case.  As well as registering the new notifier it replays
> existing mappings.  Because the IOMMU memory region doesn't internally
> remember the granularity of the guest IOMMU it has a small hack where the
> caller must specify a granularity at which to replay mappings.
> 
> If there are finer mappings in the guest IOMMU these will be reported in
> the iotlb structures passed to the notifier which it must handle (probably
> causing it to flag an error).  This isn't new - the VFIO iommu notifier
> must already handle notifications about guest IOMMU mappings too short
> for it to represent in the host IOMMU.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  include/exec/memory.h | 17 +++++++++++++++++
>  memory.c              | 18 ++++++++++++++++++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 5baaf48..304f985 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -583,6 +583,23 @@ void memory_region_notify_iommu(MemoryRegion *mr,
>  void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
>  
>  /**
> + * memory_region_register_iommu_notifier_replay: register a notifier
> + * for changes to IOMMU translation entries, and replay existing IOMMU
> + * translations to the new notifier.
> + *
> + * @mr: the memory region to observe
> + * @n: the notifier to be added; the notifier receives a pointer to an
> + *     #IOMMUTLBEntry as the opaque value; the pointer ceases to be
> + *     valid on exit from the notifier.
> + * @granularity: Minimum page granularity to replay notifications for
> + * @is_write: Whether to treat the replay as a translate "write"
> + *     through the iommu
> + */
> +void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
> +                                                  hwaddr granularity,
> +                                                  bool is_write);
> +
> +/**
>   * memory_region_unregister_iommu_notifier: unregister a notifier for
>   * changes to IOMMU translation entries.
>   *
> diff --git a/memory.c b/memory.c
> index ef87363..b4b6861 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1403,6 +1403,24 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
>      notifier_list_add(&mr->iommu_notify, n);
>  }
>  
> +void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
> +                                                  hwaddr granularity,
> +                                                  bool is_write)
> +{
> +    hwaddr addr;
> +    IOMMUTLBEntry iotlb;
> +
> +    memory_region_register_iommu_notifier(mr, n);
> +
> +    for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> +
> +        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +        if (iotlb.perm != IOMMU_NONE) {
> +            n->notify(n, &iotlb);
> +        }
> +    }
> +}
> +


When memory_listener_register() replays mappings, it does so on an rcu
copy of the flatview for each AddressSpace.  Here we don't seem to have
anything protecting against concurrency... do we need to worry about
that?

>  void memory_region_unregister_iommu_notifier(Notifier *n)
>  {
>      notifier_remove(n);
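
To make the intended use concrete, a hotplugged vfio container might
consume the new interface roughly as below.  This is a sketch only:
patch 6 has the real call, and deriving the granularity from the
container's iova_pgsizes (patch 4) is an assumption made here:

    /* When a guest-IOMMU-backed section is added to a container,
     * register the notifier and replay what the guest has already
     * mapped.  ctz64() (qemu/host-utils.h) picks out the smallest
     * host IOMMU page size as the replay granularity. */
    hwaddr granularity = (hwaddr)1 << ctz64(container->iova_pgsizes);

    giommu->n.notify = vfio_iommu_map_notify;
    memory_region_register_iommu_notifier_replay(section->mr, &giommu->n,
                                                 granularity, false);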

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] vfio: Remove unneeded union from VFIOContainer
  2015-09-24 16:01   ` Alex Williamson
@ 2015-09-25  5:14     ` David Gibson
  0 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2015-09-25  5:14 UTC (permalink / raw)
  To: Alex Williamson; +Cc: lvivier, thuth, abologna, qemu-devel, qemu-ppc, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1360 bytes --]

On Thu, Sep 24, 2015 at 10:01:55AM -0600, Alex Williamson wrote:
> On Thu, 2015-09-24 at 14:33 +1000, David Gibson wrote:
> > Currently the VFIOContainer iommu_data field contains a union with
> > different information for different host iommu types.  However:
> >    * It only actually contains information for the x86-like "Type1" iommu
> >    * Because we have a common listener the Type1 fields are actually used
> > on all IOMMU types, including the SPAPR TCE type as well
> >    * There's no tag in the VFIOContainer to tell you which union member is
> > valid anyway.
> 
> FWIW, this last point isn't valid.  The IOMMU setup determines which
> union member is active and the listener and release functions are
> specific to the union member.  There's no need whatsoever for a tag to
> keep track of the union member in use.  The only problem is that the
> union solved a problem that never really came to exist, so we can now
> remove it and simplify things.

I could argue some of the details there, but none of them are really
important.

> I'll remove this last bullet point unless there's some objection.
> Thanks,

That's fine.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 3/7] vfio: Check guest IOVA ranges against host IOMMU capabilities
  2015-09-24 17:32   ` Alex Williamson
@ 2015-09-25  5:20     ` David Gibson
  0 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2015-09-25  5:20 UTC (permalink / raw)
  To: Alex Williamson; +Cc: lvivier, thuth, abologna, qemu-devel, qemu-ppc, pbonzini

[-- Attachment #1: Type: text/plain, Size: 7097 bytes --]

On Thu, Sep 24, 2015 at 11:32:01AM -0600, Alex Williamson wrote:
> On Thu, 2015-09-24 at 14:33 +1000, David Gibson wrote:
> > The current vfio core code assumes that the host IOMMU is capable of
> > mapping any IOVA the guest wants to use to where we need.  However, real
> > IOMMUs generally only support translating a certain range of IOVAs (the
> > "DMA window") not a full 64-bit address space.
> > 
> > The common x86 IOMMUs support a wide enough range that guests are very
> > unlikely to go beyond it in practice, however the IOMMU used on IBM Power
> > machines - in the default configuration - supports only a much more limited
> > IOVA range, usually 0..2GiB.
> > 
> > If the guest attempts to set up an IOVA range that the host IOMMU can't
> > map, qemu won't report an error until it actually attempts to map a bad
> > IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e. no guest
> > visible IOMMU) then this will show up very quickly.  If there is a guest
> > visible IOMMU, however, the problem might not show up until much later when
> > the guest actually attempts to DMA with an IOVA the host can't handle.
> > 
> > This patch adds a test so that we will detect earlier if the guest is
> > attempting to use IOVA ranges that the host IOMMU won't be able to deal
> > with.
> > 
> > For now, we assume that "Type1" (x86) IOMMUs can support any IOVA; this is
> > incorrect, but no worse than what we have already.  We can't do better for
> > now because the Type1 kernel interface doesn't tell us what IOVA range the
> > IOMMU actually supports.
> > 
> > For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
> > IOVA range and validate guest IOVA ranges against it, and this patch does
> > so.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > Reviewed-by: Laurent Vivier <lvivier@redhat.com>
> > ---
> >  hw/vfio/common.c              | 40 +++++++++++++++++++++++++++++++++++++---
> >  include/hw/vfio/vfio-common.h |  6 ++++++
> >  2 files changed, 43 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index 95a4850..f90cc75 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -343,14 +343,22 @@ static void vfio_listener_region_add(MemoryListener *listener,
> >      if (int128_ge(int128_make64(iova), llend)) {
> >          return;
> >      }
> > +    end = int128_get64(llend);
> > +
> > +    if ((iova < container->min_iova) || ((end - 1) > container->max_iova)) {
> > +        error_report("vfio: IOMMU container %p can't map guest IOVA region"
> > +                     " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
> > +                     container, iova, end - 1);
> > +        ret = -EFAULT; /* FIXME: better choice here? */
> 
> "Bad address" makes sense to me.  This looks like an RFC comment, can we
> remove it?

Ok.

> 
> > +        goto fail;
> > +    }
> >  
> >      memory_region_ref(section->mr);
> >  
> >      if (memory_region_is_iommu(section->mr)) {
> >          VFIOGuestIOMMU *giommu;
> >  
> > -        trace_vfio_listener_region_add_iommu(iova,
> > -                    int128_get64(int128_sub(llend, int128_one())));
> > +        trace_vfio_listener_region_add_iommu(iova, end - 1);
> >          /*
> >           * FIXME: We should do some checking to see if the
> >           * capabilities of the host VFIO IOMMU are adequate to model
> > @@ -387,7 +395,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
> >  
> >      /* Here we assume that memory_region_is_ram(section->mr)==true */
> >  
> > -    end = int128_get64(llend);
> >      vaddr = memory_region_get_ram_ptr(section->mr) +
> >              section->offset_within_region +
> >              (iova - section->offset_within_address_space);
> > @@ -685,7 +692,19 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
> >              ret = -errno;
> >              goto free_container_exit;
> >          }
> > +
> > +        /*
> > +         * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
> > +         * IOVA whatsoever.  That's not actually true, but the current
> > +         * kernel interface doesn't tell us what it can map, and the
> > +         * existing Type1 IOMMUs generally support any IOVA we're
> > +         * going to actually try in practice.
> > +         */
> > +        container->min_iova = 0;
> > +        container->max_iova = (hwaddr)-1;
> >      } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
> > +        struct vfio_iommu_spapr_tce_info info;
> > +
> >          ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
> >          if (ret) {
> >              error_report("vfio: failed to set group container: %m");
> > @@ -710,6 +729,21 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
> >              ret = -errno;
> >              goto free_container_exit;
> >          }
> > +
> > +        /*
> > +         * FIXME: This only considers the host IOMMU' 32-bit window.
> 
> IOMMU's?

Yes.

> > +         * At some point we need to add support for the optional
> > +         * 64-bit window and dynamic windows
> > +         */
> > +        info.argsz = sizeof(info);
> > +        ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> > +        if (ret) {
> > +            error_report("vfio: VFIO_IOMMU_SPAPR_TCE_GET_INFO failed: %m");
> > +            ret = -errno;
> > +            goto free_container_exit;
> > +        }
> > +        container->min_iova = info.dma32_window_start;
> > +        container->max_iova = container->min_iova + info.dma32_window_size - 1;
> >      } else {
> >          error_report("vfio: No available IOMMU models");
> >          ret = -EINVAL;
> > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> > index fbbe6de..859dbec 100644
> > --- a/include/hw/vfio/vfio-common.h
> > +++ b/include/hw/vfio/vfio-common.h
> > @@ -65,6 +65,12 @@ typedef struct VFIOContainer {
> >      MemoryListener listener;
> >      int error;
> >      bool initialized;
> > +    /*
> > +     * FIXME: This assumes the host IOMMU can support only a
> > +     * single contiguous IOVA window.  We may need to generalize
> > +     * that in future
> > +     */
> 
> There sure are a lot of FIXMEs here.  This just seems to be an
> implementation note.  I certainly encourage comments, but they don't all
> need to start with FIXME unless it's something we really should fix.
> "... may need to generalize..." does not sound like such a case.

True enough.  I'll file off some "FIXME"s.

> 
> > +    hwaddr min_iova, max_iova;
> >      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
> >      QLIST_HEAD(, VFIOGroup) group_list;
> >      QLIST_ENTRY(VFIOContainer) next;
> 
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 4/7] vfio: Record host IOMMU's available IO page sizes
  2015-09-24 17:32   ` Alex Williamson
@ 2015-09-25  5:21     ` David Gibson
  0 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2015-09-25  5:21 UTC (permalink / raw)
  To: Alex Williamson; +Cc: lvivier, thuth, abologna, qemu-devel, qemu-ppc, pbonzini

[-- Attachment #1: Type: text/plain, Size: 2518 bytes --]

On Thu, Sep 24, 2015 at 11:32:14AM -0600, Alex Williamson wrote:
> On Thu, 2015-09-24 at 14:33 +1000, David Gibson wrote:
> > Depending on the host IOMMU type we determine and record the available page
> > sizes for IOMMU translation.  We'll need this for other validation in
> > future patches.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > Reviewed-by: Thomas Huth <thuth@redhat.com>
> > Reviewed-by: Laurent Vivier <lvivier@redhat.com>
> > ---
> >  hw/vfio/common.c              | 13 +++++++++++++
> >  include/hw/vfio/vfio-common.h |  1 +
> >  2 files changed, 14 insertions(+)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index f90cc75..db8dff3 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -677,6 +677,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
> >      if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
> >          ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
> >          bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU);
> > +        struct vfio_iommu_type1_info info;
> >  
> >          ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
> >          if (ret) {
> > @@ -702,6 +703,15 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
> >           */
> >          container->min_iova = 0;
> >          container->max_iova = (hwaddr)-1;
> > +
> > +        /* Assume just 4K IOVA page size */
> > +        container->iova_pgsizes = 0x1000;
> > +        info.argsz = sizeof(info);
> > +        ret = ioctl(fd, VFIO_IOMMU_GET_INFO, &info);
> > +        /* Ignore errors */
> > +        if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
> > +            container->iova_pgsizes = info.iova_pgsizes;
> > +        }
> >      } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
> >          struct vfio_iommu_spapr_tce_info info;
> >  
> > @@ -744,6 +754,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
> >          }
> >          container->min_iova = info.dma32_window_start;
> >          container->max_iova = container->min_iova + info.dma32_window_size - 1;
> > +
> > +        /* Assume just 4K IOVA pages for now */
> 
> 
> Ironically, no FIXME ;)

:p

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications
  2015-09-24 17:32   ` Alex Williamson
@ 2015-09-25  5:24     ` David Gibson
  2015-09-25 11:25       ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: David Gibson @ 2015-09-25  5:24 UTC (permalink / raw)
  To: Alex Williamson; +Cc: lvivier, thuth, abologna, qemu-devel, qemu-ppc, pbonzini

[-- Attachment #1: Type: text/plain, Size: 2941 bytes --]

On Thu, Sep 24, 2015 at 11:32:29AM -0600, Alex Williamson wrote:
> On Thu, 2015-09-24 at 14:33 +1000, David Gibson wrote:
> > When we have guest visible IOMMUs, we allow notifiers to be registered
> > which will be informed of all changes to IOMMU mappings.  This is used by
> > vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.
> > 
> > However, unlike with a memory region listener, an iommu notifier won't be
> > told about any mappings which already exist in the (guest) IOMMU at the
> > time it is registered.  This can cause problems if hotplugging a VFIO
> > device onto a guest bus which had existing guest IOMMU mappings, but didn't
> > previously have any VFIO devices (and hence no host IOMMU mappings).
> > 
> > This adds a memory_region_register_iommu_notifier_replay() function to
> > handle this case.  As well as registering the new notifier it replays
> > existing mappings.  Because the IOMMU memory region doesn't internally
> > remember the granularity of the guest IOMMU it has a small hack where the
> > caller must specify a granularity at which to replay mappings.
> > 
> > If there are finer mappings in the guest IOMMU these will be reported in
> > the iotlb structures passed to the notifier which it must handle (probably
> > causing it to flag an error).  This isn't new - the VFIO iommu notifier
> > must already handle notifications about guest IOMMU mappings too short
> > for it to represent in the host IOMMU.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

[snip]
> > +void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
> > +                                                  hwaddr granularity,
> > +                                                  bool is_write)
> > +{
> > +    hwaddr addr;
> > +    IOMMUTLBEntry iotlb;
> > +
> > +    memory_region_register_iommu_notifier(mr, n);
> > +
> > +    for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> > +
> > +        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> > +        if (iotlb.perm != IOMMU_NONE) {
> > +            n->notify(n, &iotlb);
> > +        }
> > +    }
> > +}
> > +
> 
> 
> When memory_listener_register() replays mappings, it does so on an rcu
> copy of the flatview for each AddressSpace.  Here we don't seem to have
> anything protecting against concurrency... do we need to worry about
> that?

I was assuming that the IOMMU mappings are protected by the BQL.  I
_think_ that's the case (for every IOMMU we have so far), but I'm not
really sure how to be sure.

> 
> >  void memory_region_unregister_iommu_notifier(Notifier *n)
> >  {
> >      notifier_remove(n);
> 
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications
  2015-09-24 16:08   ` Laurent Vivier
@ 2015-09-25  5:39     ` David Gibson
  0 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2015-09-25  5:39 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: thuth, qemu-devel, abologna, alex.williamson, qemu-ppc, pbonzini

[-- Attachment #1: Type: text/plain, Size: 3974 bytes --]

On Thu, Sep 24, 2015 at 06:08:59PM +0200, Laurent Vivier wrote:
> 
> 
> On 24/09/2015 06:33, David Gibson wrote:
> > When we have guest visible IOMMUs, we allow notifiers to be registered
> > which will be informed of all changes to IOMMU mappings.  This is used by
> > vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.
> > 
> > However, unlike with a memory region listener, an iommu notifier won't be
> > told about any mappings which already exist in the (guest) IOMMU at the
> > time it is registered.  This can cause problems if hotplugging a VFIO
> > device onto a guest bus which had existing guest IOMMU mappings, but didn't
> > previously have any VFIO devices (and hence no host IOMMU mappings).
> > 
> > This adds a memory_region_register_iommu_notifier_replay() function to
> > handle this case.  As well as registering the new notifier it replays
> > existing mappings.  Because the IOMMU memory region doesn't internally
> > remember the granularity of the guest IOMMU it has a small hack where the
> > caller must specify a granularity at which to replay mappings.
> > 
> > If there are finer mappings in the guest IOMMU these will be reported in
> > the iotlb structures passed to the notifier which it must handle (probably
> > causing it to flag an error).  This isn't new - the VFIO iommu notifier
> > must already handle notifications about guest IOMMU mappings too short
> > for it to represent in the host IOMMU.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  include/exec/memory.h | 17 +++++++++++++++++
> >  memory.c              | 18 ++++++++++++++++++
> >  2 files changed, 35 insertions(+)
> > 
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 5baaf48..304f985 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -583,6 +583,23 @@ void memory_region_notify_iommu(MemoryRegion *mr,
[snip]
> > +void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
> > +                                                  hwaddr granularity,
> > +                                                  bool is_write)
> > +{
> > +    hwaddr addr;
> > +    IOMMUTLBEntry iotlb;
> > +
> > +    memory_region_register_iommu_notifier(mr, n);
> > +
> > +    for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> > +
> > +        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> > +        if (iotlb.perm != IOMMU_NONE) {
> > +            n->notify(n, &iotlb);
> > +        }
> > +    }
> > +}
> 
> If mr->size > (UINT64_MAX + 1 - granularity), you run into an infinite
> loop because hwaddr is a 64bit value and the stop condition is beyond
> its max value. You can avoid this by using the power of 2 of the

Ugh, yes, and I think my old version with more int128s was still
wrong, too.

> granularity, instead of the granularity:
> 
> int shift = ctz64(granularity);
> hwaddr size = memory_region_size(mr) >> shift;
> for (addr = 0; addr < size; addr++)
> {
>     iotlb = mr->iommu_ops->translate(mr, addr << shift, is_write);
> ...
> 
> so in patch 6, you should pass the power of 2 instead of the value of
> the granularity.
> 
> Of course, it works if granularity is at least 2....

Hrm, rather clunky.

I've instead gone for putting this at the end of the loop body:

        /* if (2^64 - MR size) < granularity, it's possible to get an
         * infinite loop here.  This should catch such a wraparound */
        if ((addr + granularity) < addr) {
            break;
        }

Of course, unless granularity is huge, stepping through a whole 2^64
address space might be indistinguishable from an infinite loop in
practice..
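
Putting the pieces together, the loop with the wraparound guard folded
in would read roughly as follows (assembled from the snippets above,
not quoted from the respin):

    for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
        if (iotlb.perm != IOMMU_NONE) {
            n->notify(n, &iotlb);
        }
        /* if (2^64 - MR size) < granularity, addr + granularity can
         * wrap to 0 and the loop would otherwise never terminate */
        if ((addr + granularity) < addr) {
            break;
        }
    }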

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications
  2015-09-24  4:33 ` [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications David Gibson
  2015-09-24 16:08   ` Laurent Vivier
  2015-09-24 17:32   ` Alex Williamson
@ 2015-09-25 11:20   ` Paolo Bonzini
  2015-09-25 11:33     ` David Gibson
  2 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2015-09-25 11:20 UTC (permalink / raw)
  To: David Gibson, alex.williamson
  Cc: lvivier, thuth, qemu-ppc, abologna, qemu-devel



On 24/09/2015 06:33, David Gibson wrote:
> When we have guest visible IOMMUs, we allow notifiers to be registered
> which will be informed of all changes to IOMMU mappings.  This is used by
> vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.
> 
> However, unlike with a memory region listener, an iommu notifier won't be
> told about any mappings which already exist in the (guest) IOMMU at the
> time it is registered.  This can cause problems if hotplugging a VFIO
> device onto a guest bus which had existing guest IOMMU mappings, but didn't
> previously have any VFIO devices (and hence no host IOMMU mappings).
> 
> This adds a memory_region_register_iommu_notifier_replay() function to
> handle this case.  As well as registering the new notifier it replays
> existing mappings.  Because the IOMMU memory region doesn't internally
> remember the granularity of the guest IOMMU it has a small hack where the
> caller must specify a granularity at which to replay mappings.
> 
> If there are finer mappings in the guest IOMMU these will be reported in
> the iotlb structures passed to the notifier which it must handle (probably
> causing it to flag an error).  This isn't new - the VFIO iommu notifier
> must already handle notifications about guest IOMMU mappings too short
> for it to represent in the host IOMMU.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

The patch is okay, just two questions:

1) Is there a case where using the no-replay functions makes sense?

2) You could add a ->replay function to the iommu_ops to optimize it, if
you want.

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications
  2015-09-25  5:24     ` David Gibson
@ 2015-09-25 11:25       ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2015-09-25 11:25 UTC (permalink / raw)
  To: David Gibson, Alex Williamson
  Cc: lvivier, thuth, qemu-ppc, abologna, qemu-devel


On 25/09/2015 07:24, David Gibson wrote:
>> When memory_listener_register() replays mappings, it does so on 
>> an rcu copy of the flatview for each AddressSpace.  Here we
>> don't seem to have anything protecting against concurrency... do
>> we need to worry about that?
> 
> I was assuming that the IOMMU mappings are protected by the BQL. I
> _think_ that's the case (for every IOMMU we have so far), but I'm 
> not really sure how to be sure.

Yes, even in listener_add_address_space there's no real need to use
address_space_get_flatview because updates to both the memory maps and
the MemoryListener list are protected by the BQL.  It could just read
as->current_map directly.

listener_add_address_space plays it a bit safe because QEMU doesn't
(yet?) have stuff like rcu_dereference_check.  It's sad that we'll
have to reinvent so much debugging stuff from Linux...

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications
  2015-09-25 11:20   ` Paolo Bonzini
@ 2015-09-25 11:33     ` David Gibson
  2015-09-25 12:04       ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: David Gibson @ 2015-09-25 11:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: lvivier, thuth, qemu-devel, abologna, alex.williamson, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 2309 bytes --]

On Fri, Sep 25, 2015 at 01:20:12PM +0200, Paolo Bonzini wrote:
> 
> 
> On 24/09/2015 06:33, David Gibson wrote:
> > When we have guest visible IOMMUs, we allow notifiers to be registered
> > which will be informed of all changes to IOMMU mappings.  This is used by
> > vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.
> > 
> > However, unlike with a memory region listener, an iommu notifier won't be
> > told about any mappings which already exist in the (guest) IOMMU at the
> > time it is registered.  This can cause problems if hotplugging a VFIO
> > device onto a guest bus which had existing guest IOMMU mappings, but didn't
> > previously have any VFIO devices (and hence no host IOMMU mappings).
> > 
> > This adds a memory_region_register_iommu_notifier_replay() function to
> > handle this case.  As well as registering the new notifier it replays
> > existing mappings.  Because the IOMMU memory region doesn't internally
> > remember the granularity of the guest IOMMU it has a small hack where the
> > caller must specify a granularity at which to replay mappings.
> > 
> > If there are finer mappings in the guest IOMMU these will be reported in
> > the iotlb structures passed to the notifier which it must handle (probably
> > causing it to flag an error).  This isn't new - the VFIO iommu notifier
> > must already handle notifications about guest IOMMU mappings too short
> > for it to represent in the host IOMMU.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> 
> The patch is okay, just two questions:
> 
> 1) Is there a case where using the no-replay functions makes sense?

I'm not sure.  I think vfio is the only user so far, so I guess that's
technically a no.  I was reluctant to change the interface and
semantics just off the bat, though.

> 2) You could add a ->replay function to the iommu_ops to optimize it, if
> you want.

Yes.  Originally I was going to implement this with a ->replay function,
before I realised I didn't actually need it.  I figure we can optimize
when and if it proves necessary.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications
  2015-09-25 11:33     ` David Gibson
@ 2015-09-25 12:04       ` Paolo Bonzini
  2015-09-26  6:54         ` David Gibson
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2015-09-25 12:04 UTC (permalink / raw)
  To: David Gibson
  Cc: lvivier, thuth, qemu-devel, abologna, alex.williamson, qemu-ppc


On 25/09/2015 13:33, David Gibson wrote:
> 1) Is there a case where using the no-replay functions makes
> sense?
> 
> I'm not sure.  I think vfio is the only user so far, so I guess
> that's technically a no.  I was reluctant to change the interface
> and semantics just off the bat, though.

Considering memory_listener_register() does the replay, I think it's okay.

For solving the problem that Laurent mentioned, using int128 seems
like the easiest solution...

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications
  2015-09-25 12:04       ` Paolo Bonzini
@ 2015-09-26  6:54         ` David Gibson
  2015-09-28  8:59           ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: David Gibson @ 2015-09-26  6:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: lvivier, thuth, qemu-devel, abologna, alex.williamson, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 1171 bytes --]

On Fri, Sep 25, 2015 at 02:04:14PM +0200, Paolo Bonzini wrote:
> 
> On 25/09/2015 13:33, David Gibson wrote:
> > 1) Is there a case where using the no-replay functions makes
> > sense?
> > 
> > I'm not sure.  I think vfio is the only user so far, so I guess
> > that's technically a no.  I was reluctant to change the interface
> > and semantics just off the bat, though.
> 
> Considering memory_listener_register() does the replay, I think it's
> okay.

Uh.. just to be clear, are you saying I should change this so there's
only the replaying interface?

> For solving the problem that Laurent mentioned, using int128 seems
> like the easiest solution...

Maybe.  It means I have to do all the address calculation in the loop
with an int128, then truncate it to do the actual call.  That seems
harder to me than the overflow check I added, but I suppose it's
conceptually similar in some ways.
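
For comparison, the Int128 variant being discussed might look roughly
like this, assuming the int128 helpers in qemu/int128.h and the Int128
mr->size field; it is a sketch, not code from the series:

    Int128 addr128;

    for (addr128 = int128_zero();
         int128_lt(addr128, mr->size);
         addr128 = int128_add(addr128, int128_make64(granularity))) {
        /* truncate back to 64 bits only for the translate call;
         * the 128-bit comparison above can never wrap */
        iotlb = mr->iommu_ops->translate(mr, int128_get64(addr128),
                                         is_write);
        if (iotlb.perm != IOMMU_NONE) {
            n->notify(n, &iotlb);
        }
    }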

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications
  2015-09-26  6:54         ` David Gibson
@ 2015-09-28  8:59           ` Paolo Bonzini
  2015-09-29  3:30             ` David Gibson
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2015-09-28  8:59 UTC (permalink / raw)
  To: David Gibson
  Cc: lvivier, thuth, qemu-devel, abologna, alex.williamson, qemu-ppc



On 26/09/2015 08:54, David Gibson wrote:
> On Fri, Sep 25, 2015 at 02:04:14PM +0200, Paolo Bonzini wrote:
>> 
>> On 25/09/2015 13:33, David Gibson wrote:
>>> 1) Is there a case where using the no-replay functions makes 
>>> sense?
>>> 
>>> I'm not sure.  I think vfio is the only user so far, so I
>>> guess that's technically a no.  I was reluctant to change the
>>> interface and semantics just off the bat, though.
>> 
>> Considering memory_listener_register() does the replay, I think it's
>> okay.
> 
> Uh.. just to be clear, are you saying I should change this so
> there's only the replaying interface?

Maybe...  The only issue is the "granularity" argument, which is
not in memory_region_register_iommu_notifier.  That makes me wonder if
the replay and registration make sense as separate operations.

What about adding a new function memory_region_iommu_replay and
separate the two phases?

>> For solving the problem that Laurent mentioned, using int128
>> seems like the easiest solution...
> 
> Maybe.  It means I have to do all the address calculation in the
> loop with an int128, then truncate it to do the actual call.  That
> seems harder to me than the overflow check I added, but I suppose
> it's conceptually similar in some ways.

Your overflow check is also okay, I wrote this before seeing the
updated version.

Paolo
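
The split Paolo suggests above would give the API roughly this shape
(a sketch; the exact signature is up to the respin):

    /* registration keeps its current form ... */
    void memory_region_register_iommu_notifier(MemoryRegion *mr,
                                               Notifier *n);

    /* ... and replay, with its granularity argument, becomes a
     * separate entry point that callers invoke explicitly: */
    void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n,
                                    hwaddr granularity, bool is_write);

    /* a caller wanting both phases: */
    memory_region_register_iommu_notifier(mr, &giommu->n);
    memory_region_iommu_replay(mr, &giommu->n, granularity, is_write);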

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications
  2015-09-28  8:59           ` Paolo Bonzini
@ 2015-09-29  3:30             ` David Gibson
  2015-09-29  7:15               ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: David Gibson @ 2015-09-29  3:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: lvivier, thuth, qemu-devel, abologna, alex.williamson, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 1936 bytes --]

On Mon, Sep 28, 2015 at 10:59:03AM +0200, Paolo Bonzini wrote:
> 
> 
> On 26/09/2015 08:54, David Gibson wrote:
> > On Fri, Sep 25, 2015 at 02:04:14PM +0200, Paolo Bonzini wrote:
> >> 
> >> On 25/09/2015 13:33, David Gibson wrote:
> >>> 1) Is there a case where using the no-replay functions makes 
> >>> sense?
> >>> 
> >>> I'm not sure.  I think vfio is the only user so far, so I
> >>> guess that's technically a no.  I was reluctant to change the
> >>> interface and semantics just off the bat, though.
> >> 
> >> Considering memory_region_listener does the reply, I think it's 
> >> okay.
> > 
> > Uh.. just to be clear, are you saying I should change this so
> > there's only the replaying interface?
> 
> Maybe...  The only issue is the "granularity" argument, which is
> not in memory_region_register_iommu_notifier.  That makes me wonder if
> the replay and registration make sense as separate operations.
> 
> What about adding a new function memory_region_iommu_replay and
> separate the two phases?

Hm.. I'm not sure I see much advantage to separating the phases.  But
I don't particularly object to the idea either.  So, I think it's your
call.

> >> For solving the problem that Laurent mentioned, using int128
> >> seems like the easiest solution...
> > 
> > Maybe.  It means I have to do all the address calculation in the
> > loop with an int128, then truncate it to do the actual call.  That
> > seems harder to me than the overflow check I added, but I suppose
> > it's conceptually similar in some ways.
> 
> Your overflow check is also okay, I wrote this before seeing the
> updated version.
> 
> Paolo
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications
  2015-09-29  3:30             ` David Gibson
@ 2015-09-29  7:15               ` Paolo Bonzini
  2015-09-30  2:15                 ` David Gibson
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2015-09-29  7:15 UTC (permalink / raw)
  To: David Gibson
  Cc: lvivier, thuth, qemu-devel, abologna, alex.williamson, qemu-ppc



On 29/09/2015 05:30, David Gibson wrote:
>> Maybe...  The only issue is the "granularity" argument, which is 
>> not in memory_region_register_iommu_notifier.  That makes me
>> wonder if the replay and registration make sense as separate
>> operations.
>> 
>> What about adding a new function memory_region_iommu_replay and 
>> separate the two phases?
> 
> Hm.. I'm not sure I see much advantage to separating the phases.

It's just that it's (IMO) the clearest API.  It shows how the
granularity is only used for replay.

> But I don't particularly object to the idea either.  So, I think
> it's your call.

Yes, let's separate it.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications
  2015-09-29  7:15               ` Paolo Bonzini
@ 2015-09-30  2:15                 ` David Gibson
  0 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2015-09-30  2:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: lvivier, thuth, qemu-devel, abologna, alex.williamson, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 1030 bytes --]

On Tue, Sep 29, 2015 at 09:15:58AM +0200, Paolo Bonzini wrote:
> 
> 
> On 29/09/2015 05:30, David Gibson wrote:
> >> Maybe...  The only issue is the "granularity" argument, which is 
> >> not in memory_region_register_iommu_notifier.  That makes me
> >> wonder if the replay and registration make sense as separate
> >> operations.
> >> 
> >> What about adding a new function memory_region_iommu_replay and 
> >> separate the two phases?
> > 
> > Hm.. I'm not sure I see much advantage to separating the phases.
> 
> It's just that it's (IMO) the clearest API.  It shows how the
> granularity is only used for replay.
> 
> > But I don't particularly object to the idea either.  So, I think
> > it's your call.
> 
> Yes, let's separate it.

Ok, I've done that, and sent a hopefully final respin of the series.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

Thread overview: 28+ messages
2015-09-24  4:33 [Qemu-devel] [PATCH 0/7] VFIO extensions to allow VFIO devices on spapr-pci-host-bridge David Gibson
2015-09-24  4:33 ` [Qemu-devel] [PATCH 1/7] vfio: Remove unneeded union from VFIOContainer David Gibson
2015-09-24 16:01   ` Alex Williamson
2015-09-25  5:14     ` David Gibson
2015-09-24 16:10   ` Thomas Huth
2015-09-24  4:33 ` [Qemu-devel] [PATCH 2/7] vfio: Generalize vfio_listener_region_add failure path David Gibson
2015-09-24  4:33 ` [Qemu-devel] [PATCH 3/7] vfio: Check guest IOVA ranges against host IOMMU capabilities David Gibson
2015-09-24 17:32   ` Alex Williamson
2015-09-25  5:20     ` David Gibson
2015-09-24  4:33 ` [Qemu-devel] [PATCH 4/7] vfio: Record host IOMMU's available IO page sizes David Gibson
2015-09-24 17:32   ` Alex Williamson
2015-09-25  5:21     ` David Gibson
2015-09-24  4:33 ` [Qemu-devel] [PATCH 5/7] memory: Allow replay of IOMMU mapping notifications David Gibson
2015-09-24 16:08   ` Laurent Vivier
2015-09-25  5:39     ` David Gibson
2015-09-24 17:32   ` Alex Williamson
2015-09-25  5:24     ` David Gibson
2015-09-25 11:25       ` Paolo Bonzini
2015-09-25 11:20   ` Paolo Bonzini
2015-09-25 11:33     ` David Gibson
2015-09-25 12:04       ` Paolo Bonzini
2015-09-26  6:54         ` David Gibson
2015-09-28  8:59           ` Paolo Bonzini
2015-09-29  3:30             ` David Gibson
2015-09-29  7:15               ` Paolo Bonzini
2015-09-30  2:15                 ` David Gibson
2015-09-24  4:33 ` [Qemu-devel] [PATCH 6/7] vfio: Allow hotplug of containers onto existing guest IOMMU mappings David Gibson
2015-09-24  4:33 ` [Qemu-devel] [PATCH 7/7] vfio: Expose a VFIO PCI device's group for EEH David Gibson
