* [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge
@ 2015-09-17 13:09 David Gibson
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 01/10] vfio: Remove unneeded union from VFIOContainer David Gibson
                   ` (12 more replies)
  0 siblings, 13 replies; 54+ messages in thread
From: David Gibson @ 2015-09-17 13:09 UTC (permalink / raw)
  To: alex.williamson, aik, gwshan
  Cc: lvivier, thuth, qemu-devel, qemu-ppc, pbonzini, David Gibson

Currently the pseries machine type uses two types of PCI Host Bridge
(PHB) devices: "spapr-pci-host-bridge", the 'normal' variant intended
for emulated PCI devices, and "spapr-pci-vfio-host-bridge", intended
for VFIO devices.

When using VFIO with pseries, a separate spapr-pci-vfio-host-bridge
device is needed for every host IOMMU group from which you're using
VFIO devices.  This is quite awkward for the user and/or management
tools.  It's especially awkward since the current code makes
essentially no attempt to detect and warn the user if the wrong sorts
of devices are connected to the wrong PHB.

It turns out that the VFIO core code is actually general enough that
VFIO devices almost work on the normal spapr-pci-host-bridge device.
In fact with the right combination of circumstances they *can* work
right now.

spapr-pci-vfio-host-bridge does 3 additional things:

    1) It disables KVM acceleration of the guest IOMMU.  That
       acceleration breaks VFIO because it means guest IOMMU updates
       bypass the VFIO infrastructure which keeps the host IOMMU in
       sync.

    2) It automatically configures the guest PHB's DMA window to match
       the capabilities of the host IOMMU, and advertises that to the
       guest.

    3) It provides additional handling of EEH (Enhanced Error
       Handling) functions.

This patch series:
    * Allows VFIO devices to be used on the spapr-pci-host-bridge by
      auto-switching the KVM TCE acceleration

    * Adds verification that the host IOMMU can handle the DMA windows
      used by guest PHBs

    * Allows the DMA window on the guest PHB to be configured with
      device properties.  This can be used to make sure it matches a
      host window, but in practice the default setting will already
      work with the host IOMMU on all current systems.

    * Adds support to the VFIO core to allow a VFIO device to be
      hotplugged onto a bus which doesn't yet have VFIO devices.  This
      already worked for systems without a guest-visible IOMMU
      (i.e. x86); this series makes it work even with a guest-visible
      IOMMU.

    * Makes a few related cleanups along the way
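As a concrete illustration of the property-based DMA window
configuration described above, an invocation might look like the
following.  This is a sketch only: the property names dma_win_addr and
dma_win_size, the index value and the host device address are assumed
for illustration, not taken from this cover letter:

```shell
# Hypothetical example: property names dma_win_addr/dma_win_size are
# assumed, as are the index and host BDF values.
qemu-system-ppc64 -machine pseries \
    -device spapr-pci-host-bridge,index=1,dma_win_addr=0,dma_win_size=0x40000000 \
    -device vfio-pci,host=0000:01:00.0,bus=pci.1
```

In practice the default window should already match the host IOMMU on
current systems, so these properties would only be needed for unusual
configurations.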

This series does NOT allow EEH operations on VFIO devices on the
spapr-pci-host-bridge device, so the spapr-pci-vfio-host-bridge device
is left in for now.  It turns out there are some serious existing
problems in both the qemu EEH implementation and (worse) in the
EEH/VFIO kernel interface.  Fixing those is a problem for another day.
Maybe tomorrow.


I've tested basic assignment of an xHCI to a pseries guest, both at
startup and with hotplug.  I haven't (yet) tested VFIO on x86 with
this series.

This series probably needs to be merged via several different trees.
I'm intending to split it up as necessary once it's had some review.

David Gibson (10):
  vfio: Remove unneeded union from VFIOContainer
  vfio: Generalize vfio_listener_region_add failure path
  vfio: Check guest IOVA ranges against host IOMMU capabilities
  vfio: Record host IOMMU's available IO page sizes
  memory: Allow replay of IOMMU mapping notifications
  vfio: Allow hotplug of containers onto existing guest IOMMU mappings
  spapr_pci: Allow PCI host bridge DMA window to be configured
  spapr_iommu: Rename vfio_accel parameter
  spapr_iommu: Provide a function to switch a TCE table to allowing VFIO
  spapr_pci: Allow VFIO devices to work on the normal PCI host bridge

 hw/ppc/spapr_iommu.c          |  25 ++++++-
 hw/ppc/spapr_pci.c            |  13 +++-
 hw/vfio/common.c              | 152 +++++++++++++++++++++++++++---------------
 include/exec/memory.h         |  16 +++++
 include/hw/pci-host/spapr.h   |   3 +-
 include/hw/ppc/spapr.h        |   6 +-
 include/hw/vfio/vfio-common.h |  21 +++---
 memory.c                      |  18 +++++
 target-ppc/kvm.c              |   4 +-
 target-ppc/kvm_ppc.h          |   2 +-
 10 files changed, 184 insertions(+), 76 deletions(-)

-- 
2.4.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [Qemu-devel] [RFC PATCH 01/10] vfio: Remove unneeded union from VFIOContainer
  2015-09-17 13:09 [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge David Gibson
@ 2015-09-17 13:09 ` David Gibson
  2015-09-18  6:15   ` Alexey Kardashevskiy
                     ` (2 more replies)
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 02/10] vfio: Generalize vfio_listener_region_add failure path David Gibson
                   ` (11 subsequent siblings)
  12 siblings, 3 replies; 54+ messages in thread
From: David Gibson @ 2015-09-17 13:09 UTC (permalink / raw)
  To: alex.williamson, aik, gwshan
  Cc: lvivier, thuth, qemu-devel, qemu-ppc, pbonzini, David Gibson

Currently the VFIOContainer iommu_data field contains a union with
different information for different host iommu types.  However:
   * It only actually contains information for the x86-like "Type1" iommu
   * Because we have a common listener, the Type1 fields are actually used
on all IOMMU types, including the SPAPR TCE type
   * There's no tag in the VFIOContainer to tell you which union member is
valid anyway.

In fact we now have a general structure for the listener which is unlikely
to ever need per-iommu-type information, so this patch removes the union.

In a similar way we can unify the setup of the vfio memory listener in
vfio_connect_container() that is currently split across a switch on iommu
type, but is effectively the same in both cases.

The iommu_data.release pointer was only needed as a cleanup function
which would handle potentially different data in the union.  With the
union gone, it too can be removed.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/vfio/common.c              | 51 +++++++++++++++++--------------------------
 include/hw/vfio/vfio-common.h | 14 +++---------
 2 files changed, 23 insertions(+), 42 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6d21311..e3152f6 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -316,7 +316,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            iommu_data.type1.listener);
+                                            iommu_data.listener);
     hwaddr iova, end;
     Int128 llend;
     void *vaddr;
@@ -406,9 +406,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
          * can gracefully fail.  Runtime, there's not much we can do other
          * than throw a hardware error.
          */
-        if (!container->iommu_data.type1.initialized) {
-            if (!container->iommu_data.type1.error) {
-                container->iommu_data.type1.error = ret;
+        if (!container->iommu_data.initialized) {
+            if (!container->iommu_data.error) {
+                container->iommu_data.error = ret;
             }
         } else {
             hw_error("vfio: DMA mapping failed, unable to continue");
@@ -420,7 +420,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            iommu_data.type1.listener);
+                                            iommu_data.listener);
     hwaddr iova, end;
     int ret;
 
@@ -485,7 +485,7 @@ static const MemoryListener vfio_memory_listener = {
 
 static void vfio_listener_release(VFIOContainer *container)
 {
-    memory_listener_unregister(&container->iommu_data.type1.listener);
+    memory_listener_unregister(&container->iommu_data.listener);
 }
 
 int vfio_mmap_region(Object *obj, VFIORegion *region,
@@ -683,21 +683,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
             ret = -errno;
             goto free_container_exit;
         }
-
-        container->iommu_data.type1.listener = vfio_memory_listener;
-        container->iommu_data.release = vfio_listener_release;
-
-        memory_listener_register(&container->iommu_data.type1.listener,
-                                 container->space->as);
-
-        if (container->iommu_data.type1.error) {
-            ret = container->iommu_data.type1.error;
-            error_report("vfio: memory listener initialization failed for container");
-            goto listener_release_exit;
-        }
-
-        container->iommu_data.type1.initialized = true;
-
     } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
         ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
         if (ret) {
@@ -723,19 +708,25 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
             ret = -errno;
             goto free_container_exit;
         }
-
-        container->iommu_data.type1.listener = vfio_memory_listener;
-        container->iommu_data.release = vfio_listener_release;
-
-        memory_listener_register(&container->iommu_data.type1.listener,
-                                 container->space->as);
-
     } else {
         error_report("vfio: No available IOMMU models");
         ret = -EINVAL;
         goto free_container_exit;
     }
 
+    container->iommu_data.listener = vfio_memory_listener;
+
+    memory_listener_register(&container->iommu_data.listener,
+                             container->space->as);
+
+    if (container->iommu_data.error) {
+        ret = container->iommu_data.error;
+        error_report("vfio: memory listener initialization failed for container");
+        goto listener_release_exit;
+    }
+
+    container->iommu_data.initialized = true;
+
     QLIST_INIT(&container->group_list);
     QLIST_INSERT_HEAD(&space->containers, container, next);
 
@@ -774,9 +765,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
         VFIOAddressSpace *space = container->space;
         VFIOGuestIOMMU *giommu, *tmp;
 
-        if (container->iommu_data.release) {
-            container->iommu_data.release(container);
-        }
+        vfio_listener_release(container);
         QLIST_REMOVE(container, next);
 
         QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 59a321d..aff18cd 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -64,21 +64,13 @@ typedef struct VFIOAddressSpace {
 
 struct VFIOGroup;
 
-typedef struct VFIOType1 {
-    MemoryListener listener;
-    int error;
-    bool initialized;
-} VFIOType1;
-
 typedef struct VFIOContainer {
     VFIOAddressSpace *space;
     int fd; /* /dev/vfio/vfio, empowered by the attached groups */
     struct {
-        /* enable abstraction to support various iommu backends */
-        union {
-            VFIOType1 type1;
-        };
-        void (*release)(struct VFIOContainer *);
+        MemoryListener listener;
+        int error;
+        bool initialized;
     } iommu_data;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOGroup) group_list;
-- 
2.4.3


* [Qemu-devel] [RFC PATCH 02/10] vfio: Generalize vfio_listener_region_add failure path
  2015-09-17 13:09 [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge David Gibson
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 01/10] vfio: Remove unneeded union from VFIOContainer David Gibson
@ 2015-09-17 13:09 ` David Gibson
  2015-09-23  9:13   ` Thomas Huth
  2015-09-23 13:31   ` Laurent Vivier
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 03/10] vfio: Check guest IOVA ranges against host IOMMU capabilities David Gibson
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 54+ messages in thread
From: David Gibson @ 2015-09-17 13:09 UTC (permalink / raw)
  To: alex.williamson, aik, gwshan
  Cc: lvivier, thuth, qemu-devel, qemu-ppc, pbonzini, David Gibson

If a DMA mapping operation fails in vfio_listener_region_add(), it
checks whether we've already completed initial setup of the
container.  If we haven't, it records the error so the setup code can
fail gracefully; otherwise it throws a hw_error().

There are other potential failure cases in vfio_listener_region_add()
which could benefit from the same logic, so move it into its own
fail: block.  Later patches can use this to make other failure cases
fail as gracefully as possible under the circumstances.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/vfio/common.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index e3152f6..9953b9c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -400,19 +400,23 @@ static void vfio_listener_region_add(MemoryListener *listener,
         error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
                      "0x%"HWADDR_PRIx", %p) = %d (%m)",
                      container, iova, end - iova, vaddr, ret);
+        goto fail;
+    }
 
-        /*
-         * On the initfn path, store the first error in the container so we
-         * can gracefully fail.  Runtime, there's not much we can do other
-         * than throw a hardware error.
-         */
-        if (!container->iommu_data.initialized) {
-            if (!container->iommu_data.error) {
-                container->iommu_data.error = ret;
-            }
-        } else {
-            hw_error("vfio: DMA mapping failed, unable to continue");
+    return;
+
+fail:
+    /*
+     * On the initfn path, store the first error in the container so we
+     * can gracefully fail.  Runtime, there's not much we can do other
+     * than throw a hardware error.
+     */
+    if (!container->iommu_data.initialized) {
+        if (!container->iommu_data.error) {
+            container->iommu_data.error = ret;
         }
+    } else {
+        hw_error("vfio: DMA mapping failed, unable to continue");
     }
 }
 
-- 
2.4.3


* [Qemu-devel] [RFC PATCH 03/10] vfio: Check guest IOVA ranges against host IOMMU capabilities
  2015-09-17 13:09 [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge David Gibson
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 01/10] vfio: Remove unneeded union from VFIOContainer David Gibson
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 02/10] vfio: Generalize vfio_listener_region_add failure path David Gibson
@ 2015-09-17 13:09 ` David Gibson
  2015-09-18  6:38   ` Alexey Kardashevskiy
                     ` (2 more replies)
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 04/10] vfio: Record host IOMMU's available IO page sizes David Gibson
                   ` (9 subsequent siblings)
  12 siblings, 3 replies; 54+ messages in thread
From: David Gibson @ 2015-09-17 13:09 UTC (permalink / raw)
  To: alex.williamson, aik, gwshan
  Cc: lvivier, thuth, qemu-devel, qemu-ppc, pbonzini, David Gibson

The current vfio core code assumes that the host IOMMU is capable of
mapping any IOVA the guest wants to use to wherever we need it to go.
However, real IOMMUs generally only support translating a certain
range of IOVAs (the "DMA window"), not a full 64-bit address space.

The common x86 IOMMUs support a wide enough range that guests are very
unlikely to go beyond it in practice; however, the IOMMU used on IBM Power
machines - in the default configuration - supports only a much more limited
IOVA range, usually 0..2GiB.

If the guest attempts to set up an IOVA range that the host IOMMU can't
map, qemu won't report an error until it actually attempts to map a bad
IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e. no
guest-visible IOMMU) then this will show up very quickly.  If there is a
guest-visible IOMMU, however, the problem might not show up until much
later, when the guest actually attempts to DMA with an IOVA the host
can't handle.

This patch adds a test so that we will detect earlier if the guest is
attempting to use IOVA ranges that the host IOMMU won't be able to deal
with.

For now, we assume that "Type1" (x86) IOMMUs can support any IOVA.  This
is incorrect, but no worse than what we have already.  We can't do better
for now because the Type1 kernel interface doesn't tell us what IOVA
range the IOMMU actually supports.

For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
IOVA range and validate guest IOVA ranges against it, and this patch does
so.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/vfio/common.c              | 42 +++++++++++++++++++++++++++++++++++++++---
 include/hw/vfio/vfio-common.h |  6 ++++++
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9953b9c..c37f1a1 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -344,14 +344,23 @@ static void vfio_listener_region_add(MemoryListener *listener,
     if (int128_ge(int128_make64(iova), llend)) {
         return;
     }
+    end = int128_get64(llend);
+
+    if ((iova < container->iommu_data.min_iova)
+        || ((end - 1) > container->iommu_data.max_iova)) {
+        error_report("vfio: IOMMU container %p can't map guest IOVA region"
+                     " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
+                     container, iova, end - 1);
+        ret = -EFAULT; /* FIXME: better choice here? */
+        goto fail;
+    }
 
     memory_region_ref(section->mr);
 
     if (memory_region_is_iommu(section->mr)) {
         VFIOGuestIOMMU *giommu;
 
-        trace_vfio_listener_region_add_iommu(iova,
-                    int128_get64(int128_sub(llend, int128_one())));
+        trace_vfio_listener_region_add_iommu(iova, end - 1);
         /*
          * FIXME: We should do some checking to see if the
          * capabilities of the host VFIO IOMMU are adequate to model
@@ -388,7 +397,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
 
     /* Here we assume that memory_region_is_ram(section->mr)==true */
 
-    end = int128_get64(llend);
     vaddr = memory_region_get_ram_ptr(section->mr) +
             section->offset_within_region +
             (iova - section->offset_within_address_space);
@@ -687,7 +695,19 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
             ret = -errno;
             goto free_container_exit;
         }
+
+        /*
+         * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
+         * IOVA whatsoever.  That's not actually true, but the current
+         * kernel interface doesn't tell us what it can map, and the
+         * existing Type1 IOMMUs generally support any IOVA we're
+         * going to actually try in practice.
+         */
+        container->iommu_data.min_iova = 0;
+        container->iommu_data.max_iova = (hwaddr)-1;
     } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
+        struct vfio_iommu_spapr_tce_info info;
+
         ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
         if (ret) {
             error_report("vfio: failed to set group container: %m");
@@ -712,6 +732,22 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
             ret = -errno;
             goto free_container_exit;
         }
+
+        /*
+         * FIXME: This only considers the host IOMMU's 32-bit window.
+         * At some point we need to add support for the optional
+         * 64-bit window and dynamic windows
+         */
+        info.argsz = sizeof(info);
+        ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
+        if (ret) {
+            error_report("vfio: VFIO_IOMMU_SPAPR_TCE_GET_INFO failed: %m");
+            ret = -errno;
+            goto free_container_exit;
+        }
+        container->iommu_data.min_iova = info.dma32_window_start;
+        container->iommu_data.max_iova = container->iommu_data.min_iova
+            + info.dma32_window_size - 1;
     } else {
         error_report("vfio: No available IOMMU models");
         ret = -EINVAL;
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index aff18cd..88ec213 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -71,6 +71,12 @@ typedef struct VFIOContainer {
         MemoryListener listener;
         int error;
         bool initialized;
+        /*
+         * FIXME: This assumes the host IOMMU can support only a
+         * single contiguous IOVA window.  We may need to generalize
+         * that in future
+         */
+        hwaddr min_iova, max_iova;
     } iommu_data;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOGroup) group_list;
-- 
2.4.3


* [Qemu-devel] [RFC PATCH 04/10] vfio: Record host IOMMU's available IO page sizes
  2015-09-17 13:09 [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge David Gibson
                   ` (2 preceding siblings ...)
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 03/10] vfio: Check guest IOVA ranges against host IOMMU capabilities David Gibson
@ 2015-09-17 13:09 ` David Gibson
  2015-09-23 10:29   ` Thomas Huth
  2015-09-23 14:30   ` Laurent Vivier
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 05/10] memory: Allow replay of IOMMU mapping notifications David Gibson
                   ` (8 subsequent siblings)
  12 siblings, 2 replies; 54+ messages in thread
From: David Gibson @ 2015-09-17 13:09 UTC (permalink / raw)
  To: alex.williamson, aik, gwshan
  Cc: lvivier, thuth, qemu-devel, qemu-ppc, pbonzini, David Gibson

Depending on the host IOMMU type, we determine and record the available
page sizes for IOMMU translation.  We'll need this for other validation
in future patches.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/vfio/common.c              | 13 +++++++++++++
 include/hw/vfio/vfio-common.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index c37f1a1..daaac48 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -680,6 +680,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
     if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
         ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
         bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU);
+        struct vfio_iommu_type1_info info;
 
         ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
         if (ret) {
@@ -705,6 +706,15 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
          */
         container->iommu_data.min_iova = 0;
         container->iommu_data.max_iova = (hwaddr)-1;
+
+        /* Assume just 4K IOVA page size */
+        container->iommu_data.iova_pgsizes = 0x1000;
+        info.argsz = sizeof(info);
+        ret = ioctl(fd, VFIO_IOMMU_GET_INFO, &info);
+        /* Ignore errors */
+        if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
+            container->iommu_data.iova_pgsizes = info.iova_pgsizes;
+        }
     } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
         struct vfio_iommu_spapr_tce_info info;
 
@@ -748,6 +758,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
         container->iommu_data.min_iova = info.dma32_window_start;
         container->iommu_data.max_iova = container->iommu_data.min_iova
             + info.dma32_window_size - 1;
+
+        /* Assume just 4K IOVA pages for now */
+        container->iommu_data.iova_pgsizes = 0x1000;
     } else {
         error_report("vfio: No available IOMMU models");
         ret = -EINVAL;
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 88ec213..c09db6d 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -77,6 +77,7 @@ typedef struct VFIOContainer {
          * that in future
          */
         hwaddr min_iova, max_iova;
+        uint64_t iova_pgsizes;
     } iommu_data;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOGroup) group_list;
-- 
2.4.3


* [Qemu-devel] [RFC PATCH 05/10] memory: Allow replay of IOMMU mapping notifications
  2015-09-17 13:09 [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge David Gibson
                   ` (3 preceding siblings ...)
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 04/10] vfio: Record host IOMMU's available IO page sizes David Gibson
@ 2015-09-17 13:09 ` David Gibson
  2015-09-23 10:40   ` Thomas Huth
  2015-09-23 17:04   ` Laurent Vivier
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 06/10] vfio: Allow hotplug of containers onto existing guest IOMMU mappings David Gibson
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 54+ messages in thread
From: David Gibson @ 2015-09-17 13:09 UTC (permalink / raw)
  To: alex.williamson, aik, gwshan
  Cc: lvivier, thuth, qemu-devel, qemu-ppc, pbonzini, David Gibson

When we have guest visible IOMMUs, we allow notifiers to be registered
which will be informed of all changes to IOMMU mappings.  This is used by
vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.

However, unlike with a memory region listener, an iommu notifier won't be
told about any mappings which already exist in the (guest) IOMMU at the
time it is registered.  This can cause problems when hotplugging a VFIO
device onto a guest bus which had existing guest IOMMU mappings but
didn't previously have any VFIO devices (and hence had no host IOMMU
mappings).

This adds a memory_region_register_iommu_notifier_replay() function to
handle this case.  As well as registering the new notifier it replays
existing mappings.  Because the IOMMU memory region doesn't internally
remember the granularity of the guest IOMMU, there is a small hack: the
caller must specify a granularity at which to replay mappings.

If there are finer-grained mappings in the guest IOMMU, these will be
reported in the iotlb structures passed to the notifier, which must
handle them (probably by flagging an error).  This isn't new: the VFIO
iommu notifier must already handle notifications about guest IOMMU
mappings too small for it to represent in the host IOMMU.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/exec/memory.h | 16 ++++++++++++++++
 memory.c              | 18 ++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5baaf48..3cf145b 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -583,6 +583,22 @@ void memory_region_notify_iommu(MemoryRegion *mr,
 void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
 
 /**
+ * memory_region_register_iommu_notifier_replay: register a notifier
+ * for changes to IOMMU translation entries, and replay existing IOMMU
+ * translations to the new notifier.
+ *
+ * @mr: the memory region to observe
+ * @n: the notifier to be added; the notifier receives a pointer to an
+ *     #IOMMUTLBEntry as the opaque value; the pointer ceases to be
+ *     valid on exit from the notifier.
+ * @granularity: Minimum page granularity to replay notifications for
+ * @is_write: Whether to treat the replay as a translate "write"
+ *     through the iommu
+ */
+void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
+                                                  hwaddr granularity, bool is_write);
+
+/**
  * memory_region_unregister_iommu_notifier: unregister a notifier for
  * changes to IOMMU translation entries.
  *
diff --git a/memory.c b/memory.c
index 0d8b2d9..6b5a2f1 100644
--- a/memory.c
+++ b/memory.c
@@ -1403,6 +1403,24 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
     notifier_list_add(&mr->iommu_notify, n);
 }
 
+void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
+                                                  hwaddr granularity, bool is_write)
+{
+    hwaddr addr;
+    IOMMUTLBEntry iotlb;
+
+    memory_region_register_iommu_notifier(mr, n);
+
+    for (addr = 0;
+         int128_lt(int128_make64(addr), mr->size);
+         addr += granularity) {
+
+        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+        if (iotlb.perm != IOMMU_NONE)
+            n->notify(n, &iotlb);
+    }
+}
+
 void memory_region_unregister_iommu_notifier(Notifier *n)
 {
     notifier_remove(n);
-- 
2.4.3


* [Qemu-devel] [RFC PATCH 06/10] vfio: Allow hotplug of containers onto existing guest IOMMU mappings
  2015-09-17 13:09 [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge David Gibson
                   ` (4 preceding siblings ...)
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 05/10] memory: Allow replay of IOMMU mapping notifications David Gibson
@ 2015-09-17 13:09 ` David Gibson
  2015-09-17 16:54   ` Alex Williamson
  2015-09-23 18:44   ` Laurent Vivier
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 07/10] spapr_pci: Allow PCI host bridge DMA window to be configured David Gibson
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 54+ messages in thread
From: David Gibson @ 2015-09-17 13:09 UTC (permalink / raw)
  To: alex.williamson, aik, gwshan
  Cc: lvivier, thuth, qemu-devel, qemu-ppc, pbonzini, David Gibson

At present the memory listener used by vfio to keep host IOMMU mappings
in sync with the guest memory image assumes that if a guest IOMMU
appears, then it has no existing mappings.

This may not be true if a VFIO device is hotplugged onto a guest bus
which didn't previously include a VFIO device, and which has existing
guest IOMMU mappings.

Therefore, use the memory_region_register_iommu_notifier_replay()
function to fix this case: it replays existing guest IOMMU mappings,
bringing the host IOMMU into sync with the guest IOMMU.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/vfio/common.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index daaac48..543c38e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -312,6 +312,22 @@ out:
     rcu_read_unlock();
 }
 
+static hwaddr vfio_container_granularity(VFIOContainer *container)
+{
+    uint64_t pgsize;
+
+    assert(container->iommu_data.iova_pgsizes);
+
+    /* Find the smallest page size supported by the IOMMU */
+    for (pgsize = 1; pgsize; pgsize <<= 1) {
+        if (pgsize & container->iommu_data.iova_pgsizes) {
+            return pgsize;
+        }
+    }
+    /* Can't happen */
+    assert(0);
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
@@ -371,26 +387,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
          * would be the right place to wire that up (tell the KVM
          * device emulation the VFIO iommu handles to use).
          */
-        /*
-         * This assumes that the guest IOMMU is empty of
-         * mappings at this point.
-         *
-         * One way of doing this is:
-         * 1. Avoid sharing IOMMUs between emulated devices or different
-         * IOMMU groups.
-         * 2. Implement VFIO_IOMMU_ENABLE in the host kernel to fail if
-         * there are some mappings in IOMMU.
-         *
-         * VFIO on SPAPR does that. Other IOMMU models may do that different,
-         * they must make sure there are no existing mappings or
-         * loop through existing mappings to map them into VFIO.
-         */
         giommu = g_malloc0(sizeof(*giommu));
         giommu->iommu = section->mr;
         giommu->container = container;
         giommu->n.notify = vfio_iommu_map_notify;
         QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
-        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
+
+        memory_region_register_iommu_notifier_replay(giommu->iommu, &giommu->n,
+            vfio_container_granularity(container), false);
 
         return;
     }
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [Qemu-devel] [RFC PATCH 07/10] spapr_pci: Allow PCI host bridge DMA window to be configured
  2015-09-17 13:09 [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge David Gibson
                   ` (5 preceding siblings ...)
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 06/10] vfio: Allow hotplug of containers onto existing guest IOMMU mappings David Gibson
@ 2015-09-17 13:09 ` David Gibson
  2015-09-23 11:08   ` Thomas Huth
  2015-09-23 18:55   ` Laurent Vivier
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 08/10] spapr_iommu: Rename vfio_accel parameter David Gibson
                   ` (5 subsequent siblings)
  12 siblings, 2 replies; 54+ messages in thread
From: David Gibson @ 2015-09-17 13:09 UTC (permalink / raw)
  To: alex.williamson, aik, gwshan
  Cc: lvivier, thuth, qemu-devel, qemu-ppc, pbonzini, David Gibson

At present the PCI host bridge (PHB) for the pseries machine type has a
fixed DMA window from 0..1GB (in PCI address space) which is mapped to real
memory via the PAPR paravirtualized IOMMU.

For better support of VFIO devices, we're going to want to allow for
different configurations of the DMA window.

Eventually we'll want to allow the guest itself to reconfigure the window
via the PAPR dynamic DMA window interface, but as a preliminary this patch
allows the user to reconfigure the window with new properties on the PHB
device.
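With the two new properties, the window placement could be overridden on the command line; a hypothetical invocation (property names are from this patch, all other options are illustrative and elided):

```shell
# Place a 2GB DMA window at PCI address 0 on an extra guest PHB
qemu-system-ppc64 -machine pseries \
    -device spapr-pci-host-bridge,index=1,dma_win_addr=0,dma_win_size=0x80000000 \
    ...
```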

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr_pci.c          | 7 +++++--
 include/hw/pci-host/spapr.h | 3 +--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index b088491..622c4ac 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1394,7 +1394,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
     sPAPRTCETable *tcet;
     uint32_t nb_table;
 
-    nb_table = SPAPR_PCI_DMA32_SIZE >> SPAPR_TCE_PAGE_SHIFT;
+    nb_table = sphb->dma_win_size >> SPAPR_TCE_PAGE_SHIFT;
     tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
                                0, SPAPR_TCE_PAGE_SHIFT, nb_table, false);
     if (!tcet) {
@@ -1404,7 +1404,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
     }
 
     /* Register default 32bit DMA window */
-    memory_region_add_subregion(&sphb->iommu_root, 0,
+    memory_region_add_subregion(&sphb->iommu_root, sphb->dma_win_addr,
                                 spapr_tce_get_iommu(tcet));
 }
 
@@ -1437,6 +1437,9 @@ static Property spapr_phb_properties[] = {
                        SPAPR_PCI_IO_WIN_SIZE),
     DEFINE_PROP_BOOL("dynamic-reconfiguration", sPAPRPHBState, dr_enabled,
                      true),
+    /* Default DMA window is 0..1GB */
+    DEFINE_PROP_UINT64("dma_win_addr", sPAPRPHBState, dma_win_addr, 0),
+    DEFINE_PROP_UINT64("dma_win_size", sPAPRPHBState, dma_win_size, 0x40000000),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 5322b56..7de5e02 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -78,6 +78,7 @@ struct sPAPRPHBState {
     MemoryRegion memwindow, iowindow, msiwindow;
 
     uint32_t dma_liobn;
+    hwaddr dma_win_addr, dma_win_size;
     AddressSpace iommu_as;
     MemoryRegion iommu_root;
 
@@ -115,8 +116,6 @@ struct sPAPRPHBVFIOState {
 
 #define SPAPR_PCI_MSI_WINDOW         0x40000000000ULL
 
-#define SPAPR_PCI_DMA32_SIZE         0x40000000
-
 static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [Qemu-devel] [RFC PATCH 08/10] spapr_iommu: Rename vfio_accel parameter
  2015-09-17 13:09 [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge David Gibson
                   ` (6 preceding siblings ...)
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 07/10] spapr_pci: Allow PCI host bridge DMA window to be configured David Gibson
@ 2015-09-17 13:09 ` David Gibson
  2015-09-17 16:54   ` Alex Williamson
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 09/10] spapr_iommu: Provide a function to switch a TCE table to allowing VFIO David Gibson
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 54+ messages in thread
From: David Gibson @ 2015-09-17 13:09 UTC (permalink / raw)
  To: alex.williamson, aik, gwshan
  Cc: lvivier, thuth, qemu-devel, qemu-ppc, pbonzini, David Gibson

The vfio_accel parameter used when creating a new TCE table (guest IOMMU
context) has a confusing name.  What it really means is whether the TCE
table being created needs to be able to support VFIO devices.

VFIO is relevant, because when available we use in-kernel acceleration of
the TCE table, but that may not work with VFIO devices because updates to
the table are handled in the kernel, bypassing qemu, and so don't hit qemu's
infrastructure for keeping the VFIO host IOMMU state in sync with the guest
IOMMU state.

Rename the parameter to "need_vfio" throughout.  This is a cosmetic change,
with no impact on the logic.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr_iommu.c   | 6 +++---
 include/hw/ppc/spapr.h | 4 ++--
 target-ppc/kvm.c       | 4 ++--
 target-ppc/kvm_ppc.h   | 2 +-
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index f61504e..5166cde 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -146,7 +146,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
         tcet->table = kvmppc_create_spapr_tce(tcet->liobn,
                                               window_size,
                                               &tcet->fd,
-                                              tcet->vfio_accel);
+                                              tcet->need_vfio);
     }
 
     if (!tcet->table) {
@@ -172,7 +172,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
                                    uint64_t bus_offset,
                                    uint32_t page_shift,
                                    uint32_t nb_table,
-                                   bool vfio_accel)
+                                   bool need_vfio)
 {
     sPAPRTCETable *tcet;
     char tmp[64];
@@ -192,7 +192,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
     tcet->bus_offset = bus_offset;
     tcet->page_shift = page_shift;
     tcet->nb_table = nb_table;
-    tcet->vfio_accel = vfio_accel;
+    tcet->need_vfio = need_vfio;
 
     snprintf(tmp, sizeof(tmp), "tce-table-%x", liobn);
     object_property_add_child(OBJECT(owner), tmp, OBJECT(tcet), NULL);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index ffb108d..03a9804 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -562,7 +562,7 @@ struct sPAPRTCETable {
     uint32_t page_shift;
     uint64_t *table;
     bool bypass;
-    bool vfio_accel;
+    bool need_vfio;
     int fd;
     MemoryRegion iommu;
     struct VIOsPAPRDevice *vdev; /* for @bypass migration compatibility only */
@@ -587,7 +587,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
                                    uint64_t bus_offset,
                                    uint32_t page_shift,
                                    uint32_t nb_table,
-                                   bool vfio_accel);
+                                   bool need_vfio);
 MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
 int spapr_dma_dt(void *fdt, int node_off, const char *propname,
                  uint32_t liobn, uint64_t window, uint32_t size);
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 9cf5308..9191f13 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -2071,7 +2071,7 @@ bool kvmppc_spapr_use_multitce(void)
 }
 
 void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd,
-                              bool vfio_accel)
+                              bool need_vfio)
 {
     struct kvm_create_spapr_tce args = {
         .liobn = liobn,
@@ -2085,7 +2085,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd,
      * destroying the table, which the upper layers -will- do
      */
     *pfd = -1;
-    if (!cap_spapr_tce || (vfio_accel && !cap_spapr_vfio)) {
+    if (!cap_spapr_tce || (need_vfio && !cap_spapr_vfio)) {
         return NULL;
     }
 
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 0714ba0..8b69e73 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -38,7 +38,7 @@ int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu);
 off_t kvmppc_alloc_rma(void **rma);
 bool kvmppc_spapr_use_multitce(void);
 void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd,
-                              bool vfio_accel);
+                              bool need_vfio);
 int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
 int kvmppc_reset_htab(int shift_hint);
 uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [Qemu-devel] [RFC PATCH 09/10] spapr_iommu: Provide a function to switch a TCE table to allowing VFIO
  2015-09-17 13:09 [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge David Gibson
                   ` (7 preceding siblings ...)
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 08/10] spapr_iommu: Rename vfio_accel parameter David Gibson
@ 2015-09-17 13:09 ` David Gibson
  2015-09-17 16:54   ` Alex Williamson
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 10/10] spapr_pci: Allow VFIO devices to work on the normal PCI host bridge David Gibson
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 54+ messages in thread
From: David Gibson @ 2015-09-17 13:09 UTC (permalink / raw)
  To: alex.williamson, aik, gwshan
  Cc: lvivier, thuth, qemu-devel, qemu-ppc, pbonzini, David Gibson

Because of the way non-VFIO guest IOMMU operations are KVM accelerated, not
all TCE tables (guest IOMMU contexts) can support VFIO devices.  Currently,
this is decided at creation time.

To support hotplug of VFIO devices, we need to allow a TCE table which
previously didn't allow VFIO devices to be switched so that it can.  This
patch adds an spapr_tce_need_vfio() function to do this, by reallocating
the table in userspace if necessary.

Currently this doesn't allow the KVM acceleration to be re-enabled if all
the VFIO devices are removed.  That's an optimization for another time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr_iommu.c   | 19 +++++++++++++++++++
 include/hw/ppc/spapr.h |  2 ++
 2 files changed, 21 insertions(+)

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 5166cde..109e10d 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -168,6 +168,25 @@ static int spapr_tce_table_realize(DeviceState *dev)
     return 0;
 }
 
+void spapr_tce_need_vfio(sPAPRTCETable *tcet)
+{
+    size_t table_size = tcet->nb_table * sizeof(uint64_t);
+    void *newtable;
+
+    if (tcet->fd < 0) {
+        /* Table is already in userspace, nothing to be done */
+        return;
+    }
+
+    newtable = g_malloc0(table_size);
+    memcpy(newtable, tcet->table, table_size);
+
+    kvmppc_remove_spapr_tce(tcet->table, tcet->fd, tcet->nb_table);
+
+    tcet->fd = -1;
+    tcet->table = newtable;
+}
+
 sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
                                    uint64_t bus_offset,
                                    uint32_t page_shift,
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 03a9804..38a29fb 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -588,6 +588,8 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
                                    uint32_t page_shift,
                                    uint32_t nb_table,
                                    bool need_vfio);
+void spapr_tce_need_vfio(sPAPRTCETable *tcet);
+
 MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
 int spapr_dma_dt(void *fdt, int node_off, const char *propname,
                  uint32_t liobn, uint64_t window, uint32_t size);
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [Qemu-devel] [RFC PATCH 10/10] spapr_pci: Allow VFIO devices to work on the normal PCI host bridge
  2015-09-17 13:09 [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge David Gibson
                   ` (8 preceding siblings ...)
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 09/10] spapr_iommu: Provide a function to switch a TCE table to allowing VFIO David Gibson
@ 2015-09-17 13:09 ` David Gibson
  2015-09-17 16:54 ` [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge Alex Williamson
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 54+ messages in thread
From: David Gibson @ 2015-09-17 13:09 UTC (permalink / raw)
  To: alex.williamson, aik, gwshan
  Cc: lvivier, thuth, qemu-devel, qemu-ppc, pbonzini, David Gibson

The core VFIO infrastructure more or less allows VFIO devices to work
on any normal guest PCI host bridge (PHB) without extra logic.
However, the "spapr-pci-host-bridge" device (as opposed to the special
"spapr-pci-vfio-host-bridge" device) breaks this by using a partially
KVM-accelerated implementation of the guest IOMMU which, without
additional kernel support, won't work with VFIO devices.

This patch allows VFIO devices to work on the spapr-pci-host-bridge,
by having it switch off KVM TCE acceleration when a VFIO device is
added to the PHB (either on startup, or by hotplug).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr_pci.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 622c4ac..a0cca22 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1090,6 +1090,12 @@ static void spapr_phb_add_pci_device(sPAPRDRConnector *drc,
     void *fdt = NULL;
     int fdt_start_offset = 0, fdt_size;
 
+    if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) {
+        sPAPRTCETable *tcet = spapr_tce_find_by_liobn(phb->dma_liobn);
+
+        spapr_tce_need_vfio(tcet);
+    }
+
     if (dev->hotplugged) {
         fdt = create_device_tree(&fdt_size);
         fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0);
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge
  2015-09-17 13:09 [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge David Gibson
                   ` (9 preceding siblings ...)
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 10/10] spapr_pci: Allow VFIO devices to work on the normal PCI host bridge David Gibson
@ 2015-09-17 16:54 ` Alex Williamson
  2015-09-23 11:26 ` Thomas Huth
  2015-09-23 16:46 ` Laurent Vivier
  12 siblings, 0 replies; 54+ messages in thread
From: Alex Williamson @ 2015-09-17 16:54 UTC (permalink / raw)
  To: David Gibson; +Cc: lvivier, thuth, aik, gwshan, qemu-devel, qemu-ppc, pbonzini

On Thu, 2015-09-17 at 23:09 +1000, David Gibson wrote:
> Currently the pseries machine type uses two types of PCI Host Bridge
> (PHB) devices: "spapr-pci-host-bridge" the 'normal' variant intended
> for emulated PCI devices, and "spapr-pci-vfio-host-bridge" intended
> for VFIO devices.
> 
> When using VFIO with pseries, a separate spapr-pci-vfio-host-bridge
> device is needed for every host IOMMU group from which you're using
> VFIO devices.  This is quite awkward for the user and/or management
> tools.  It's especially awkward since the current code makes
> essentially no attempt to detect and warn the user if the wrong sorts
> of devices are connected to the wrong PHB.
> 
> It turns out that the VFIO core code is actually general enough that
> VFIO devices almost work on the normal spapr-pci-host-bridge device.
> In fact with the right combination of circumstances they *can* work
> right now.
> 
> spapr-pci-vfio-host-bridge does 3 additional things:
> 
>     1) It disables KVM acceleration of the guest IOMMU.  That
>        acceleration breaks VFIO because it means guest IOMMU updates
>        bypass the VFIO infrastructure which keeps the host IOMMU in
>        sync.
> 
>     2) It automatically configures the guest PHB's DMA window to match
>        the capabilities of the host IOMMU, and advertises that to the
>        guest.
> 
>     3) It provides additional handling of EEH (Enhanced Error
>        Handling) functions.
> 
> This patch series:
>     * Allows VFIO devices to be used on the spapr-pci-host-bridge by
>       auto-switching the KVM TCE acceleration
> 
>     * Adds verification that the host IOMMU can handle the DMA windows
>       used by guest PHBs
> 
>     * Allows the DMA window on the guest PHB to be configured with
>       device properties.  This can be used to make sure it matches a
>       host window, but in practice the default setting will already
>       work with the host IOMMU on all current systems.
> 
>     * Adds support to the VFIO core to allow a VFIO device to be
>       hotplugged onto a bus which doesn't yet have VFIO devices.  This
>       already worked for systems without a guest-visible IOMMU
>       (i.e. x86), this series makes it work even with a guest visible
>       IOMMU.
> 
>     * Makes a few related cleanups along the way
> 
> This series does NOT allow EEH operations on VFIO devices on the
> spapr-pci-host-bridge device, so the spapr-pci-vfio-host-bridge device
> is left in for now.  It turns out there are some serious existing
> problems in both the qemu EEH implementation and (worse) in the
> EEH/VFIO kernel interface.  Fixing those is a problem for another day.
> Maybe tomorrow.
> 
> 
> I've tested basic assignment of an xHCI to a pseries guest, both at
> startup and with hotplug.  I haven't (yet) tested VFIO on x86 with
> this series.
> 
> This series probably needs to be merged via several different trees.
> I'm intending to split up as necessary once it's had some review.
> 
> David Gibson (10):
>   vfio: Remove unneeded union from VFIOContainer
>   vfio: Generalize vfio_listener_region_add failure path
>   vfio: Check guest IOVA ranges against host IOMMU capabilities
>   vfio: Record host IOMMU's available IO page sizes
>   memory: Allow replay of IOMMU mapping notifications
>   vfio: Allow hotplug of containers onto existing guest IOMMU mappings
>   spapr_pci: Allow PCI host bridge DMA window to be configured
>   spapr_iommu: Rename vfio_accel parameter
>   spapr_iommu: Provide a function to switch a TCE table to allowing VFIO
>   spapr_pci: Allow VFIO devices to work on the normal PCI host bridge
> 
>  hw/ppc/spapr_iommu.c          |  25 ++++++-
>  hw/ppc/spapr_pci.c            |  13 +++-
>  hw/vfio/common.c              | 152 +++++++++++++++++++++++++++---------------
>  include/exec/memory.h         |  16 +++++
>  include/hw/pci-host/spapr.h   |   3 +-
>  include/hw/ppc/spapr.h        |   6 +-
>  include/hw/vfio/vfio-common.h |  21 +++---
>  memory.c                      |  18 +++++
>  target-ppc/kvm.c              |   4 +-
>  target-ppc/kvm_ppc.h          |   2 +-
>  10 files changed, 184 insertions(+), 76 deletions(-)
> 

Very nice series.  A couple minor comments that I'll send against the
relevant patches.  I don't see any reason to suspect breakage on other
platforms, but I'll give it a smoke test on x86.  The hack we have in
place to skip mapping PCI BARs while they're being sized is sort of
nagging at me in light of this new min/max_iova, but perhaps we always
need to handle that as a special case or make PCI-core stop trying to
register those mappings.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 06/10] vfio: Allow hotplug of containers onto existing guest IOMMU mappings
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 06/10] vfio: Allow hotplug of containers onto existing guest IOMMU mappings David Gibson
@ 2015-09-17 16:54   ` Alex Williamson
  2015-09-17 23:31     ` David Gibson
  2015-09-23 18:44   ` Laurent Vivier
  1 sibling, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2015-09-17 16:54 UTC (permalink / raw)
  To: David Gibson; +Cc: lvivier, thuth, aik, gwshan, qemu-devel, qemu-ppc, pbonzini

On Thu, 2015-09-17 at 23:09 +1000, David Gibson wrote:
> At present the memory listener used by vfio to keep host IOMMU mappings
> in sync with the guest memory image assumes that if a guest IOMMU
> appears, then it has no existing mappings.
> 
> This may not be true if a VFIO device is hotplugged onto a guest bus
> which didn't previously include a VFIO device, and which has existing
> guest IOMMU mappings.
> 
> Therefore, use the memory_region_register_iommu_notifier_replay()
> function in order to fix this case, replaying existing guest IOMMU
> mappings, bringing the host IOMMU into sync with the guest IOMMU.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/vfio/common.c | 34 +++++++++++++++++++---------------
>  1 file changed, 19 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index daaac48..543c38e 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -312,6 +312,22 @@ out:
>      rcu_read_unlock();
>  }
>  
> +static hwaddr vfio_container_granularity(VFIOContainer *container)
> +{
> +    uint64_t pgsize;
> +
> +    assert(container->iommu_data.iova_pgsizes);

return (hwaddr)1 << (ffsl(container->iommu_data.iova_pgsizes) - 1);

?
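The one-liner works because POSIX ffsl() returns the 1-based index of the least significant set bit (and 0 for a zero argument, which the assert above rules out), so shifting 1 by (ffsl(mask) - 1) isolates the smallest supported page size. A quick check that it agrees with the loop in the patch (smallest_pgsize_loop/smallest_pgsize_ffsl are illustrative names):

```c
#include <stdint.h>
#include <strings.h>   /* ffsl() */

typedef uint64_t hwaddr;

/* The loop from the patch: return the lowest set bit of the mask. */
static hwaddr smallest_pgsize_loop(uint64_t mask)
{
    for (uint64_t pgsize = 1; pgsize; pgsize <<= 1) {
        if (pgsize & mask) {
            return pgsize;
        }
    }
    return 0;   /* mask == 0; the caller asserts this can't happen */
}

/* The suggested ffsl() replacement. */
static hwaddr smallest_pgsize_ffsl(uint64_t mask)
{
    return (hwaddr)1 << (ffsl((long)mask) - 1);
}
```

For a mask with 4K and 64K page support (0x11000), both variants return 0x1000.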

> +
> +    /* Find the smallest page size supported by the IOMMU */
> +    for (pgsize = 1; pgsize; pgsize <<= 1) {
> +        if (pgsize & container->iommu_data.iova_pgsizes) {
> +            return pgsize;
> +        }
> +    }
> +    /* Can't happen */
> +    assert(0);
> +}
> +
>  static void vfio_listener_region_add(MemoryListener *listener,
>                                       MemoryRegionSection *section)
>  {
> @@ -371,26 +387,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
>           * would be the right place to wire that up (tell the KVM
>           * device emulation the VFIO iommu handles to use).
>           */
> -        /*
> -         * This assumes that the guest IOMMU is empty of
> -         * mappings at this point.
> -         *
> -         * One way of doing this is:
> -         * 1. Avoid sharing IOMMUs between emulated devices or different
> -         * IOMMU groups.
> -         * 2. Implement VFIO_IOMMU_ENABLE in the host kernel to fail if
> -         * there are some mappings in IOMMU.
> -         *
> -         * VFIO on SPAPR does that. Other IOMMU models may do that different,
> -         * they must make sure there are no existing mappings or
> -         * loop through existing mappings to map them into VFIO.
> -         */
>          giommu = g_malloc0(sizeof(*giommu));
>          giommu->iommu = section->mr;
>          giommu->container = container;
>          giommu->n.notify = vfio_iommu_map_notify;
>          QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
> -        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> +
> +        memory_region_register_iommu_notifier_replay(giommu->iommu, &giommu->n,
> +            vfio_container_granularity(container), false);
>  
>          return;
>      }

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 08/10] spapr_iommu: Rename vfio_accel parameter
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 08/10] spapr_iommu: Rename vfio_accel parameter David Gibson
@ 2015-09-17 16:54   ` Alex Williamson
  2015-09-17 23:34     ` David Gibson
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2015-09-17 16:54 UTC (permalink / raw)
  To: David Gibson; +Cc: lvivier, thuth, aik, gwshan, qemu-devel, qemu-ppc, pbonzini

On Thu, 2015-09-17 at 23:09 +1000, David Gibson wrote:
> The vfio_accel parameter used when creating a new TCE table (guest IOMMU
> context) has a confusing name.  What it really means is whether we need the
> TCE table created to be able to support VFIO devices.
> 
> VFIO is relevant, because when available we use in-kernel acceleration of
> the TCE table, but that may not work with VFIO devices because updates to
> the table are handled in the kernel, bypassing qemu, and so don't hit qemu's
> infrastructure for keeping the VFIO host IOMMU state in sync with the guest
> IOMMU state.
> 
> Rename the parameter to "need_vfio" throughout.  This is a cosmetic change,
> with no impact on the logic.

nit, why entangle the technology you're using, vfio, with the feature
you want, visible iommu updates?  You could rename this to something
even more self describing, like disable_in_kernel_updates, or whatever
more concise name you can come up with.

> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/ppc/spapr_iommu.c   | 6 +++---
>  include/hw/ppc/spapr.h | 4 ++--
>  target-ppc/kvm.c       | 4 ++--
>  target-ppc/kvm_ppc.h   | 2 +-
>  4 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index f61504e..5166cde 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -146,7 +146,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
>          tcet->table = kvmppc_create_spapr_tce(tcet->liobn,
>                                                window_size,
>                                                &tcet->fd,
> -                                              tcet->vfio_accel);
> +                                              tcet->need_vfio);
>      }
>  
>      if (!tcet->table) {
> @@ -172,7 +172,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
>                                     uint64_t bus_offset,
>                                     uint32_t page_shift,
>                                     uint32_t nb_table,
> -                                   bool vfio_accel)
> +                                   bool need_vfio)
>  {
>      sPAPRTCETable *tcet;
>      char tmp[64];
> @@ -192,7 +192,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
>      tcet->bus_offset = bus_offset;
>      tcet->page_shift = page_shift;
>      tcet->nb_table = nb_table;
> -    tcet->vfio_accel = vfio_accel;
> +    tcet->need_vfio = need_vfio;
>  
>      snprintf(tmp, sizeof(tmp), "tce-table-%x", liobn);
>      object_property_add_child(OBJECT(owner), tmp, OBJECT(tcet), NULL);
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index ffb108d..03a9804 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -562,7 +562,7 @@ struct sPAPRTCETable {
>      uint32_t page_shift;
>      uint64_t *table;
>      bool bypass;
> -    bool vfio_accel;
> +    bool need_vfio;
>      int fd;
>      MemoryRegion iommu;
>      struct VIOsPAPRDevice *vdev; /* for @bypass migration compatibility only */
> @@ -587,7 +587,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
>                                     uint64_t bus_offset,
>                                     uint32_t page_shift,
>                                     uint32_t nb_table,
> -                                   bool vfio_accel);
> +                                   bool need_vfio);
>  MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
>  int spapr_dma_dt(void *fdt, int node_off, const char *propname,
>                   uint32_t liobn, uint64_t window, uint32_t size);
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 9cf5308..9191f13 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -2071,7 +2071,7 @@ bool kvmppc_spapr_use_multitce(void)
>  }
>  
>  void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd,
> -                              bool vfio_accel)
> +                              bool need_vfio)
>  {
>      struct kvm_create_spapr_tce args = {
>          .liobn = liobn,
> @@ -2085,7 +2085,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd,
>       * destroying the table, which the upper layers -will- do
>       */
>      *pfd = -1;
> -    if (!cap_spapr_tce || (vfio_accel && !cap_spapr_vfio)) {
> +    if (!cap_spapr_tce || (need_vfio && !cap_spapr_vfio)) {
>          return NULL;
>      }
>  
> diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
> index 0714ba0..8b69e73 100644
> --- a/target-ppc/kvm_ppc.h
> +++ b/target-ppc/kvm_ppc.h
> @@ -38,7 +38,7 @@ int kvmppc_booke_watchdog_enable(PowerPCCPU *cpu);
>  off_t kvmppc_alloc_rma(void **rma);
>  bool kvmppc_spapr_use_multitce(void);
>  void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd,
> -                              bool vfio_accel);
> +                              bool need_vfio);
>  int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
>  int kvmppc_reset_htab(int shift_hint);
>  uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 09/10] spapr_iommu: Provide a function to switch a TCE table to allowing VFIO
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 09/10] spapr_iommu: Provide a function to switch a TCE table to allowing VFIO David Gibson
@ 2015-09-17 16:54   ` Alex Williamson
  2015-09-23 11:24     ` Thomas Huth
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2015-09-17 16:54 UTC (permalink / raw)
  To: David Gibson; +Cc: lvivier, thuth, aik, gwshan, qemu-devel, qemu-ppc, pbonzini

On Thu, 2015-09-17 at 23:09 +1000, David Gibson wrote:
> Because of the way non-VFIO guest IOMMU operations are KVM accelerated, not
> all TCE tables (guest IOMMU contexts) can support VFIO devices.  Currently,
> this is decided at creation time.
> 
> To support hotplug of VFIO devices, we need to allow a TCE table which
> previously didn't allow VFIO devices to be switched so that it can.  This
> patch adds an spapr_tce_need_vfio() function to do this, by reallocating
> the table in userspace if necessary.
> 
> Currently this doesn't allow the KVM acceleration to be re-enabled if all
> the VFIO devices are removed.  That's an optimization for another time.


Same comment as previous patch, spapr_tce_need_userspace_table() (or
something) makes the code much more self documenting.


> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/ppc/spapr_iommu.c   | 19 +++++++++++++++++++
>  include/hw/ppc/spapr.h |  2 ++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 5166cde..109e10d 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -168,6 +168,25 @@ static int spapr_tce_table_realize(DeviceState *dev)
>      return 0;
>  }
>  
> +void spapr_tce_need_vfio(sPAPRTCETable *tcet)
> +{
> +    size_t table_size = tcet->nb_table * sizeof(uint64_t);
> +    void *newtable;
> +
> +    if (tcet->fd < 0) {
> +        /* Table is already in userspace, nothing to be done */
> +        return;
> +    }
> +
> +    newtable = g_malloc0(table_size);
> +    memcpy(newtable, tcet->table, table_size);
> +
> +    kvmppc_remove_spapr_tce(tcet->table, tcet->fd, tcet->nb_table);
> +
> +    tcet->fd = -1;
> +    tcet->table = newtable;
> +}
> +
>  sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
>                                     uint64_t bus_offset,
>                                     uint32_t page_shift,
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 03a9804..38a29fb 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -588,6 +588,8 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
>                                     uint32_t page_shift,
>                                     uint32_t nb_table,
>                                     bool need_vfio);
> +void spapr_tce_need_vfio(sPAPRTCETable *tcet);
> +
>  MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
>  int spapr_dma_dt(void *fdt, int node_off, const char *propname,
>                   uint32_t liobn, uint64_t window, uint32_t size);

* Re: [Qemu-devel] [RFC PATCH 06/10] vfio: Allow hotplug of containers onto existing guest IOMMU mappings
  2015-09-17 16:54   ` Alex Williamson
@ 2015-09-17 23:31     ` David Gibson
  2015-09-23 11:02       ` Thomas Huth
  0 siblings, 1 reply; 54+ messages in thread
From: David Gibson @ 2015-09-17 23:31 UTC (permalink / raw)
  To: Alex Williamson
  Cc: lvivier, thuth, aik, gwshan, qemu-devel, qemu-ppc, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1736 bytes --]

On Thu, Sep 17, 2015 at 10:54:24AM -0600, Alex Williamson wrote:
> On Thu, 2015-09-17 at 23:09 +1000, David Gibson wrote:
> > At present the memory listener used by vfio to keep host IOMMU mappings
> > in sync with the guest memory image assumes that if a guest IOMMU
> > appears, then it has no existing mappings.
> > 
> > This may not be true if a VFIO device is hotplugged onto a guest bus
> > which didn't previously include a VFIO device, and which has existing
> > guest IOMMU mappings.
> > 
> > Therefore, use the memory_region_register_iommu_notifier_replay()
> > function in order to fix this case, replaying existing guest IOMMU
> > mappings, bringing the host IOMMU into sync with the guest IOMMU.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  hw/vfio/common.c | 34 +++++++++++++++++++---------------
> >  1 file changed, 19 insertions(+), 15 deletions(-)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index daaac48..543c38e 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -312,6 +312,22 @@ out:
> >      rcu_read_unlock();
> >  }
> >  
> > +static hwaddr vfio_container_granularity(VFIOContainer *container)
> > +{
> > +    uint64_t pgsize;
> > +
> > +    assert(container->iommu_data.iova_pgsizes);
> 
> return (hwaddr)1 << (ffsl(container->iommu_data.iova_pgsizes) - 1);

Ah, yes, that should work.  I didn't do it that way mostly because I
tend to confuse myself when I try to remember exactly how ffs
semantics work.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC PATCH 08/10] spapr_iommu: Rename vfio_accel parameter
  2015-09-17 16:54   ` Alex Williamson
@ 2015-09-17 23:34     ` David Gibson
  0 siblings, 0 replies; 54+ messages in thread
From: David Gibson @ 2015-09-17 23:34 UTC (permalink / raw)
  To: Alex Williamson
  Cc: lvivier, thuth, aik, gwshan, qemu-devel, qemu-ppc, pbonzini

On Thu, Sep 17, 2015 at 10:54:38AM -0600, Alex Williamson wrote:
> On Thu, 2015-09-17 at 23:09 +1000, David Gibson wrote:
> > The vfio_accel parameter used when creating a new TCE table (guest IOMMU
> > context) has a confusing name.  What it really means is whether we need the
> > TCE table created to be able to support VFIO devices.
> > 
> > VFIO is relevant, because when available we use in-kernel acceleration of
> > the TCE table, but that may not work with VFIO devices because updates to
> > the table are handled in the kernel, bypassing qemu, and so don't hit qemu's
> > infrastructure for keeping the VFIO host IOMMU state in sync with the guest
> > IOMMU state.
> > 
> > Rename the parameter to "need_vfio" throughout.  This is a cosmetic change,
> > with no impact on the logic.
> 
> nit, why entangle the technology you're using, vfio, with the feature
> you want, visible iommu updates?  You could rename this to something
> even more self describing, like disable_in_kernel_updates, or whatever
> more concise name you can come up with.

Because vfio availability really is the feature I want.  If the kernel
gains the ability to wire up the KVM TCE acceleration with VFIO (which
I believe is in IBM's plans), then the idea is that
spapr_tce_new_table() will return an accelerated table, even when VFIO
is requested.

Same thing with spapr_tce_need_vfio().

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC PATCH 01/10] vfio: Remove unneeded union from VFIOContainer
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 01/10] vfio: Remove unneeded union from VFIOContainer David Gibson
@ 2015-09-18  6:15   ` Alexey Kardashevskiy
  2015-09-23 10:31   ` Thomas Huth
  2015-09-23 13:18   ` Laurent Vivier
  2 siblings, 0 replies; 54+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-18  6:15 UTC (permalink / raw)
  To: David Gibson, alex.williamson, gwshan
  Cc: lvivier, pbonzini, thuth, qemu-ppc, qemu-devel

On 09/17/2015 11:09 PM, David Gibson wrote:
> Currently the VFIOContainer iommu_data field contains a union with
> different information for different host iommu types.  However:
>     * It only actually contains information for the x86-like "Type1" iommu
>     * Because we have a common listener the Type1 fields are actually used
> on all IOMMU types, including the SPAPR TCE type as well
>     * There's no tag in the VFIOContainer to tell you which union member is
> valid anyway.
>
> In fact we now have a general structure for the listener which is unlikely
> to ever need per-iommu-type information, so this patch removes the union.
>
> In a similar way we can unify the setup of the vfio memory listener in
> vfio_connect_container() that is currently split across a switch on iommu
> type, but is effectively the same in both cases.
>
> The iommu_data.release pointer was only needed as a cleanup function
> which would handle potentially different data in the union.  With the
> union gone, it too can be removed.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>   hw/vfio/common.c              | 51 +++++++++++++++++--------------------------
>   include/hw/vfio/vfio-common.h | 14 +++---------
>   2 files changed, 23 insertions(+), 42 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 6d21311..e3152f6 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -316,7 +316,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>                                        MemoryRegionSection *section)
>   {
>       VFIOContainer *container = container_of(listener, VFIOContainer,
> -                                            iommu_data.type1.listener);
> +                                            iommu_data.listener);
>       hwaddr iova, end;
>       Int128 llend;
>       void *vaddr;
> @@ -406,9 +406,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
>            * can gracefully fail.  Runtime, there's not much we can do other
>            * than throw a hardware error.
>            */
> -        if (!container->iommu_data.type1.initialized) {
> -            if (!container->iommu_data.type1.error) {
> -                container->iommu_data.type1.error = ret;
> +        if (!container->iommu_data.initialized) {
> +            if (!container->iommu_data.error) {
> +                container->iommu_data.error = ret;
>               }
>           } else {
>               hw_error("vfio: DMA mapping failed, unable to continue");
> @@ -420,7 +420,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
>                                        MemoryRegionSection *section)
>   {
>       VFIOContainer *container = container_of(listener, VFIOContainer,
> -                                            iommu_data.type1.listener);
> +                                            iommu_data.listener);
>       hwaddr iova, end;
>       int ret;
>
> @@ -485,7 +485,7 @@ static const MemoryListener vfio_memory_listener = {
>
>   static void vfio_listener_release(VFIOContainer *container)
>   {
> -    memory_listener_unregister(&container->iommu_data.type1.listener);
> +    memory_listener_unregister(&container->iommu_data.listener);
>   }
>
>   int vfio_mmap_region(Object *obj, VFIORegion *region,
> @@ -683,21 +683,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>               ret = -errno;
>               goto free_container_exit;
>           }
> -
> -        container->iommu_data.type1.listener = vfio_memory_listener;
> -        container->iommu_data.release = vfio_listener_release;
> -
> -        memory_listener_register(&container->iommu_data.type1.listener,
> -                                 container->space->as);
> -
> -        if (container->iommu_data.type1.error) {
> -            ret = container->iommu_data.type1.error;
> -            error_report("vfio: memory listener initialization failed for container");
> -            goto listener_release_exit;
> -        }
> -
> -        container->iommu_data.type1.initialized = true;
> -
>       } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
>           ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
>           if (ret) {
> @@ -723,19 +708,25 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>               ret = -errno;
>               goto free_container_exit;
>           }
> -
> -        container->iommu_data.type1.listener = vfio_memory_listener;
> -        container->iommu_data.release = vfio_listener_release;
> -
> -        memory_listener_register(&container->iommu_data.type1.listener,
> -                                 container->space->as);
> -
>       } else {
>           error_report("vfio: No available IOMMU models");
>           ret = -EINVAL;
>           goto free_container_exit;
>       }
>
> +    container->iommu_data.listener = vfio_memory_listener;
> +
> +    memory_listener_register(&container->iommu_data.listener,
> +                             container->space->as);
> +
> +    if (container->iommu_data.error) {
> +        ret = container->iommu_data.error;
> +        error_report("vfio: memory listener initialization failed for container");
> +        goto listener_release_exit;
> +    }
> +
> +    container->iommu_data.initialized = true;
> +
>       QLIST_INIT(&container->group_list);
>       QLIST_INSERT_HEAD(&space->containers, container, next);
>
> @@ -774,9 +765,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>           VFIOAddressSpace *space = container->space;
>           VFIOGuestIOMMU *giommu, *tmp;
>
> -        if (container->iommu_data.release) {
> -            container->iommu_data.release(container);
> -        }
> +        vfio_listener_release(container);
>           QLIST_REMOVE(container, next);
>
>           QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 59a321d..aff18cd 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -64,21 +64,13 @@ typedef struct VFIOAddressSpace {
>
>   struct VFIOGroup;
>
> -typedef struct VFIOType1 {
> -    MemoryListener listener;
> -    int error;
> -    bool initialized;
> -} VFIOType1;
> -
>   typedef struct VFIOContainer {
>       VFIOAddressSpace *space;
>       int fd; /* /dev/vfio/vfio, empowered by the attached groups */
>       struct {
> -        /* enable abstraction to support various iommu backends */
> -        union {
> -            VFIOType1 type1;
> -        };
> -        void (*release)(struct VFIOContainer *);
> +        MemoryListener listener;
> +        int error;
> +        bool initialized;
>       } iommu_data;

The only difference from my patch - "[RFC PATCH qemu 2/4] vfio: Generalize 
IOMMU memory listener" - is that you keep the iommu_data struct, which was 
supposed to have a union for type1/spapr/etc IOMMU data, but that union is 
now gone. What is its purpose now?



>       QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>       QLIST_HEAD(, VFIOGroup) group_list;
>


-- 
Alexey

* Re: [Qemu-devel] [RFC PATCH 03/10] vfio: Check guest IOVA ranges against host IOMMU capabilities
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 03/10] vfio: Check guest IOVA ranges against host IOMMU capabilities David Gibson
@ 2015-09-18  6:38   ` Alexey Kardashevskiy
  2015-09-23 10:10   ` Thomas Huth
  2015-09-23 14:26   ` Laurent Vivier
  2 siblings, 0 replies; 54+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-18  6:38 UTC (permalink / raw)
  To: David Gibson, alex.williamson, gwshan
  Cc: lvivier, pbonzini, thuth, qemu-ppc, qemu-devel

On 09/17/2015 11:09 PM, David Gibson wrote:
> The current vfio core code assumes that the host IOMMU is capable of
> mapping any IOVA the guest wants to use to where we need.  However, real
> IOMMUs generally only support translating a certain range of IOVAs (the
> "DMA window") not a full 64-bit address space.
>
> The common x86 IOMMUs support a wide enough range that guests are very
> unlikely to go beyond it in practice; however, the IOMMU used on IBM Power
> machines - in the default configuration - supports only a much more limited
> IOVA range, usually 0..2GiB.
>
> If the guest attempts to set up an IOVA range that the host IOMMU can't
> map, qemu won't report an error until it actually attempts to map a bad
> IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e. no guest
> visible IOMMU) then this will show up very quickly.  If there is a guest
> visible IOMMU, however, the problem might not show up until much later when
> the guest actually attempts to DMA with an IOVA the host can't handle.
>
> This patch adds a test so that we will detect earlier if the guest is
> attempting to use IOVA ranges that the host IOMMU won't be able to deal
> with.
>
> For now, we assume that "Type1" (x86) IOMMUs can support any IOVA, this is
> incorrect, but no worse than what we have already.  We can't do better for
> now because the Type1 kernel interface doesn't tell us what IOVA range the
> IOMMU actually supports.
>
> For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
> IOVA range and validate guest IOVA ranges against it, and this patch does
> so.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>   hw/vfio/common.c              | 42 +++++++++++++++++++++++++++++++++++++++---
>   include/hw/vfio/vfio-common.h |  6 ++++++
>   2 files changed, 45 insertions(+), 3 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 9953b9c..c37f1a1 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -344,14 +344,23 @@ static void vfio_listener_region_add(MemoryListener *listener,
>       if (int128_ge(int128_make64(iova), llend)) {
>           return;
>       }
> +    end = int128_get64(llend);
> +
> +    if ((iova < container->iommu_data.min_iova)
> +        || ((end - 1) > container->iommu_data.max_iova)) {
> +        error_report("vfio: IOMMU container %p can't map guest IOVA region"
> +                     " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
> +                     container, iova, end - 1);
> +        ret = -EFAULT; /* FIXME: better choice here? */
> +        goto fail;
> +    }
>
>       memory_region_ref(section->mr);
>
>       if (memory_region_is_iommu(section->mr)) {
>           VFIOGuestIOMMU *giommu;
>
> -        trace_vfio_listener_region_add_iommu(iova,
> -                    int128_get64(int128_sub(llend, int128_one())));
> +        trace_vfio_listener_region_add_iommu(iova, end - 1);
>           /*
>            * FIXME: We should do some checking to see if the
>            * capabilities of the host VFIO IOMMU are adequate to model
> @@ -388,7 +397,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>
>       /* Here we assume that memory_region_is_ram(section->mr)==true */
>
> -    end = int128_get64(llend);
>       vaddr = memory_region_get_ram_ptr(section->mr) +
>               section->offset_within_region +
>               (iova - section->offset_within_address_space);
> @@ -687,7 +695,19 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>               ret = -errno;
>               goto free_container_exit;
>           }
> +
> +        /*
> +         * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
> +         * IOVA whatsoever.  That's not actually true, but the current
> +         * kernel interface doesn't tell us what it can map, and the
> +         * existing Type1 IOMMUs generally support any IOVA we're
> +         * going to actually try in practice.
> +         */
> +        container->iommu_data.min_iova = 0;
> +        container->iommu_data.max_iova = (hwaddr)-1;


vfio_listener_skipped_section() checks for 
(section->offset_within_address_space & (1ULL << 63)); that check could go 
away if .max_iova here were not -1 but 0x7fff.ffff.ffff.ffff.



>       } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
> +        struct vfio_iommu_spapr_tce_info info;
> +
>           ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
>           if (ret) {
>               error_report("vfio: failed to set group container: %m");
> @@ -712,6 +732,22 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>               ret = -errno;
>               goto free_container_exit;
>           }
> +
> +        /*
> +         * FIXME: This only considers the host IOMMU's 32-bit window.
> +         * At some point we need to add support for the optional
> +         * 64-bit window and dynamic windows
> +         */
> +        info.argsz = sizeof(info);
> +        ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> +        if (ret) {
> +            error_report("vfio: VFIO_IOMMU_SPAPR_TCE_GET_INFO failed: %m");
> +            ret = -errno;
> +            goto free_container_exit;
> +        }
> +        container->iommu_data.min_iova = info.dma32_window_start;
> +        container->iommu_data.max_iova = container->iommu_data.min_iova
> +            + info.dma32_window_size - 1;
>       } else {
>           error_report("vfio: No available IOMMU models");
>           ret = -EINVAL;
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index aff18cd..88ec213 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -71,6 +71,12 @@ typedef struct VFIOContainer {
>           MemoryListener listener;
>           int error;
>           bool initialized;
> +        /*
> +         * FIXME: This assumes the host IOMMU can support only a
> +         * single contiguous IOVA window.  We may need to generalize
> +         * that in future
> +         */
> +        hwaddr min_iova, max_iova;
>       } iommu_data;
>       QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>       QLIST_HEAD(, VFIOGroup) group_list;
>


-- 
Alexey

* Re: [Qemu-devel] [RFC PATCH 02/10] vfio: Generalize vfio_listener_region_add failure path
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 02/10] vfio: Generalize vfio_listener_region_add failure path David Gibson
@ 2015-09-23  9:13   ` Thomas Huth
  2015-09-23 13:31   ` Laurent Vivier
  1 sibling, 0 replies; 54+ messages in thread
From: Thomas Huth @ 2015-09-23  9:13 UTC (permalink / raw)
  To: David Gibson, alex.williamson, aik, gwshan
  Cc: lvivier, pbonzini, qemu-ppc, qemu-devel

On 17/09/15 15:09, David Gibson wrote:
> If a DMA mapping operation fails in vfio_listener_region_add() it
> checks to see if we've already completed initial setup of the
> container.  If so, it reports an error so the setup code can fail
> gracefully; otherwise it throws a hw_error().
> 
> There are other potential failure cases in vfio_listener_region_add()
> which could benefit from the same logic, so move it to its own
> fail: block.  Later patches can use this to extend other failure cases
> to fail as gracefully as possible under the circumstances.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/vfio/common.c | 26 +++++++++++++++-----------
>  1 file changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index e3152f6..9953b9c 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -400,19 +400,23 @@ static void vfio_listener_region_add(MemoryListener *listener,
>          error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
>                       "0x%"HWADDR_PRIx", %p) = %d (%m)",
>                       container, iova, end - iova, vaddr, ret);
> +        goto fail;
> +    }
>  
> -        /*
> -         * On the initfn path, store the first error in the container so we
> -         * can gracefully fail.  Runtime, there's not much we can do other
> -         * than throw a hardware error.
> -         */
> -        if (!container->iommu_data.initialized) {
> -            if (!container->iommu_data.error) {
> -                container->iommu_data.error = ret;
> -            }
> -        } else {
> -            hw_error("vfio: DMA mapping failed, unable to continue");
> +    return;
> +
> +fail:
> +    /*
> +     * On the initfn path, store the first error in the container so we
> +     * can gracefully fail.  Runtime, there's not much we can do other
> +     * than throw a hardware error.
> +     */
> +    if (!container->iommu_data.initialized) {
> +        if (!container->iommu_data.error) {
> +            container->iommu_data.error = ret;
>          }
> +    } else {
> +        hw_error("vfio: DMA mapping failed, unable to continue");
>      }
>  }

Reviewed-by: Thomas Huth <thuth@redhat.com>

* Re: [Qemu-devel] [RFC PATCH 03/10] vfio: Check guest IOVA ranges against host IOMMU capabilities
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 03/10] vfio: Check guest IOVA ranges against host IOMMU capabilities David Gibson
  2015-09-18  6:38   ` Alexey Kardashevskiy
@ 2015-09-23 10:10   ` Thomas Huth
  2015-09-23 11:07     ` David Gibson
  2015-09-23 14:26   ` Laurent Vivier
  2 siblings, 1 reply; 54+ messages in thread
From: Thomas Huth @ 2015-09-23 10:10 UTC (permalink / raw)
  To: David Gibson, alex.williamson, aik, gwshan
  Cc: lvivier, pbonzini, qemu-ppc, qemu-devel

On 17/09/15 15:09, David Gibson wrote:
> The current vfio core code assumes that the host IOMMU is capable of
> mapping any IOVA the guest wants to use to where we need.  However, real
> IOMMUs generally only support translating a certain range of IOVAs (the
> "DMA window") not a full 64-bit address space.
> 
> The common x86 IOMMUs support a wide enough range that guests are very
> unlikely to go beyond it in practice; however, the IOMMU used on IBM Power
> machines - in the default configuration - supports only a much more limited
> IOVA range, usually 0..2GiB.
> 
> If the guest attempts to set up an IOVA range that the host IOMMU can't
> map, qemu won't report an error until it actually attempts to map a bad
> IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e. no guest
> visible IOMMU) then this will show up very quickly.  If there is a guest
> visible IOMMU, however, the problem might not show up until much later when
> the guest actually attempts to DMA with an IOVA the host can't handle.
> 
> This patch adds a test so that we will detect earlier if the guest is
> attempting to use IOVA ranges that the host IOMMU won't be able to deal
> with.
> 
> For now, we assume that "Type1" (x86) IOMMUs can support any IOVA, this is
> incorrect, but no worse than what we have already.  We can't do better for
> now because the Type1 kernel interface doesn't tell us what IOVA range the
> IOMMU actually supports.
> 
> For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
> IOVA range and validate guest IOVA ranges against it, and this patch does
> so.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/vfio/common.c              | 42 +++++++++++++++++++++++++++++++++++++++---
>  include/hw/vfio/vfio-common.h |  6 ++++++
>  2 files changed, 45 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 9953b9c..c37f1a1 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -344,14 +344,23 @@ static void vfio_listener_region_add(MemoryListener *listener,
>      if (int128_ge(int128_make64(iova), llend)) {
>          return;
>      }
> +    end = int128_get64(llend);
> +
> +    if ((iova < container->iommu_data.min_iova)
> +        || ((end - 1) > container->iommu_data.max_iova)) {

(Too many parentheses for my taste ;-))

> +        error_report("vfio: IOMMU container %p can't map guest IOVA region"
> +                     " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
> +                     container, iova, end - 1);
> +        ret = -EFAULT; /* FIXME: better choice here? */

Maybe -EINVAL? ... but -EFAULT also sounds ok to me.

> +        goto fail;
> +    }
...
> @@ -712,6 +732,22 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>              ret = -errno;
>              goto free_container_exit;
>          }
> +
> +        /*
> +         * FIXME: This only considers the host IOMMU's 32-bit window.
> +         * At some point we need to add support for the optional
> +         * 64-bit window and dynamic windows
> +         */
> +        info.argsz = sizeof(info);
> +        ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> +        if (ret) {
> +            error_report("vfio: VFIO_IOMMU_SPAPR_TCE_GET_INFO failed: %m");

Isn't that %m a glibc extension only? ... Well, this code likely only
runs on Linux with a glibc, so it likely doesn't matter, I guess...

> +            ret = -errno;
> +            goto free_container_exit;
> +        }
> +        container->iommu_data.min_iova = info.dma32_window_start;
> +        container->iommu_data.max_iova = container->iommu_data.min_iova
> +            + info.dma32_window_size - 1;
>      } else {
>          error_report("vfio: No available IOMMU models");
>          ret = -EINVAL;
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index aff18cd..88ec213 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -71,6 +71,12 @@ typedef struct VFIOContainer {
>          MemoryListener listener;
>          int error;
>          bool initialized;
> +        /*
> +         * FIXME: This assumes the host IOMMU can support only a
> +         * single contiguous IOVA window.  We may need to generalize
> +         * that in future
> +         */
> +        hwaddr min_iova, max_iova;

Should that maybe be dma_addr_t instead of hwaddr ?

>      } iommu_data;
>      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>      QLIST_HEAD(, VFIOGroup) group_list;
> 

 Thomas

* Re: [Qemu-devel] [RFC PATCH 04/10] vfio: Record host IOMMU's available IO page sizes
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 04/10] vfio: Record host IOMMU's available IO page sizes David Gibson
@ 2015-09-23 10:29   ` Thomas Huth
  2015-09-23 14:30   ` Laurent Vivier
  1 sibling, 0 replies; 54+ messages in thread
From: Thomas Huth @ 2015-09-23 10:29 UTC (permalink / raw)
  To: David Gibson, alex.williamson, aik, gwshan
  Cc: lvivier, pbonzini, qemu-ppc, qemu-devel

On 17/09/15 15:09, David Gibson wrote:
> Depending on the host IOMMU type we determine and record the available page
> sizes for IOMMU translation.  We'll need this for other validation in
> future patches.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/vfio/common.c              | 13 +++++++++++++
>  include/hw/vfio/vfio-common.h |  1 +
>  2 files changed, 14 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index c37f1a1..daaac48 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -680,6 +680,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>      if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
>          ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
>          bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU);
> +        struct vfio_iommu_type1_info info;
>  
>          ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
>          if (ret) {
> @@ -705,6 +706,15 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>           */
>          container->iommu_data.min_iova = 0;
>          container->iommu_data.max_iova = (hwaddr)-1;
> +
> +        /* Assume just 4K IOVA page size */
> +        container->iommu_data.iova_pgsizes = 0x1000;
> +        info.argsz = sizeof(info);
> +        ret = ioctl(fd, VFIO_IOMMU_GET_INFO, &info);
> +        /* Ignore errors */
> +        if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
> +            container->iommu_data.iova_pgsizes = info.iova_pgsizes;
> +        }
>      } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
>          struct vfio_iommu_spapr_tce_info info;
>  
> @@ -748,6 +758,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>          container->iommu_data.min_iova = info.dma32_window_start;
>          container->iommu_data.max_iova = container->iommu_data.min_iova
>              + info.dma32_window_size - 1;
> +
> +        /* Assume just 4K IOVA pages for now */
> +        container->iommu_data.iova_pgsizes = 0x1000;
>      } else {
>          error_report("vfio: No available IOMMU models");
>          ret = -EINVAL;
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 88ec213..c09db6d 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -77,6 +77,7 @@ typedef struct VFIOContainer {
>           * that in future
>           */
>          hwaddr min_iova, max_iova;
> +        uint64_t iova_pgsizes;
>      } iommu_data;
>      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>      QLIST_HEAD(, VFIOGroup) group_list;

Reviewed-by: Thomas Huth <thuth@redhat.com>

* Re: [Qemu-devel] [RFC PATCH 01/10] vfio: Remove unneeded union from VFIOContainer
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 01/10] vfio: Remove unneeded union from VFIOContainer David Gibson
  2015-09-18  6:15   ` Alexey Kardashevskiy
@ 2015-09-23 10:31   ` Thomas Huth
  2015-09-23 23:14     ` David Gibson
  2015-09-23 13:18   ` Laurent Vivier
  2 siblings, 1 reply; 54+ messages in thread
From: Thomas Huth @ 2015-09-23 10:31 UTC (permalink / raw)
  To: David Gibson, alex.williamson, aik, gwshan
  Cc: lvivier, pbonzini, qemu-ppc, qemu-devel

On 17/09/15 15:09, David Gibson wrote:
> Currently the VFIOContainer iommu_data field contains a union with
> different information for different host iommu types.  However:
>    * It only actually contains information for the x86-like "Type1" iommu
>    * Because we have a common listener the Type1 fields are actually used
> on all IOMMU types, including the SPAPR TCE type as well
>    * There's no tag in the VFIOContainer to tell you which union member is
> valid anyway.
> 
> In fact we now have a general structure for the listener which is unlikely
> to ever need per-iommu-type information, so this patch removes the union.
> 
> In a similar way we can unify the setup of the vfio memory listener in
> vfio_connect_container() that is currently split across a switch on iommu
> type, but is effectively the same in both cases.
> 
> The iommu_data.release pointer was only needed as a cleanup function
> which would handle potentially different data in the union.  With the
> union gone, it too can be removed.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/vfio/common.c              | 51 +++++++++++++++++--------------------------
>  include/hw/vfio/vfio-common.h | 14 +++---------
>  2 files changed, 23 insertions(+), 42 deletions(-)
...
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 59a321d..aff18cd 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -64,21 +64,13 @@ typedef struct VFIOAddressSpace {
>  
>  struct VFIOGroup;
>  
> -typedef struct VFIOType1 {
> -    MemoryListener listener;
> -    int error;
> -    bool initialized;
> -} VFIOType1;
> -
>  typedef struct VFIOContainer {
>      VFIOAddressSpace *space;
>      int fd; /* /dev/vfio/vfio, empowered by the attached groups */
>      struct {
> -        /* enable abstraction to support various iommu backends */
> -        union {
> -            VFIOType1 type1;
> -        };
> -        void (*release)(struct VFIOContainer *);
> +        MemoryListener listener;
> +        int error;
> +        bool initialized;
>      } iommu_data;
>      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>      QLIST_HEAD(, VFIOGroup) group_list;
> 

I think I agree with Alexey here ... keeping the iommu_data struct
around those fields looks cumbersome. Is there a reason you did not
remove the struct completely?

 Thomas


* Re: [Qemu-devel] [RFC PATCH 05/10] memory: Allow replay of IOMMU mapping notifications
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 05/10] memory: Allow replay of IOMMU mapping notifications David Gibson
@ 2015-09-23 10:40   ` Thomas Huth
  2015-09-23 16:35     ` Laurent Vivier
  2015-09-23 23:47     ` David Gibson
  2015-09-23 17:04   ` Laurent Vivier
  1 sibling, 2 replies; 54+ messages in thread
From: Thomas Huth @ 2015-09-23 10:40 UTC (permalink / raw)
  To: David Gibson, alex.williamson, aik, gwshan
  Cc: lvivier, pbonzini, qemu-ppc, qemu-devel

On 17/09/15 15:09, David Gibson wrote:
> When we have guest visible IOMMUs, we allow notifiers to be registered
> which will be informed of all changes to IOMMU mappings.  This is used by
> vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.
> 
> However, unlike with a memory region listener, an iommu notifier won't be
> told about any mappings which already exist in the (guest) IOMMU at the
> time it is registered.  This can cause problems if hotplugging a VFIO
> device onto a guest bus which had existing guest IOMMU mappings, but didn't
> previously have any VFIO devices (and hence no host IOMMU mappings).
> 
> This adds a memory_region_register_iommu_notifier_replay() function to
> handle this case.  As well as registering the new notifier it replays
> existing mappings.  Because the IOMMU memory region doesn't internally
> remember the granularity of the guest IOMMU it has a small hack where the
> caller must specify a granularity at which to replay mappings.
> 
> If there are finer mappings in the guest IOMMU these will be reported in
> the iotlb structures passed to the notifier which it must handle (probably
> causing it to flag an error).  This isn't new - the VFIO iommu notifier
> must already handle notifications about guest IOMMU mappings too short
> for it to represent in the host IOMMU.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  include/exec/memory.h | 16 ++++++++++++++++
>  memory.c              | 18 ++++++++++++++++++
>  2 files changed, 34 insertions(+)

> diff --git a/memory.c b/memory.c
> index 0d8b2d9..6b5a2f1 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1403,6 +1403,24 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
>      notifier_list_add(&mr->iommu_notify, n);
>  }
>  
> +void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
> +                                                  hwaddr granularity, bool is_write)

granularity itself is not an address, but a size, isn't it? So using
"hwaddr" sounds wrong here.

> +{
> +    hwaddr addr;

dma_addr_t ?

> +    IOMMUTLBEntry iotlb;
> +
> +    memory_region_register_iommu_notifier(mr, n);
> +
> +    for (addr = 0;
> +         int128_lt(int128_make64(addr), mr->size);
> +         addr += granularity) {
> +
> +        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +        if (iotlb.perm != IOMMU_NONE)
> +            n->notify(n, &iotlb);

Missing curly braces.

> +    }
> +}
> +

 Thomas

* Re: [Qemu-devel] [RFC PATCH 06/10] vfio: Allow hotplug of containers onto existing guest IOMMU mappings
  2015-09-17 23:31     ` David Gibson
@ 2015-09-23 11:02       ` Thomas Huth
  2015-09-23 23:50         ` David Gibson
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Huth @ 2015-09-23 11:02 UTC (permalink / raw)
  To: David Gibson, Alex Williamson
  Cc: lvivier, aik, gwshan, qemu-devel, qemu-ppc, pbonzini

On 18/09/15 01:31, David Gibson wrote:
> On Thu, Sep 17, 2015 at 10:54:24AM -0600, Alex Williamson wrote:
>> On Thu, 2015-09-17 at 23:09 +1000, David Gibson wrote:
>>> At present the memory listener used by vfio to keep host IOMMU mappings
>>> in sync with the guest memory image assumes that if a guest IOMMU
>>> appears, then it has no existing mappings.
>>>
>>> This may not be true if a VFIO device is hotplugged onto a guest bus
>>> which didn't previously include a VFIO device, and which has existing
>>> guest IOMMU mappings.
>>>
>>> Therefore, use the memory_region_register_iommu_notifier_replay()
>>> function in order to fix this case, replaying existing guest IOMMU
>>> mappings, bringing the host IOMMU into sync with the guest IOMMU.
>>>
>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>> ---
>>>  hw/vfio/common.c | 34 +++++++++++++++++++---------------
>>>  1 file changed, 19 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index daaac48..543c38e 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -312,6 +312,22 @@ out:
>>>      rcu_read_unlock();
>>>  }
>>>  
>>> +static hwaddr vfio_container_granularity(VFIOContainer *container)
>>> +{
>>> +    uint64_t pgsize;
>>> +
>>> +    assert(container->iommu_data.iova_pgsizes);
>>
>> return (hwaddr)1 << (ffsl(container->iommu_data.iova_pgsizes) - 1);
> 
> Ah, yes, that should work.  I didn't do it that way mostly because I
> tend to confuse myself when I try to remember exactly how ffs
> semantics work.

Maybe use ffsll instead of ffsl, in case the code ever runs on a 32-bit
host instead of a 64-bit host ?

 Thomas





* Re: [Qemu-devel] [RFC PATCH 03/10] vfio: Check guest IOVA ranges against host IOMMU capabilities
  2015-09-23 10:10   ` Thomas Huth
@ 2015-09-23 11:07     ` David Gibson
  2015-09-23 23:43       ` David Gibson
  0 siblings, 1 reply; 54+ messages in thread
From: David Gibson @ 2015-09-23 11:07 UTC (permalink / raw)
  To: Thomas Huth
  Cc: lvivier, aik, gwshan, qemu-devel, alex.williamson, qemu-ppc, pbonzini


On Wed, Sep 23, 2015 at 12:10:46PM +0200, Thomas Huth wrote:
> On 17/09/15 15:09, David Gibson wrote:
> > The current vfio core code assumes that the host IOMMU is capable of
> > mapping any IOVA the guest wants to use to where we need.  However, real
> > IOMMUs generally only support translating a certain range of IOVAs (the
> > "DMA window") not a full 64-bit address space.
> > 
> > The common x86 IOMMUs support a wide enough range that guests are very
> > unlikely to go beyond it in practice, however the IOMMU used on IBM Power
> > machines - in the default configuration - supports only a much more limited
> > IOVA range, usually 0..2GiB.
> > 
> > If the guest attempts to set up an IOVA range that the host IOMMU can't
> > map, qemu won't report an error until it actually attempts to map a bad
> > IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e. no guest
> > visible IOMMU) then this will show up very quickly.  If there is a guest
> > visible IOMMU, however, the problem might not show up until much later when
> > the guest actually attempts to DMA with an IOVA the host can't handle.
> > 
> > This patch adds a test so that we will detect earlier if the guest is
> > attempting to use IOVA ranges that the host IOMMU won't be able to deal
> > with.
> > 
> > For now, we assume that "Type1" (x86) IOMMUs can support any IOVA, this is
> > incorrect, but no worse than what we have already.  We can't do better for
> > now because the Type1 kernel interface doesn't tell us what IOVA range the
> > IOMMU actually supports.
> > 
> > For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
> > IOVA range and validate guest IOVA ranges against it, and this patch does
> > so.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  hw/vfio/common.c              | 42 +++++++++++++++++++++++++++++++++++++++---
> >  include/hw/vfio/vfio-common.h |  6 ++++++
> >  2 files changed, 45 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index 9953b9c..c37f1a1 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -344,14 +344,23 @@ static void vfio_listener_region_add(MemoryListener *listener,
> >      if (int128_ge(int128_make64(iova), llend)) {
> >          return;
> >      }
> > +    end = int128_get64(llend);
> > +
> > +    if ((iova < container->iommu_data.min_iova)
> > +        || ((end - 1) > container->iommu_data.max_iova)) {
> 
> (Too many parentheses for my taste ;-))

Yes, well, we've already established our tastes differ on that point.

> > +        error_report("vfio: IOMMU container %p can't map guest IOVA region"
> > +                     " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
> > +                     container, iova, end - 1);
> > +        ret = -EFAULT; /* FIXME: better choice here? */
> 
> Maybe -EINVAL? ... but -EFAULT also sounds ok for me.

I try to avoid EINVAL unless it's clearly the only right choice.  So
many things use it that it tends to be very unhelpful when you get one.

> > +        goto fail;
> > +    }
> ...
> > @@ -712,6 +732,22 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
> >              ret = -errno;
> >              goto free_container_exit;
> >          }
> > +
> > +        /*
> > +         * FIXME: This only considers the host IOMMU's 32-bit window.
> > +         * At some point we need to add support for the optional
> > +         * 64-bit window and dynamic windows
> > +         */
> > +        info.argsz = sizeof(info);
> > +        ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> > +        if (ret) {
> > +            error_report("vfio: VFIO_IOMMU_SPAPR_TCE_GET_INFO failed: %m");
> 
> Isn't that %m a glibc extension only? ... Well, this code likely only
> runs on Linux with a glibc, so it likely doesn't matter, I guess...

Yes, it is, but it's already used extensively within qemu.

> > +            ret = -errno;
> > +            goto free_container_exit;
> > +        }
> > +        container->iommu_data.min_iova = info.dma32_window_start;
> > +        container->iommu_data.max_iova = container->iommu_data.min_iova
> > +            + info.dma32_window_size - 1;
> >      } else {
> >          error_report("vfio: No available IOMMU models");
> >          ret = -EINVAL;
> > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> > index aff18cd..88ec213 100644
> > --- a/include/hw/vfio/vfio-common.h
> > +++ b/include/hw/vfio/vfio-common.h
> > @@ -71,6 +71,12 @@ typedef struct VFIOContainer {
> >          MemoryListener listener;
> >          int error;
> >          bool initialized;
> > +        /*
> > +         * FIXME: This assumes the host IOMMU can support only a
> > +         * single contiguous IOVA window.  We may need to generalize
> > +         * that in future
> > +         */
> > +        hwaddr min_iova, max_iova;
> 
> Should that maybe be dma_addr_t instead of hwaddr ?

Ah, yes it probably should.

> >      } iommu_data;
> >      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
> >      QLIST_HEAD(, VFIOGroup) group_list;
> > 
> 
>  Thomas
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [RFC PATCH 07/10] spapr_pci: Allow PCI host bridge DMA window to be configured
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 07/10] spapr_pci: Allow PCI host bridge DMA window to be configured David Gibson
@ 2015-09-23 11:08   ` Thomas Huth
  2015-09-23 23:56     ` David Gibson
  2015-09-23 18:55   ` Laurent Vivier
  1 sibling, 1 reply; 54+ messages in thread
From: Thomas Huth @ 2015-09-23 11:08 UTC (permalink / raw)
  To: David Gibson, alex.williamson, aik, gwshan
  Cc: lvivier, pbonzini, qemu-ppc, qemu-devel

On 17/09/15 15:09, David Gibson wrote:
> At present the PCI host bridge (PHB) for the pseries machine type has a
> fixed DMA window from 0..1GB (in PCI address space) which is mapped to real
> memory via the PAPR paravirtualized IOMMU.
> 
> For better support of VFIO devices, we're going to want to allow for
> different configurations of the DMA window.
> 
> Eventually we'll want to allow the guest itself to reconfigure the window
> via the PAPR dynamic DMA window interface, but as a preliminary this patch
> allows the user to reconfigure the window with new properties on the PHB
> device.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/ppc/spapr_pci.c          | 7 +++++--
>  include/hw/pci-host/spapr.h | 3 +--
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index b088491..622c4ac 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1394,7 +1394,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
>      sPAPRTCETable *tcet;
>      uint32_t nb_table;
>  
> -    nb_table = SPAPR_PCI_DMA32_SIZE >> SPAPR_TCE_PAGE_SHIFT;
> +    nb_table = sphb->dma_win_size >> SPAPR_TCE_PAGE_SHIFT;
>      tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
>                                 0, SPAPR_TCE_PAGE_SHIFT, nb_table, false);
>      if (!tcet) {
> @@ -1404,7 +1404,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
>      }
>  
>      /* Register default 32bit DMA window */
> -    memory_region_add_subregion(&sphb->iommu_root, 0,
> +    memory_region_add_subregion(&sphb->iommu_root, sphb->dma_win_addr,
>                                  spapr_tce_get_iommu(tcet));
>  }
>  
> @@ -1437,6 +1437,9 @@ static Property spapr_phb_properties[] = {
>                         SPAPR_PCI_IO_WIN_SIZE),
>      DEFINE_PROP_BOOL("dynamic-reconfiguration", sPAPRPHBState, dr_enabled,
>                       true),
> +    /* Default DMA window is 0..1GB */
> +    DEFINE_PROP_UINT64("dma_win_addr", sPAPRPHBState, dma_win_addr, 0),
> +    DEFINE_PROP_UINT64("dma_win_size", sPAPRPHBState, dma_win_size, 0x40000000),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index 5322b56..7de5e02 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -78,6 +78,7 @@ struct sPAPRPHBState {
>      MemoryRegion memwindow, iowindow, msiwindow;
>  
>      uint32_t dma_liobn;
> +    hwaddr dma_win_addr, dma_win_size;

Maybe use dma_addr_t for dma_win_addr? And dma_win_size isn't an
address, so I'd maybe use uint64_t here instead.

>      AddressSpace iommu_as;
>      MemoryRegion iommu_root;
>  
> @@ -115,8 +116,6 @@ struct sPAPRPHBVFIOState {
>  
>  #define SPAPR_PCI_MSI_WINDOW         0x40000000000ULL
>  
> -#define SPAPR_PCI_DMA32_SIZE         0x40000000
> -
>  static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin)
>  {
>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());

Anyway, patch looks fine to me, so:

Reviewed-by: Thomas Huth <thuth@redhat.com>
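[Editor's note: for illustration, with this patch applied the window could be reconfigured from the command line roughly as below. This is a hypothetical invocation — the property names come from the hunk above, but the exact machine/device syntax is assumed, not taken from the patch:]

```sh
# Create a second PHB whose 32-bit DMA window covers 0..2GiB instead of
# the default 0..1GiB (dma_win_addr/dma_win_size are the new properties).
qemu-system-ppc64 -machine pseries \
    -device spapr-pci-host-bridge,index=1,dma_win_addr=0,dma_win_size=0x80000000
```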

* Re: [Qemu-devel] [RFC PATCH 09/10] spapr_iommu: Provide a function to switch a TCE table to allowing VFIO
  2015-09-17 16:54   ` Alex Williamson
@ 2015-09-23 11:24     ` Thomas Huth
  2015-09-24  0:35       ` David Gibson
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Huth @ 2015-09-23 11:24 UTC (permalink / raw)
  To: Alex Williamson, David Gibson
  Cc: lvivier, aik, gwshan, qemu-devel, qemu-ppc, pbonzini

On 17/09/15 18:54, Alex Williamson wrote:
> On Thu, 2015-09-17 at 23:09 +1000, David Gibson wrote:
>> Because of the way non-VFIO guest IOMMU operations are KVM accelerated, not
>> all TCE tables (guest IOMMU contexts) can support VFIO devices.  Currently,
>> this is decided at creation time.
>>
>> To support hotplug of VFIO devices, we need to allow a TCE table which
>> previously didn't allow VFIO devices to be switched so that it can.  This
>> patch adds an spapr_tce_need_vfio() function to do this, by reallocating
>> the table in userspace if necessary.
>>
>> Currently this doesn't allow the KVM acceleration to be re-enabled if all
>> the VFIO devices are removed.  That's an optimization for another time.
> 
> 
> Same comment as previous patch, spapr_tce_need_userspace_table() (or
> something) makes the code much more self documenting.

May I also add that using the word "need" in a function name could be
a little bit confusing here? Maybe it's just me, but if I read
..._need_...() I first think of a function that just checks something
and returns a boolean for yes or no, not of a function that changes
something.
Could you maybe use something like "..._change_to_..." instead?

 Thomas

* Re: [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge
  2015-09-17 13:09 [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge David Gibson
                   ` (10 preceding siblings ...)
  2015-09-17 16:54 ` [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge Alex Williamson
@ 2015-09-23 11:26 ` Thomas Huth
  2015-09-23 16:46 ` Laurent Vivier
  12 siblings, 0 replies; 54+ messages in thread
From: Thomas Huth @ 2015-09-23 11:26 UTC (permalink / raw)
  To: David Gibson, alex.williamson, aik, gwshan
  Cc: lvivier, pbonzini, qemu-ppc, qemu-devel

On 17/09/15 15:09, David Gibson wrote:
> Currently the pseries machine type uses two types of PCI Host Bridge
> (PHB) devices: "spapr-pci-host-bridge" the 'normal' variant intended
> for emulated PCI devices, and "spapr-pci-vfio-host-bridge" intended
> for VFIO devices.
> 
> When using VFIO with pseries, a separate spapr-pci-vfio-host-bridge
> device is needed for every host IOMMU group from which you're using
> VFIO devices.  This is quite awkward for the user and/or management
> tools.  It's especially awkward since the current code makes
> essentially no attempt to detect and warn the user if the wrong sorts
> of devices are connected to the wrong PHB.
> 
> It turns out that the VFIO core code is actually general enough that
> VFIO devices almost work on the normal spapr-pci-host-bridge device.
> In fact with the right combination of circumstances they *can* work
> right now.
> 
> spapr-pci-vfio-host-bridge does 3 additional things:
> 
>     1) It disables KVM acceleration of the guest IOMMU.  That
>        acceleration breaks VFIO because it means guest IOMMU updates
>        bypass the VFIO infrastructure which keeps the host IOMMU in
>        sync.
> 
>     2) It automatically configures the guest PHB's DMA window to match
>        the capabilities of the host IOMMU, and advertises that to the
>        guest.
> 
>     3) It provides additional handling of EEH (Enhanced Error
>        Handling) functions.
> 
> This patch series:
>     * Allows VFIO devices to be used on the spapr-pci-host-bridge by
>       auto-switching the KVM TCE acceleration
> 
>     * Adds verification that the host IOMMU can handle the DMA windows
>       used by guest PHBs
> 
>     * Allows the DMA window on the guest PHB to be configured with
>       device properties.  This can be used to make sure it matches a
>       host window, but in practice the default setting will already
>       work with the host IOMMU on all current systems.
> 
>     * Adds support to the VFIO core to allow a VFIO device to be
>       hotplugged onto a bus which doesn't yet have VFIO devices.  This
>       already worked for systems without a guest-visible IOMMU
>       (i.e. x86), this series makes it work even with a guest visible
>       IOMMU.
> 
>     * Makes a few related cleanups along the way
> 
> This series does NOT allow EEH operations on VFIO devices on the
> spapr-pci-host-bridge device, so the spapr-pci-vfio-host-bridge device
> is left in for now.  It turns out there are some serious existing
> problems in both the qemu EEH implementation and (worse) in the
> EEH/VFIO kernel interface.  Fixing those is a problem for another day.
> Maybe tomorrow.
> 
> 
> I've tested basic assignment of an xHCI to a pseries guest, both at
> startup and with hotplug.  I haven't (yet) tested VFIO on x86 with
> this series.
> 
> This series probably needs to be merged via several different trees.
> I'm intending to split up as necessary once it's had some review.
> 
> David Gibson (10):
>   vfio: Remove unneeded union from VFIOContainer
>   vfio: Generalize vfio_listener_region_add failure path
>   vfio: Check guest IOVA ranges against host IOMMU capabilities
>   vfio: Record host IOMMU's available IO page sizes
>   memory: Allow replay of IOMMU mapping notifications
>   vfio: Allow hotplug of containers onto existing guest IOMMU mappings
>   spapr_pci: Allow PCI host bridge DMA window to be configured
>   spapr_iommu: Rename vfio_accel parameter
>   spapr_iommu: Provide a function to switch a TCE table to allowing VFIO
>   spapr_pci: Allow VFIO devices to work on the normal PCI host bridge
> 
>  hw/ppc/spapr_iommu.c          |  25 ++++++-
>  hw/ppc/spapr_pci.c            |  13 +++-
>  hw/vfio/common.c              | 152 +++++++++++++++++++++++++++---------------
>  include/exec/memory.h         |  16 +++++
>  include/hw/pci-host/spapr.h   |   3 +-
>  include/hw/ppc/spapr.h        |   6 +-
>  include/hw/vfio/vfio-common.h |  21 +++---
>  memory.c                      |  18 +++++
>  target-ppc/kvm.c              |   4 +-
>  target-ppc/kvm_ppc.h          |   2 +-
>  10 files changed, 184 insertions(+), 76 deletions(-)

Apart from some (mostly cosmetic) nits, the patch series looks fine to me.

 Thomas

* Re: [Qemu-devel] [RFC PATCH 01/10] vfio: Remove unneeded union from VFIOContainer
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 01/10] vfio: Remove unneeded union from VFIOContainer David Gibson
  2015-09-18  6:15   ` Alexey Kardashevskiy
  2015-09-23 10:31   ` Thomas Huth
@ 2015-09-23 13:18   ` Laurent Vivier
  2 siblings, 0 replies; 54+ messages in thread
From: Laurent Vivier @ 2015-09-23 13:18 UTC (permalink / raw)
  To: David Gibson, alex.williamson, aik, gwshan
  Cc: pbonzini, thuth, qemu-ppc, qemu-devel



On 17/09/2015 15:09, David Gibson wrote:
> Currently the VFIOContainer iommu_data field contains a union with
> different information for different host iommu types.  However:
>    * It only actually contains information for the x86-like "Type1" iommu
>    * Because we have a common listener the Type1 fields are actually used
> on all IOMMU types, including the SPAPR TCE type as well
>    * There's no tag in the VFIOContainer to tell you which union member is
> valid anyway.
> 
> In fact we now have a general structure for the listener which is unlikely
> to ever need per-iommu-type information, so this patch removes the union.
> 
> In a similar way we can unify the setup of the vfio memory listener in
> vfio_connect_container() that is currently split across a switch on iommu
> type, but is effectively the same in both cases.
> 
> The iommu_data.release pointer was only needed as a cleanup function
> which would handle potentially different data in the union.  With the
> union gone, it too can be removed.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/vfio/common.c              | 51 +++++++++++++++++--------------------------
>  include/hw/vfio/vfio-common.h | 14 +++---------
>  2 files changed, 23 insertions(+), 42 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 6d21311..e3152f6 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -316,7 +316,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>                                       MemoryRegionSection *section)
>  {
>      VFIOContainer *container = container_of(listener, VFIOContainer,
> -                                            iommu_data.type1.listener);
> +                                            iommu_data.listener);
>      hwaddr iova, end;
>      Int128 llend;
>      void *vaddr;
> @@ -406,9 +406,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
>           * can gracefully fail.  Runtime, there's not much we can do other
>           * than throw a hardware error.
>           */
> -        if (!container->iommu_data.type1.initialized) {
> -            if (!container->iommu_data.type1.error) {
> -                container->iommu_data.type1.error = ret;
> +        if (!container->iommu_data.initialized) {
> +            if (!container->iommu_data.error) {
> +                container->iommu_data.error = ret;
>              }
>          } else {
>              hw_error("vfio: DMA mapping failed, unable to continue");
> @@ -420,7 +420,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
>                                       MemoryRegionSection *section)
>  {
>      VFIOContainer *container = container_of(listener, VFIOContainer,
> -                                            iommu_data.type1.listener);
> +                                            iommu_data.listener);
>      hwaddr iova, end;
>      int ret;
>  
> @@ -485,7 +485,7 @@ static const MemoryListener vfio_memory_listener = {
>  
>  static void vfio_listener_release(VFIOContainer *container)
>  {
> -    memory_listener_unregister(&container->iommu_data.type1.listener);
> +    memory_listener_unregister(&container->iommu_data.listener);
>  }
>  
>  int vfio_mmap_region(Object *obj, VFIORegion *region,
> @@ -683,21 +683,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>              ret = -errno;
>              goto free_container_exit;
>          }
> -
> -        container->iommu_data.type1.listener = vfio_memory_listener;
> -        container->iommu_data.release = vfio_listener_release;
> -
> -        memory_listener_register(&container->iommu_data.type1.listener,
> -                                 container->space->as);
> -
> -        if (container->iommu_data.type1.error) {
> -            ret = container->iommu_data.type1.error;
> -            error_report("vfio: memory listener initialization failed for container");
> -            goto listener_release_exit;
> -        }
> -
> -        container->iommu_data.type1.initialized = true;
> -
>      } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
>          ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
>          if (ret) {
> @@ -723,19 +708,25 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>              ret = -errno;
>              goto free_container_exit;
>          }
> -
> -        container->iommu_data.type1.listener = vfio_memory_listener;
> -        container->iommu_data.release = vfio_listener_release;
> -
> -        memory_listener_register(&container->iommu_data.type1.listener,
> -                                 container->space->as);
> -
>      } else {
>          error_report("vfio: No available IOMMU models");
>          ret = -EINVAL;
>          goto free_container_exit;
>      }
>  
> +    container->iommu_data.listener = vfio_memory_listener;
> +
> +    memory_listener_register(&container->iommu_data.listener,
> +                             container->space->as);
> +
> +    if (container->iommu_data.error) {
> +        ret = container->iommu_data.error;
> +        error_report("vfio: memory listener initialization failed for container");
> +        goto listener_release_exit;
> +    }
> +
> +    container->iommu_data.initialized = true;
> +
>      QLIST_INIT(&container->group_list);
>      QLIST_INSERT_HEAD(&space->containers, container, next);
>  
> @@ -774,9 +765,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>          VFIOAddressSpace *space = container->space;
>          VFIOGuestIOMMU *giommu, *tmp;
>  
> -        if (container->iommu_data.release) {
> -            container->iommu_data.release(container);
> -        }
> +        vfio_listener_release(container);
>          QLIST_REMOVE(container, next);
>  
>          QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 59a321d..aff18cd 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -64,21 +64,13 @@ typedef struct VFIOAddressSpace {
>  
>  struct VFIOGroup;
>  
> -typedef struct VFIOType1 {
> -    MemoryListener listener;
> -    int error;
> -    bool initialized;
> -} VFIOType1;
> -
>  typedef struct VFIOContainer {
>      VFIOAddressSpace *space;
>      int fd; /* /dev/vfio/vfio, empowered by the attached groups */
>      struct {
> -        /* enable abstraction to support various iommu backends */
> -        union {
> -            VFIOType1 type1;
> -        };
> -        void (*release)(struct VFIOContainer *);
> +        MemoryListener listener;
> +        int error;
> +        bool initialized;
>      } iommu_data;
>      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>      QLIST_HEAD(, VFIOGroup) group_list;
> 
Reviewed-by: Laurent Vivier <lvivier@redhat.com>


* Re: [Qemu-devel] [RFC PATCH 02/10] vfio: Generalize vfio_listener_region_add failure path
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 02/10] vfio: Generalize vfio_listener_region_add failure path David Gibson
  2015-09-23  9:13   ` Thomas Huth
@ 2015-09-23 13:31   ` Laurent Vivier
  1 sibling, 0 replies; 54+ messages in thread
From: Laurent Vivier @ 2015-09-23 13:31 UTC (permalink / raw)
  To: David Gibson, alex.williamson, aik, gwshan
  Cc: pbonzini, thuth, qemu-ppc, qemu-devel



On 17/09/2015 15:09, David Gibson wrote:
> If a DMA mapping operation fails in vfio_listener_region_add() it
> checks to see if we've already completed initial setup of the
> container.  If so it reports an error so the setup code can fail
> gracefully, otherwise throws a hw_error().
> 
> There are other potential failure cases in vfio_listener_region_add()
> which could benefit from the same logic, so move it to its own
> fail: block.  Later patches can use this to extend other failure cases
> to fail as gracefully as possible under the circumstances.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/vfio/common.c | 26 +++++++++++++++-----------
>  1 file changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index e3152f6..9953b9c 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -400,19 +400,23 @@ static void vfio_listener_region_add(MemoryListener *listener,
>          error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
>                       "0x%"HWADDR_PRIx", %p) = %d (%m)",
>                       container, iova, end - iova, vaddr, ret);
> +        goto fail;
> +    }
>  
> -        /*
> -         * On the initfn path, store the first error in the container so we
> -         * can gracefully fail.  Runtime, there's not much we can do other
> -         * than throw a hardware error.
> -         */
> -        if (!container->iommu_data.initialized) {
> -            if (!container->iommu_data.error) {
> -                container->iommu_data.error = ret;
> -            }
> -        } else {
> -            hw_error("vfio: DMA mapping failed, unable to continue");
> +    return;
> +
> +fail:
> +    /*
> +     * On the initfn path, store the first error in the container so we
> +     * can gracefully fail.  Runtime, there's not much we can do other
> +     * than throw a hardware error.
> +     */
> +    if (!container->iommu_data.initialized) {
> +        if (!container->iommu_data.error) {
> +            container->iommu_data.error = ret;
>          }
> +    } else {
> +        hw_error("vfio: DMA mapping failed, unable to continue");
>      }
>  }
>  
> 
Reviewed-by: Laurent Vivier <lvivier@redhat.com>


* Re: [Qemu-devel] [RFC PATCH 03/10] vfio: Check guest IOVA ranges against host IOMMU capabilities
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 03/10] vfio: Check guest IOVA ranges against host IOMMU capabilities David Gibson
  2015-09-18  6:38   ` Alexey Kardashevskiy
  2015-09-23 10:10   ` Thomas Huth
@ 2015-09-23 14:26   ` Laurent Vivier
  2 siblings, 0 replies; 54+ messages in thread
From: Laurent Vivier @ 2015-09-23 14:26 UTC (permalink / raw)
  To: David Gibson, alex.williamson, aik, gwshan
  Cc: pbonzini, thuth, qemu-ppc, qemu-devel



On 17/09/2015 15:09, David Gibson wrote:
> The current vfio core code assumes that the host IOMMU is capable of
> mapping any IOVA the guest wants to use to where we need.  However, real
> IOMMUs generally only support translating a certain range of IOVAs (the
> "DMA window") not a full 64-bit address space.
> 
> The common x86 IOMMUs support a wide enough range that guests are very
> unlikely to go beyond it in practice, however the IOMMU used on IBM Power
> machines - in the default configuration - supports only a much more limited
> IOVA range, usually 0..2GiB.
> 
> If the guest attempts to set up an IOVA range that the host IOMMU can't
> map, qemu won't report an error until it actually attempts to map a bad
> IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e. no guest
> visible IOMMU) then this will show up very quickly.  If there is a guest
> visible IOMMU, however, the problem might not show up until much later when
> the guest actually attempts to DMA with an IOVA the host can't handle.
> 
> This patch adds a test so that we will detect earlier if the guest is
> attempting to use IOVA ranges that the host IOMMU won't be able to deal
> with.
> 
> For now, we assume that "Type1" (x86) IOMMUs can support any IOVA; this is
> incorrect, but no worse than what we have already.  We can't do better for
> now because the Type1 kernel interface doesn't tell us what IOVA range the
> IOMMU actually supports.
> 
> For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
> IOVA range and validate guest IOVA ranges against it, and this patch does
> so.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/vfio/common.c              | 42 +++++++++++++++++++++++++++++++++++++++---
>  include/hw/vfio/vfio-common.h |  6 ++++++
>  2 files changed, 45 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 9953b9c..c37f1a1 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -344,14 +344,23 @@ static void vfio_listener_region_add(MemoryListener *listener,
>      if (int128_ge(int128_make64(iova), llend)) {
>          return;
>      }
> +    end = int128_get64(llend);
> +
> +    if ((iova < container->iommu_data.min_iova)
> +        || ((end - 1) > container->iommu_data.max_iova)) {
> +        error_report("vfio: IOMMU container %p can't map guest IOVA region"
> +                     " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
> +                     container, iova, end - 1);
> +        ret = -EFAULT; /* FIXME: better choice here? */
> +        goto fail;
> +    }
>  
>      memory_region_ref(section->mr);
>  
>      if (memory_region_is_iommu(section->mr)) {
>          VFIOGuestIOMMU *giommu;
>  
> -        trace_vfio_listener_region_add_iommu(iova,
> -                    int128_get64(int128_sub(llend, int128_one())));
> +        trace_vfio_listener_region_add_iommu(iova, end - 1);
>          /*
>           * FIXME: We should do some checking to see if the
>           * capabilities of the host VFIO IOMMU are adequate to model
> @@ -388,7 +397,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>  
>      /* Here we assume that memory_region_is_ram(section->mr)==true */
>  
> -    end = int128_get64(llend);
>      vaddr = memory_region_get_ram_ptr(section->mr) +
>              section->offset_within_region +
>              (iova - section->offset_within_address_space);
> @@ -687,7 +695,19 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>              ret = -errno;
>              goto free_container_exit;
>          }
> +
> +        /*
> +         * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
> +         * IOVA whatsoever.  That's not actually true, but the current
> +         * kernel interface doesn't tell us what it can map, and the
> +         * existing Type1 IOMMUs generally support any IOVA we're
> +         * going to actually try in practice.
> +         */
> +        container->iommu_data.min_iova = 0;
> +        container->iommu_data.max_iova = (hwaddr)-1;
>      } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
> +        struct vfio_iommu_spapr_tce_info info;
> +
>          ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
>          if (ret) {
>              error_report("vfio: failed to set group container: %m");
> @@ -712,6 +732,22 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>              ret = -errno;
>              goto free_container_exit;
>          }
> +
> +        /*
> +         * FIXME: This only considers the host IOMMU's 32-bit window.
> +         * At some point we need to add support for the optional
> +         * 64-bit window and dynamic windows
> +         */
> +        info.argsz = sizeof(info);
> +        ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> +        if (ret) {
> +            error_report("vfio: VFIO_IOMMU_SPAPR_TCE_GET_INFO failed: %m");
> +            ret = -errno;
> +            goto free_container_exit;
> +        }
> +        container->iommu_data.min_iova = info.dma32_window_start;
> +        container->iommu_data.max_iova = container->iommu_data.min_iova
> +            + info.dma32_window_size - 1;
>      } else {
>          error_report("vfio: No available IOMMU models");
>          ret = -EINVAL;
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index aff18cd..88ec213 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -71,6 +71,12 @@ typedef struct VFIOContainer {
>          MemoryListener listener;
>          int error;
>          bool initialized;
> +        /*
> +         * FIXME: This assumes the host IOMMU can support only a
> +         * single contiguous IOVA window.  We may need to generalize
> +         * that in future
> +         */
> +        hwaddr min_iova, max_iova;
>      } iommu_data;
>      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>      QLIST_HEAD(, VFIOGroup) group_list;
> 
Reviewed-by: Laurent Vivier <lvivier@redhat.com>


* Re: [Qemu-devel] [RFC PATCH 04/10] vfio: Record host IOMMU's available IO page sizes
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 04/10] vfio: Record host IOMMU's available IO page sizes David Gibson
  2015-09-23 10:29   ` Thomas Huth
@ 2015-09-23 14:30   ` Laurent Vivier
  1 sibling, 0 replies; 54+ messages in thread
From: Laurent Vivier @ 2015-09-23 14:30 UTC (permalink / raw)
  To: David Gibson, alex.williamson, aik, gwshan
  Cc: pbonzini, thuth, qemu-ppc, qemu-devel



On 17/09/2015 15:09, David Gibson wrote:
> Depending on the host IOMMU type we determine and record the available page
> sizes for IOMMU translation.  We'll need this for other validation in
> future patches.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/vfio/common.c              | 13 +++++++++++++
>  include/hw/vfio/vfio-common.h |  1 +
>  2 files changed, 14 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index c37f1a1..daaac48 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -680,6 +680,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>      if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
>          ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
>          bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU);
> +        struct vfio_iommu_type1_info info;
>  
>          ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
>          if (ret) {
> @@ -705,6 +706,15 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>           */
>          container->iommu_data.min_iova = 0;
>          container->iommu_data.max_iova = (hwaddr)-1;
> +
> +        /* Assume just 4K IOVA page size */
> +        container->iommu_data.iova_pgsizes = 0x1000;
> +        info.argsz = sizeof(info);
> +        ret = ioctl(fd, VFIO_IOMMU_GET_INFO, &info);
> +        /* Ignore errors */
> +        if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
> +            container->iommu_data.iova_pgsizes = info.iova_pgsizes;
> +        }
>      } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
>          struct vfio_iommu_spapr_tce_info info;
>  
> @@ -748,6 +758,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>          container->iommu_data.min_iova = info.dma32_window_start;
>          container->iommu_data.max_iova = container->iommu_data.min_iova
>              + info.dma32_window_size - 1;
> +
> +        /* Assume just 4K IOVA pages for now */
> +        container->iommu_data.iova_pgsizes = 0x1000;
>      } else {
>          error_report("vfio: No available IOMMU models");
>          ret = -EINVAL;
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 88ec213..c09db6d 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -77,6 +77,7 @@ typedef struct VFIOContainer {
>           * that in future
>           */
>          hwaddr min_iova, max_iova;
> +        uint64_t iova_pgsizes;
>      } iommu_data;
>      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>      QLIST_HEAD(, VFIOGroup) group_list;
> 
Reviewed-by: Laurent Vivier <lvivier@redhat.com>


* Re: [Qemu-devel] [RFC PATCH 05/10] memory: Allow replay of IOMMU mapping notifications
  2015-09-23 10:40   ` Thomas Huth
@ 2015-09-23 16:35     ` Laurent Vivier
  2015-09-23 23:47     ` David Gibson
  1 sibling, 0 replies; 54+ messages in thread
From: Laurent Vivier @ 2015-09-23 16:35 UTC (permalink / raw)
  To: Thomas Huth, David Gibson, alex.williamson, aik, gwshan
  Cc: pbonzini, qemu-ppc, qemu-devel



On 23/09/2015 12:40, Thomas Huth wrote:
> On 17/09/15 15:09, David Gibson wrote:
>> When we have guest visible IOMMUs, we allow notifiers to be registered
>> which will be informed of all changes to IOMMU mappings.  This is used by
>> vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.
>>
>> However, unlike with a memory region listener, an iommu notifier won't be
>> told about any mappings which already exist in the (guest) IOMMU at the
>> time it is registered.  This can cause problems if hotplugging a VFIO
>> device onto a guest bus which had existing guest IOMMU mappings, but didn't
>> previously have any VFIO devices (and hence no host IOMMU mappings).
>>
>> This adds a memory_region_register_iommu_notifier_replay() function to
>> handle this case.  As well as registering the new notifier it replays
>> existing mappings.  Because the IOMMU memory region doesn't internally
>> remember the granularity of the guest IOMMU it has a small hack where the
>> caller must specify a granularity at which to replay mappings.
>>
>> If there are finer mappings in the guest IOMMU these will be reported in
>> the iotlb structures passed to the notifier which it must handle (probably
>> causing it to flag an error).  This isn't new - the VFIO iommu notifier
>> must already handle notifications about guest IOMMU mappings too short
>> for it to represent in the host IOMMU.
>>
>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>> ---
>>  include/exec/memory.h | 16 ++++++++++++++++
>>  memory.c              | 18 ++++++++++++++++++
>>  2 files changed, 34 insertions(+)
> 
>> diff --git a/memory.c b/memory.c
>> index 0d8b2d9..6b5a2f1 100644
>> --- a/memory.c
>> +++ b/memory.c
>> @@ -1403,6 +1403,24 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
>>      notifier_list_add(&mr->iommu_notify, n);
>>  }
>>  
>> +void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
>> +                                                  hwaddr granularity, bool is_write)
> 
> granularity itself is not an address, but a size, isn't it? So using
> "hwaddr" sounds wrong here.

I disagree: in memory.c, all sizes are defined with hwaddr.

>> +{
>> +    hwaddr addr;
> 
> dma_addr_t ?
> 
>> +    IOMMUTLBEntry iotlb;
>> +
>> +    memory_region_register_iommu_notifier(mr, n);
>> +
>> +    for (addr = 0;
>> +         int128_lt(int128_make64(addr), mr->size);
>> +         addr += granularity) {
>> +
>> +        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
>> +        if (iotlb.perm != IOMMU_NONE)
>> +            n->notify(n, &iotlb);
> 
> Missing curly braces.
> 
>> +    }
>> +}
>> +
> 
>  Thomas
> 


* Re: [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge
  2015-09-17 13:09 [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge David Gibson
                   ` (11 preceding siblings ...)
  2015-09-23 11:26 ` Thomas Huth
@ 2015-09-23 16:46 ` Laurent Vivier
  2015-09-24  1:02   ` David Gibson
  12 siblings, 1 reply; 54+ messages in thread
From: Laurent Vivier @ 2015-09-23 16:46 UTC (permalink / raw)
  To: David Gibson, alex.williamson, aik, gwshan
  Cc: pbonzini, thuth, qemu-ppc, qemu-devel



On 17/09/2015 15:09, David Gibson wrote:
> Currently the pseries machine type uses two types of PCI Host Bridge
> (PHB) devices: "spapr-pci-host-bridge" the 'normal' variant intended
> for emulated PCI devices, and "spapr-pci-vfio-host-bridge" intended
> for VFIO devices.
> 
> When using VFIO with pseries, a separate spapr-pci-vfio-host-bridge
> device is needed for every host IOMMU group from which you're using
> VFIO devices.  This is quite awkward for the user and/or management
> tools.  It's especially awkward since the current code makes
> essentially no attempt to detect and warn the user if the wrong sorts
> of devices are connected to the wrong PHB.
> 
> It turns out that the VFIO core code is actually general enough that
> VFIO devices almost work on the normal spapr-pci-host-bridge device.
> In fact with the right combination of circumstances they *can* work
> right now.
> 
> spapr-pci-vfio-host-bridge does 3 additional things:
> 
>     1) It disables KVM acceleration of the guest IOMMU.  That
>        acceleration breaks VFIO because it means guest IOMMU updates
>        bypass the VFIO infrastructure which keeps the host IOMMU in
>        sync.
> 
>     2) It automatically configures the guest PHB's DMA window to match
>        the capabilities of the host IOMMU, and advertises that to the
>        guest.
> 
>     3) It provides additional handling of EEH (Enhanced Error
>        Handling) functions.
> 
> This patch series:
>     * Allows VFIO devices to be used on the spapr-pci-host-bridge by
>       auto-switching the KVM TCE acceleration
> 
>     * Adds verification that the host IOMMU can handle the DMA windows
>       used by guest PHBs
> 
>     * Allows the DMA window on the guest PHB to be configured with
>       device properties.  This can be used to make sure it matches a
>       host window, but in practice the default setting will already
>       work with the host IOMMU on all current systems.
> 
>     * Adds support to the VFIO core to allow a VFIO device to be
>       hotplugged onto a bus which doesn't yet have VFIO devices.  This
>       already worked for systems without a guest-visible IOMMU
>       (i.e. x86), this series makes it work even with a guest visible
>       IOMMU.
> 
>     * Makes a few related cleanups along the way
> 
> This series does NOT allow EEH operations on VFIO devices on the
> spapr-pci-host-bridge device, so the spapr-pci-vfio-host-bridge device
> is left in for now.  It turns out there are some serious existing
> problems in both the qemu EEH implementation and (worse) in the
> EEH/VFIO kernel interface.  Fixing those is a problem for another day.
> Maybe tomorrow.
> 
> 
> I've tested basic assignment of an xHCI to a pseries guest, both at
> startup and with hotplug.  I haven't (yet) tested VFIO on x86 with
> this series.
> 
> This series probably needs to be merged via several different trees.
> I'm intending to split up as necessary once it's had some review.
> 
> David Gibson (10):
>   vfio: Remove unneeded union from VFIOContainer
>   vfio: Generalize vfio_listener_region_add failure path
>   vfio: Check guest IOVA ranges against host IOMMU capabilities
>   vfio: Record host IOMMU's available IO page sizes
>   memory: Allow replay of IOMMU mapping notifications
>   vfio: Allow hotplug of containers onto existing guest IOMMU mappings
>   spapr_pci: Allow PCI host bridge DMA window to be configured
>   spapr_iommu: Rename vfio_accel parameter
>   spapr_iommu: Provide a function to switch a TCE table to allowing VFIO
>   spapr_pci: Allow VFIO devices to work on the normal PCI host bridge
> 
>  hw/ppc/spapr_iommu.c          |  25 ++++++-
>  hw/ppc/spapr_pci.c            |  13 +++-
>  hw/vfio/common.c              | 152 +++++++++++++++++++++++++++---------------
>  include/exec/memory.h         |  16 +++++
>  include/hw/pci-host/spapr.h   |   3 +-
>  include/hw/ppc/spapr.h        |   6 +-
>  include/hw/vfio/vfio-common.h |  21 +++---
>  memory.c                      |  18 +++++
>  target-ppc/kvm.c              |   4 +-
>  target-ppc/kvm_ppc.h          |   2 +-
>  10 files changed, 184 insertions(+), 76 deletions(-)
> 

Just a comment on the form: checkpatch.pl vociferates because of "DOS
line endings", "trailing whitespace" and "line over 80 characters".

<troll>
	Please, use "vi" with "set ff=unix"
</troll>

Stefan has a good tip for checking commits automatically:

http://blog.vmsplice.net/2011/03/how-to-automatically-run-checkpatchpl.html

Laurent


* Re: [Qemu-devel] [RFC PATCH 05/10] memory: Allow replay of IOMMU mapping notifications
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 05/10] memory: Allow replay of IOMMU mapping notifications David Gibson
  2015-09-23 10:40   ` Thomas Huth
@ 2015-09-23 17:04   ` Laurent Vivier
  2015-09-23 23:50     ` David Gibson
  1 sibling, 1 reply; 54+ messages in thread
From: Laurent Vivier @ 2015-09-23 17:04 UTC (permalink / raw)
  To: David Gibson, alex.williamson, aik, gwshan
  Cc: pbonzini, thuth, qemu-ppc, qemu-devel



On 17/09/2015 15:09, David Gibson wrote:
> When we have guest visible IOMMUs, we allow notifiers to be registered
> which will be informed of all changes to IOMMU mappings.  This is used by
> vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.
> 
> However, unlike with a memory region listener, an iommu notifier won't be
> told about any mappings which already exist in the (guest) IOMMU at the
> time it is registered.  This can cause problems if hotplugging a VFIO
> device onto a guest bus which had existing guest IOMMU mappings, but didn't
> previously have any VFIO devices (and hence no host IOMMU mappings).
> 
> This adds a memory_region_register_iommu_notifier_replay() function to
> handle this case.  As well as registering the new notifier it replays
> existing mappings.  Because the IOMMU memory region doesn't internally
> remember the granularity of the guest IOMMU it has a small hack where the
> caller must specify a granularity at which to replay mappings.
> 
> If there are finer mappings in the guest IOMMU these will be reported in
> the iotlb structures passed to the notifier which it must handle (probably
> causing it to flag an error).  This isn't new - the VFIO iommu notifier
> must already handle notifications about guest IOMMU mappings too short
> for it to represent in the host IOMMU.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  include/exec/memory.h | 16 ++++++++++++++++
>  memory.c              | 18 ++++++++++++++++++
>  2 files changed, 34 insertions(+)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 5baaf48..3cf145b 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -583,6 +583,22 @@ void memory_region_notify_iommu(MemoryRegion *mr,
>  void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
>  
>  /**
> + * memory_region_register_iommu_notifier_replay: register a notifier
> + * for changes to IOMMU translation entries, and replay existing IOMMU
> + * translations to the new notifier.
> + *
> + * @mr: the memory region to observe
> + * @n: the notifier to be added; the notifier receives a pointer to an
> + *     #IOMMUTLBEntry as the opaque value; the pointer ceases to be
> + *     valid on exit from the notifier.
> + * @granularity: Minimum page granularity to replay notifications for
> + * @is_write: Whether to treat the replay as a translate "write"
> + *     through the iommu
> + */
> +void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
> +                                                  hwaddr granularity, bool is_write);
> +
> +/**
>   * memory_region_unregister_iommu_notifier: unregister a notifier for
>   * changes to IOMMU translation entries.
>   *
> diff --git a/memory.c b/memory.c
> index 0d8b2d9..6b5a2f1 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1403,6 +1403,24 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
>      notifier_list_add(&mr->iommu_notify, n);
>  }
>  
> +void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
> +                                                  hwaddr granularity, bool is_write)
> +{
> +    hwaddr addr;
> +    IOMMUTLBEntry iotlb;
> +
> +    memory_region_register_iommu_notifier(mr, n);
> +
> +    for (addr = 0;
> +         int128_lt(int128_make64(addr), mr->size);

"addr < memory_region_size(mr)" should be enough.

> +         addr += granularity) {
> +
> +        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +        if (iotlb.perm != IOMMU_NONE)
> +            n->notify(n, &iotlb);
> +    }
> +}
> +
>  void memory_region_unregister_iommu_notifier(Notifier *n)
>  {
>      notifier_remove(n);
> 


* Re: [Qemu-devel] [RFC PATCH 06/10] vfio: Allow hotplug of containers onto existing guest IOMMU mappings
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 06/10] vfio: Allow hotplug of containers onto existing guest IOMMU mappings David Gibson
  2015-09-17 16:54   ` Alex Williamson
@ 2015-09-23 18:44   ` Laurent Vivier
  1 sibling, 0 replies; 54+ messages in thread
From: Laurent Vivier @ 2015-09-23 18:44 UTC (permalink / raw)
  To: David Gibson, alex.williamson, aik, gwshan
  Cc: pbonzini, thuth, qemu-ppc, qemu-devel



On 17/09/2015 15:09, David Gibson wrote:
> At present the memory listener used by vfio to keep host IOMMU mappings
> in sync with the guest memory image assumes that if a guest IOMMU
> appears, then it has no existing mappings.
> 
> This may not be true if a VFIO device is hotplugged onto a guest bus
> which didn't previously include a VFIO device, and which has existing
> guest IOMMU mappings.
> 
> Therefore, use the memory_region_register_iommu_notifier_replay()
> function in order to fix this case, replaying existing guest IOMMU
> mappings, bringing the host IOMMU into sync with the guest IOMMU.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/vfio/common.c | 34 +++++++++++++++++++---------------
>  1 file changed, 19 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index daaac48..543c38e 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -312,6 +312,22 @@ out:
>      rcu_read_unlock();
>  }
>  
> +static hwaddr vfio_container_granularity(VFIOContainer *container)
> +{
> +    uint64_t pgsize;
> +
> +    assert(container->iommu_data.iova_pgsizes);
> +
> +    /* Find the smallest page size supported by the IOMMU */
> +    for (pgsize = 1; pgsize; pgsize <<= 1) {
> +        if (pgsize & container->iommu_data.iova_pgsizes) {
> +            return pgsize;
> +        }
> +    }

Perhaps we can use a gcc builtin?

	return 1 << ctz64(container->iommu_data.iova_pgsizes);

> +    /* Can't happen */
> +    assert(0);
> +}
> +
>  static void vfio_listener_region_add(MemoryListener *listener,
>                                       MemoryRegionSection *section)
>  {
> @@ -371,26 +387,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
>           * would be the right place to wire that up (tell the KVM
>           * device emulation the VFIO iommu handles to use).
>           */
> -        /*
> -         * This assumes that the guest IOMMU is empty of
> -         * mappings at this point.
> -         *
> -         * One way of doing this is:
> -         * 1. Avoid sharing IOMMUs between emulated devices or different
> -         * IOMMU groups.
> -         * 2. Implement VFIO_IOMMU_ENABLE in the host kernel to fail if
> -         * there are some mappings in IOMMU.
> -         *
> -         * VFIO on SPAPR does that. Other IOMMU models may do that different,
> -         * they must make sure there are no existing mappings or
> -         * loop through existing mappings to map them into VFIO.
> -         */
>          giommu = g_malloc0(sizeof(*giommu));
>          giommu->iommu = section->mr;
>          giommu->container = container;
>          giommu->n.notify = vfio_iommu_map_notify;
>          QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
> -        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> +
> +        memory_region_register_iommu_notifier_replay(giommu->iommu, &giommu->n,
> +            vfio_container_granularity(container), false);
>  
>          return;
>      }
> 


* Re: [Qemu-devel] [RFC PATCH 07/10] spapr_pci: Allow PCI host bridge DMA window to be configured
  2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 07/10] spapr_pci: Allow PCI host bridge DMA window to be configured David Gibson
  2015-09-23 11:08   ` Thomas Huth
@ 2015-09-23 18:55   ` Laurent Vivier
  2015-09-23 23:54     ` David Gibson
  1 sibling, 1 reply; 54+ messages in thread
From: Laurent Vivier @ 2015-09-23 18:55 UTC (permalink / raw)
  To: David Gibson, alex.williamson, aik, gwshan
  Cc: pbonzini, thuth, qemu-ppc, qemu-devel



On 17/09/2015 15:09, David Gibson wrote:
> At present the PCI host bridge (PHB) for the pseries machine type has a
> fixed DMA window from 0..1GB (in PCI address space) which is mapped to real
> memory via the PAPR paravirtualized IOMMU.
> 
> For better support of VFIO devices, we're going to want to allow for
> different configurations of the DMA window.
> 
> Eventually we'll want to allow the guest itself to reconfigure the window
> via the PAPR dynamic DMA window interface, but as a preliminary this patch
> allows the user to reconfigure the window with new properties on the PHB
> device.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/ppc/spapr_pci.c          | 7 +++++--
>  include/hw/pci-host/spapr.h | 3 +--
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index b088491..622c4ac 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1394,7 +1394,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
>      sPAPRTCETable *tcet;
>      uint32_t nb_table;
>  
> -    nb_table = SPAPR_PCI_DMA32_SIZE >> SPAPR_TCE_PAGE_SHIFT;
> +    nb_table = sphb->dma_win_size >> SPAPR_TCE_PAGE_SHIFT;
>      tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
>                                 0, SPAPR_TCE_PAGE_SHIFT, nb_table, false);
>      if (!tcet) {
> @@ -1404,7 +1404,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
>      }
>  
>      /* Register default 32bit DMA window */
> -    memory_region_add_subregion(&sphb->iommu_root, 0,
> +    memory_region_add_subregion(&sphb->iommu_root, sphb->dma_win_addr,
>                                  spapr_tce_get_iommu(tcet));
>  }
>  
> @@ -1437,6 +1437,9 @@ static Property spapr_phb_properties[] = {
>                         SPAPR_PCI_IO_WIN_SIZE),
>      DEFINE_PROP_BOOL("dynamic-reconfiguration", sPAPRPHBState, dr_enabled,
>                       true),
> +    /* Default DMA window is 0..1GB */
> +    DEFINE_PROP_UINT64("dma_win_addr", sPAPRPHBState, dma_win_addr, 0),
> +    DEFINE_PROP_UINT64("dma_win_size", sPAPRPHBState, dma_win_size, 0x40000000),
>      DEFINE_PROP_END_OF_LIST(),
>  };

Add them to vmstate_spapr_pci too?

>  
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index 5322b56..7de5e02 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -78,6 +78,7 @@ struct sPAPRPHBState {
>      MemoryRegion memwindow, iowindow, msiwindow;
>  
>      uint32_t dma_liobn;
> +    hwaddr dma_win_addr, dma_win_size;
>      AddressSpace iommu_as;
>      MemoryRegion iommu_root;
>  
> @@ -115,8 +116,6 @@ struct sPAPRPHBVFIOState {
>  
>  #define SPAPR_PCI_MSI_WINDOW         0x40000000000ULL
>  
> -#define SPAPR_PCI_DMA32_SIZE         0x40000000
> -
>  static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin)
>  {
>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> 


* Re: [Qemu-devel] [RFC PATCH 01/10] vfio: Remove unneeded union from VFIOContainer
  2015-09-23 10:31   ` Thomas Huth
@ 2015-09-23 23:14     ` David Gibson
  0 siblings, 0 replies; 54+ messages in thread
From: David Gibson @ 2015-09-23 23:14 UTC (permalink / raw)
  To: Thomas Huth
  Cc: lvivier, aik, gwshan, qemu-devel, alex.williamson, qemu-ppc, pbonzini

On Wed, Sep 23, 2015 at 12:31:25PM +0200, Thomas Huth wrote:
> On 17/09/15 15:09, David Gibson wrote:
> > Currently the VFIOContainer iommu_data field contains a union with
> > different information for different host iommu types.  However:
> >    * It only actually contains information for the x86-like "Type1" iommu
> >    * Because we have a common listener the Type1 fields are actually used
> > on all IOMMU types, including the SPAPR TCE type as well
> >    * There's no tag in the VFIOContainer to tell you which union member is
> > valid anyway.
> > 
> > In fact we now have a general structure for the listener which is unlikely
> > to ever need per-iommu-type information, so this patch removes the union.
> > 
> > In a similar way we can unify the setup of the vfio memory listener in
> > vfio_connect_container() that is currently split across a switch on iommu
> > type, but is effectively the same in both cases.
> > 
> > The iommu_data.release pointer was only needed as a cleanup function
> > which would handle potentially different data in the union.  With the
> > union gone, it too can be removed.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  hw/vfio/common.c              | 51 +++++++++++++++++--------------------------
> >  include/hw/vfio/vfio-common.h | 14 +++---------
> >  2 files changed, 23 insertions(+), 42 deletions(-)
> ...
> > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> > index 59a321d..aff18cd 100644
> > --- a/include/hw/vfio/vfio-common.h
> > +++ b/include/hw/vfio/vfio-common.h
> > @@ -64,21 +64,13 @@ typedef struct VFIOAddressSpace {
> >  
> >  struct VFIOGroup;
> >  
> > -typedef struct VFIOType1 {
> > -    MemoryListener listener;
> > -    int error;
> > -    bool initialized;
> > -} VFIOType1;
> > -
> >  typedef struct VFIOContainer {
> >      VFIOAddressSpace *space;
> >      int fd; /* /dev/vfio/vfio, empowered by the attached groups */
> >      struct {
> > -        /* enable abstraction to support various iommu backends */
> > -        union {
> > -            VFIOType1 type1;
> > -        };
> > -        void (*release)(struct VFIOContainer *);
> > +        MemoryListener listener;
> > +        int error;
> > +        bool initialized;
> >      } iommu_data;
> >      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
> >      QLIST_HEAD(, VFIOGroup) group_list;
> > 
> 
> I think I agree with Alexey here ... keeping the iommu_data struct
> around those fields looks cumbersome. Is there a reason you did not
> remove the struct completely?

Uh.. it seemed like a good idea at the time?

I'll remove it in the next spin.
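
For what it's worth, the fully flattened layout being suggested might look something like this sketch (stand-in types only, since MemoryListener, VFIOAddressSpace and the QLIST heads are QEMU-internal; the field names follow the patch):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the QEMU-internal listener type, for illustration only */
typedef struct MemoryListener {
    int priority;
} MemoryListener;

/*
 * VFIOContainer with the iommu_data wrapper removed entirely, as the
 * review suggests: the former union members become plain fields.
 */
typedef struct VFIOContainer {
    void *space;            /* VFIOAddressSpace * in the real code */
    int fd;                 /* /dev/vfio/vfio, empowered by the attached groups */
    MemoryListener listener;
    int error;
    bool initialized;
    /* giommu_list and group_list omitted from this sketch */
} VFIOContainer;
```

With the wrapper gone, call sites would shorten from container->iommu_data.listener to container->listener.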

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [RFC PATCH 03/10] vfio: Check guest IOVA ranges against host IOMMU capabilities
  2015-09-23 11:07     ` David Gibson
@ 2015-09-23 23:43       ` David Gibson
  0 siblings, 0 replies; 54+ messages in thread
From: David Gibson @ 2015-09-23 23:43 UTC (permalink / raw)
  To: Thomas Huth
  Cc: lvivier, aik, gwshan, qemu-devel, alex.williamson, qemu-ppc, pbonzini

On Wed, Sep 23, 2015 at 09:07:06PM +1000, David Gibson wrote:
> On Wed, Sep 23, 2015 at 12:10:46PM +0200, Thomas Huth wrote:
> > On 17/09/15 15:09, David Gibson wrote:
> > > The current vfio core code assumes that the host IOMMU is capable of
> > > mapping any IOVA the guest wants to use to where we need.  However, real
> > > IOMMUs generally only support translating a certain range of IOVAs (the
> > > "DMA window") not a full 64-bit address space.
> > > 
> > > The common x86 IOMMUs support a wide enough range that guests are very
> > > unlikely to go beyond it in practice, however the IOMMU used on IBM Power
> > > machines - in the default configuration - supports only a much more limited
> > > IOVA range, usually 0..2GiB.
> > > 
> > > If the guest attempts to set up an IOVA range that the host IOMMU can't
> > > map, qemu won't report an error until it actually attempts to map a bad
> > > IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e. no guest
> > > visible IOMMU) then this will show up very quickly.  If there is a guest
> > > visible IOMMU, however, the problem might not show up until much later when
> > > the guest actually attempts to DMA with an IOVA the host can't handle.
> > > 
> > > This patch adds a test so that we will detect earlier if the guest is
> > > attempting to use IOVA ranges that the host IOMMU won't be able to deal
> > > with.
> > > 
> > > For now, we assume that "Type1" (x86) IOMMUs can support any IOVA, this is
> > > incorrect, but no worse than what we have already.  We can't do better for
> > > now because the Type1 kernel interface doesn't tell us what IOVA range the
> > > IOMMU actually supports.
> > > 
> > > For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
> > > IOVA range and validate guest IOVA ranges against it, and this patch does
> > > so.
> > > 
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > >  hw/vfio/common.c              | 42 +++++++++++++++++++++++++++++++++++++++---
> > >  include/hw/vfio/vfio-common.h |  6 ++++++
> > >  2 files changed, 45 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > index 9953b9c..c37f1a1 100644
> > > --- a/hw/vfio/common.c
> > > +++ b/hw/vfio/common.c
> > > @@ -344,14 +344,23 @@ static void vfio_listener_region_add(MemoryListener *listener,
> > >      if (int128_ge(int128_make64(iova), llend)) {
> > >          return;
> > >      }
> > > +    end = int128_get64(llend);
> > > +
> > > +    if ((iova < container->iommu_data.min_iova)
> > > +        || ((end - 1) > container->iommu_data.max_iova)) {
> > 
> > (Too many parentheses for my taste ;-))
> 
> Yes, well, we've already established our tastes differ on that point.
> 
> > > +        error_report("vfio: IOMMU container %p can't map guest IOVA region"
> > > +                     " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
> > > +                     container, iova, end - 1);
> > > +        ret = -EFAULT; /* FIXME: better choice here? */
> > 
> > Maybe -EINVAL? ... but -EFAULT also sounds ok for me.
> 
> I try to avoid EINVAL unless it's clearly the only right choice.  So
> many things use it that it tends to be very unhelpful when you get one.
> 
> > > +        goto fail;
> > > +    }
> > ...
> > > @@ -712,6 +732,22 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
> > >              ret = -errno;
> > >              goto free_container_exit;
> > >          }
> > > +
> > > +        /*
> > > +         * FIXME: This only considers the host IOMMU's 32-bit window.
> > > +         * At some point we need to add support for the optional
> > > +         * 64-bit window and dynamic windows
> > > +         */
> > > +        info.argsz = sizeof(info);
> > > +        ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> > > +        if (ret) {
> > > +            error_report("vfio: VFIO_IOMMU_SPAPR_TCE_GET_INFO failed: %m");
> > 
> > Isn't that %m a glibc extension only? ... Well, this code likely only
> > runs on Linux with a glibc, so it likely doesn't matter, I guess...
> 
> Yes, it is, but it's already used extensively within qemu.
> 
> > > +            ret = -errno;
> > > +            goto free_container_exit;
> > > +        }
> > > +        container->iommu_data.min_iova = info.dma32_window_start;
> > > +        container->iommu_data.max_iova = container->iommu_data.min_iova
> > > +            + info.dma32_window_size - 1;
> > >      } else {
> > >          error_report("vfio: No available IOMMU models");
> > >          ret = -EINVAL;
> > > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> > > index aff18cd..88ec213 100644
> > > --- a/include/hw/vfio/vfio-common.h
> > > +++ b/include/hw/vfio/vfio-common.h
> > > @@ -71,6 +71,12 @@ typedef struct VFIOContainer {
> > >          MemoryListener listener;
> > >          int error;
> > >          bool initialized;
> > > +        /*
> > > +         * FIXME: This assumes the host IOMMU can support only a
> > > +         * single contiguous IOVA window.  We may need to generalize
> > > +         * that in future
> > > +         */
> > > +        hwaddr min_iova, max_iova;
> > 
> > Should that maybe be dma_addr_t instead of hwaddr ?
> 
> Ah, yes it probably should.

Actually, on further consideration, no it shouldn't.  hwaddr is what's
used throughout the VFIO code, in address_space_translate() and in
IOMMUTLBEntry, for both sides of the translation.  In fact, I'm not
entirely convinced there's any reason to have dma_addr_t distinct from
hwaddr at all, but that's a cleanup for some other day.
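
The bounds check under discussion is easy to model in isolation. In this sketch, uint64_t stands in for hwaddr, the container is reduced to just the two bounds, and the function name is hypothetical scaffolding rather than QEMU API:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Reduced stand-in for the relevant VFIOContainer iommu_data fields */
struct iommu_bounds {
    uint64_t min_iova;
    uint64_t max_iova;   /* inclusive upper bound, as in the patch */
};

/*
 * Returns 0 if the guest region [iova, end) fits inside the host IOMMU
 * window, else -EFAULT, mirroring the check the patch adds to
 * vfio_listener_region_add().
 */
static int check_iova_range(const struct iommu_bounds *b,
                            uint64_t iova, uint64_t end)
{
    if (iova < b->min_iova || (end - 1) > b->max_iova) {
        return -EFAULT;
    }
    return 0;
}
```

With the typical Power default window of 0..2GiB, a 1GiB mapping at IOVA 0 passes while anything reaching past 2GiB is rejected up front instead of failing later at map time.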



* Re: [Qemu-devel] [RFC PATCH 05/10] memory: Allow replay of IOMMU mapping notifications
  2015-09-23 10:40   ` Thomas Huth
  2015-09-23 16:35     ` Laurent Vivier
@ 2015-09-23 23:47     ` David Gibson
  1 sibling, 0 replies; 54+ messages in thread
From: David Gibson @ 2015-09-23 23:47 UTC (permalink / raw)
  To: Thomas Huth
  Cc: lvivier, aik, gwshan, qemu-devel, alex.williamson, qemu-ppc, pbonzini

On Wed, Sep 23, 2015 at 12:40:18PM +0200, Thomas Huth wrote:
> On 17/09/15 15:09, David Gibson wrote:
> > When we have guest visible IOMMUs, we allow notifiers to be registered
> > which will be informed of all changes to IOMMU mappings.  This is used by
> > vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.
> > 
> > However, unlike with a memory region listener, an iommu notifier won't be
> > told about any mappings which already exist in the (guest) IOMMU at the
> > time it is registered.  This can cause problems if hotplugging a VFIO
> > device onto a guest bus which had existing guest IOMMU mappings, but didn't
> > previously have any VFIO devices (and hence no host IOMMU mappings).
> > 
> > This adds a memory_region_register_iommu_notifier_replay() function to
> > handle this case.  As well as registering the new notifier it replays
> > existing mappings.  Because the IOMMU memory region doesn't internally
> > remember the granularity of the guest IOMMU it has a small hack where the
> > caller must specify a granularity at which to replay mappings.
> > 
> > If there are finer mappings in the guest IOMMU these will be reported in
> > the iotlb structures passed to the notifier which it must handle (probably
> > causing it to flag an error).  This isn't new - the VFIO iommu notifier
> > must already handle notifications about guest IOMMU mappings too short
> > for it to represent in the host IOMMU.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  include/exec/memory.h | 16 ++++++++++++++++
> >  memory.c              | 18 ++++++++++++++++++
> >  2 files changed, 34 insertions(+)
> 
> > diff --git a/memory.c b/memory.c
> > index 0d8b2d9..6b5a2f1 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -1403,6 +1403,24 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
> >      notifier_list_add(&mr->iommu_notify, n);
> >  }
> >  
> > +void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
> > +                                                  hwaddr granularity, bool is_write)
> 
> granularity itself is not an address, but a size, isn't it? So using
> "hwaddr" sounds wrong here.

As Laurent says, hwaddr is used for sizes throughout memory.c - I think
the rationale is that a size is a difference between two hwaddrs, so
should have the same type.

> > +{
> > +    hwaddr addr;
> 
> dma_addr_t ?

As mentioned in the previous reply, I don't think so on consideration.

> > +    IOMMUTLBEntry iotlb;
> > +
> > +    memory_region_register_iommu_notifier(mr, n);
> > +
> > +    for (addr = 0;
> > +         int128_lt(int128_make64(addr), mr->size);
> > +         addr += granularity) {
> > +
> > +        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> > +        if (iotlb.perm != IOMMU_NONE)
> > +            n->notify(n, &iotlb);
> 
> Missing curly braces.

Thanks, fixed.

> 
> > +    }
> > +}
> > +
> 
>  Thomas
> 



* Re: [Qemu-devel] [RFC PATCH 05/10] memory: Allow replay of IOMMU mapping notifications
  2015-09-23 17:04   ` Laurent Vivier
@ 2015-09-23 23:50     ` David Gibson
  2015-09-24  7:09       ` Laurent Vivier
  0 siblings, 1 reply; 54+ messages in thread
From: David Gibson @ 2015-09-23 23:50 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: thuth, aik, gwshan, qemu-devel, alex.williamson, qemu-ppc, pbonzini

On Wed, Sep 23, 2015 at 07:04:55PM +0200, Laurent Vivier wrote:
> 
> 
> On 17/09/2015 15:09, David Gibson wrote:
> > When we have guest visible IOMMUs, we allow notifiers to be registered
> > which will be informed of all changes to IOMMU mappings.  This is used by
> > vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.
> > 
> > However, unlike with a memory region listener, an iommu notifier won't be
> > told about any mappings which already exist in the (guest) IOMMU at the
> > time it is registered.  This can cause problems if hotplugging a VFIO
> > device onto a guest bus which had existing guest IOMMU mappings, but didn't
> > previously have any VFIO devices (and hence no host IOMMU mappings).
> > 
> > This adds a memory_region_register_iommu_notifier_replay() function to
> > handle this case.  As well as registering the new notifier it replays
> > existing mappings.  Because the IOMMU memory region doesn't internally
> > remember the granularity of the guest IOMMU it has a small hack where the
> > caller must specify a granularity at which to replay mappings.
> > 
> > If there are finer mappings in the guest IOMMU these will be reported in
> > the iotlb structures passed to the notifier which it must handle (probably
> > causing it to flag an error).  This isn't new - the VFIO iommu notifier
> > must already handle notifications about guest IOMMU mappings too short
> > for it to represent in the host IOMMU.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  include/exec/memory.h | 16 ++++++++++++++++
> >  memory.c              | 18 ++++++++++++++++++
> >  2 files changed, 34 insertions(+)
> > 
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 5baaf48..3cf145b 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -583,6 +583,22 @@ void memory_region_notify_iommu(MemoryRegion *mr,
> >  void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
> >  
> >  /**
> > + * memory_region_register_iommu_notifier_replay: register a notifier
> > + * for changes to IOMMU translation entries, and replay existing IOMMU
> > + * translations to the new notifier.
> > + *
> > + * @mr: the memory region to observe
> > + * @n: the notifier to be added; the notifier receives a pointer to an
> > + *     #IOMMUTLBEntry as the opaque value; the pointer ceases to be
> > + *     valid on exit from the notifier.
> > + * @granularity: Minimum page granularity to replay notifications for
> > + * @is_write: Whether to treat the replay as a translate "write"
> > + *     through the iommu
> > + */
> > +void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
> > +                                                  hwaddr granularity, bool is_write);
> > +
> > +/**
> >   * memory_region_unregister_iommu_notifier: unregister a notifier for
> >   * changes to IOMMU translation entries.
> >   *
> > diff --git a/memory.c b/memory.c
> > index 0d8b2d9..6b5a2f1 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -1403,6 +1403,24 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
> >      notifier_list_add(&mr->iommu_notify, n);
> >  }
> >  
> > +void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
> > +                                                  hwaddr granularity, bool is_write)
> > +{
> > +    hwaddr addr;
> > +    IOMMUTLBEntry iotlb;
> > +
> > +    memory_region_register_iommu_notifier(mr, n);
> > +
> > +    for (addr = 0;
> > +         int128_lt(int128_make64(addr), mr->size);
> 
> "addr < memory_region_size(mr)" should be enough.

Ah, yes, much neater, thanks.

> > +         addr += granularity) {
> > +
> > +        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> > +        if (iotlb.perm != IOMMU_NONE)
> > +            n->notify(n, &iotlb);
> > +    }
> > +}
> > +
> >  void memory_region_unregister_iommu_notifier(Notifier *n)
> >  {
> >      notifier_remove(n);
> > 
> 
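
For illustration, the replay walk (with the braces added and the neater memory_region_size()-style comparison) can be modelled like this. toy_translate and the notification count are stand-ins for the real iommu_ops->translate callback and notifier, not QEMU API:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IOMMU_NONE 0
#define TOY_IOMMU_RW   3

/* Toy translation: only the first half of the region has mappings */
static int toy_translate(uint64_t addr, uint64_t region_size)
{
    return addr < region_size / 2 ? TOY_IOMMU_RW : TOY_IOMMU_NONE;
}

/*
 * Walk the region at the caller-supplied granularity and count how many
 * existing mappings a freshly registered notifier would be told about,
 * mirroring memory_region_register_iommu_notifier_replay().
 */
static unsigned replay_count(uint64_t region_size, uint64_t granularity)
{
    unsigned notified = 0;
    uint64_t addr;

    for (addr = 0; addr < region_size; addr += granularity) {
        if (toy_translate(addr, region_size) != TOY_IOMMU_NONE) {
            notified++;   /* the real code calls n->notify(n, &iotlb) here */
        }
    }
    return notified;
}
```

Note how a coarser granularity simply visits fewer candidate IOVAs; any guest mappings finer than the chosen granularity would still surface through the iotlb entries the notifier receives.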



* Re: [Qemu-devel] [RFC PATCH 06/10] vfio: Allow hotplug of containers onto existing guest IOMMU mappings
  2015-09-23 11:02       ` Thomas Huth
@ 2015-09-23 23:50         ` David Gibson
  0 siblings, 0 replies; 54+ messages in thread
From: David Gibson @ 2015-09-23 23:50 UTC (permalink / raw)
  To: Thomas Huth
  Cc: lvivier, aik, gwshan, qemu-devel, Alex Williamson, qemu-ppc, pbonzini

On Wed, Sep 23, 2015 at 01:02:32PM +0200, Thomas Huth wrote:
> On 18/09/15 01:31, David Gibson wrote:
> > On Thu, Sep 17, 2015 at 10:54:24AM -0600, Alex Williamson wrote:
> >> On Thu, 2015-09-17 at 23:09 +1000, David Gibson wrote:
> >>> At present the memory listener used by vfio to keep host IOMMU mappings
> >>> in sync with the guest memory image assumes that if a guest IOMMU
> >>> appears, then it has no existing mappings.
> >>>
> >>> This may not be true if a VFIO device is hotplugged onto a guest bus
> >>> which didn't previously include a VFIO device, and which has existing
> >>> guest IOMMU mappings.
> >>>
> >>> Therefore, use the memory_region_register_iommu_notifier_replay()
> >>> function in order to fix this case, replaying existing guest IOMMU
> >>> mappings, bringing the host IOMMU into sync with the guest IOMMU.
> >>>
> >>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >>> ---
> >>>  hw/vfio/common.c | 34 +++++++++++++++++++---------------
> >>>  1 file changed, 19 insertions(+), 15 deletions(-)
> >>>
> >>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >>> index daaac48..543c38e 100644
> >>> --- a/hw/vfio/common.c
> >>> +++ b/hw/vfio/common.c
> >>> @@ -312,6 +312,22 @@ out:
> >>>      rcu_read_unlock();
> >>>  }
> >>>  
> >>> +static hwaddr vfio_container_granularity(VFIOContainer *container)
> >>> +{
> >>> +    uint64_t pgsize;
> >>> +
> >>> +    assert(container->iommu_data.iova_pgsizes);
> >>
> >> return (hwaddr)1 << (ffsl(container->iommu_data.iova_pgsizes) - 1);
> > 
> > Ah, yes, that should work.  I didn't do it that way mostly because I
> > tend to confuse myself when I try to remember exactly how ffs
> > semantics work.
> 
> Maybe use ffsll instead of ffsl, in case the code ever runs on a 32-bit
> host instead of a 64-bit host ?

Already done :)
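
The one-liner being discussed, with ffsll substituted for ffsl, can be sanity-checked on its own. In this sketch uint64_t stands in for hwaddr and the bitmap values are made-up examples, not real kernel output:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <strings.h>   /* ffsll() */

/*
 * Smallest IOVA page size supported by the container: the lowest set bit
 * of the pgsize bitmap.  ffsll() returns the 1-based index of the least
 * significant set bit, hence the "- 1" before shifting.
 */
static uint64_t toy_container_granularity(uint64_t iova_pgsizes)
{
    assert(iova_pgsizes);   /* kernel reports at least one supported size */
    return (uint64_t)1 << (ffsll((long long)iova_pgsizes) - 1);
}
```

Using ffsll rather than ffsl keeps the result correct for page sizes above 2GiB even on a 32-bit host, which is the point Thomas raises.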



* Re: [Qemu-devel] [RFC PATCH 07/10] spapr_pci: Allow PCI host bridge DMA window to be configured
  2015-09-23 18:55   ` Laurent Vivier
@ 2015-09-23 23:54     ` David Gibson
  2015-09-24  6:59       ` Laurent Vivier
  0 siblings, 1 reply; 54+ messages in thread
From: David Gibson @ 2015-09-23 23:54 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: thuth, aik, gwshan, qemu-devel, alex.williamson, qemu-ppc, pbonzini

On Wed, Sep 23, 2015 at 08:55:01PM +0200, Laurent Vivier wrote:
> 
> 
> On 17/09/2015 15:09, David Gibson wrote:
> > At present the PCI host bridge (PHB) for the pseries machine type has a
> > fixed DMA window from 0..1GB (in PCI address space) which is mapped to real
> > memory via the PAPR paravirtualized IOMMU.
> > 
> > For better support of VFIO devices, we're going to want to allow for
> > different configurations of the DMA window.
> > 
> > Eventually we'll want to allow the guest itself to reconfigure the window
> > via the PAPR dynamic DMA window interface, but as a preliminary this patch
> > allows the user to reconfigure the window with new properties on the PHB
> > device.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  hw/ppc/spapr_pci.c          | 7 +++++--
> >  include/hw/pci-host/spapr.h | 3 +--
> >  2 files changed, 6 insertions(+), 4 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index b088491..622c4ac 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -1394,7 +1394,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
> >      sPAPRTCETable *tcet;
> >      uint32_t nb_table;
> >  
> > -    nb_table = SPAPR_PCI_DMA32_SIZE >> SPAPR_TCE_PAGE_SHIFT;
> > +    nb_table = sphb->dma_win_size >> SPAPR_TCE_PAGE_SHIFT;
> >      tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
> >                                 0, SPAPR_TCE_PAGE_SHIFT, nb_table, false);
> >      if (!tcet) {
> > @@ -1404,7 +1404,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
> >      }
> >  
> >      /* Register default 32bit DMA window */
> > -    memory_region_add_subregion(&sphb->iommu_root, 0,
> > +    memory_region_add_subregion(&sphb->iommu_root, sphb->dma_win_addr,
> >                                  spapr_tce_get_iommu(tcet));
> >  }
> >  
> > @@ -1437,6 +1437,9 @@ static Property spapr_phb_properties[] = {
> >                         SPAPR_PCI_IO_WIN_SIZE),
> >      DEFINE_PROP_BOOL("dynamic-reconfiguration", sPAPRPHBState, dr_enabled,
> >                       true),
> > +    /* Default DMA window is 0..1GB */
> > +    DEFINE_PROP_UINT64("dma_win_addr", sPAPRPHBState, dma_win_addr, 0),
> > +    DEFINE_PROP_UINT64("dma_win_size", sPAPRPHBState, dma_win_size, 0x40000000),
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> 
> Add them to the vmstate_spapr_pci too?

No.  At least, not just now.

This is the sort of configuration information that typically isn't
migrated, instead requiring the far end to be set up matching.  Which
is a problem, but not one within the scope of this patch series.  In
any case just transferring the values wouldn't be enough - we'd need
post_load handlers to actually rewire the TCE table accordingly.

We will need to do this once we add dynamic DMA window support, but
again, not in scope just now.
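
As a side note, the arithmetic tying the new dma_win_size property to the TCE table size is straightforward. This sketch hard-codes the 4 KiB SPAPR_TCE_PAGE_SHIFT value from the patch and uses uint64_t in place of hwaddr:

```c
#include <assert.h>
#include <stdint.h>

#define TCE_PAGE_SHIFT 12   /* SPAPR_TCE_PAGE_SHIFT: 4 KiB TCE pages */

/*
 * Number of TCE entries needed to cover a DMA window of the given size,
 * as computed in spapr_phb_finish_realize() after the patch.
 */
static uint32_t window_nb_table(uint64_t dma_win_size)
{
    return (uint32_t)(dma_win_size >> TCE_PAGE_SHIFT);
}
```

So the default 1GiB window gives a 256Ki-entry table, and growing or shrinking the window via the new property scales the table linearly.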



* Re: [Qemu-devel] [RFC PATCH 07/10] spapr_pci: Allow PCI host bridge DMA window to be configured
  2015-09-23 11:08   ` Thomas Huth
@ 2015-09-23 23:56     ` David Gibson
  0 siblings, 0 replies; 54+ messages in thread
From: David Gibson @ 2015-09-23 23:56 UTC (permalink / raw)
  To: Thomas Huth
  Cc: lvivier, aik, gwshan, qemu-devel, alex.williamson, qemu-ppc, pbonzini

On Wed, Sep 23, 2015 at 01:08:34PM +0200, Thomas Huth wrote:
> On 17/09/15 15:09, David Gibson wrote:
> > At present the PCI host bridge (PHB) for the pseries machine type has a
> > fixed DMA window from 0..1GB (in PCI address space) which is mapped to real
> > memory via the PAPR paravirtualized IOMMU.
> > 
> > For better support of VFIO devices, we're going to want to allow for
> > different configurations of the DMA window.
> > 
> > Eventually we'll want to allow the guest itself to reconfigure the window
> > via the PAPR dynamic DMA window interface, but as a preliminary this patch
> > allows the user to reconfigure the window with new properties on the PHB
> > device.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  hw/ppc/spapr_pci.c          | 7 +++++--
> >  include/hw/pci-host/spapr.h | 3 +--
> >  2 files changed, 6 insertions(+), 4 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index b088491..622c4ac 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -1394,7 +1394,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
> >      sPAPRTCETable *tcet;
> >      uint32_t nb_table;
> >  
> > -    nb_table = SPAPR_PCI_DMA32_SIZE >> SPAPR_TCE_PAGE_SHIFT;
> > +    nb_table = sphb->dma_win_size >> SPAPR_TCE_PAGE_SHIFT;
> >      tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
> >                                 0, SPAPR_TCE_PAGE_SHIFT, nb_table, false);
> >      if (!tcet) {
> > @@ -1404,7 +1404,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
> >      }
> >  
> >      /* Register default 32bit DMA window */
> > -    memory_region_add_subregion(&sphb->iommu_root, 0,
> > +    memory_region_add_subregion(&sphb->iommu_root, sphb->dma_win_addr,
> >                                  spapr_tce_get_iommu(tcet));
> >  }
> >  
> > @@ -1437,6 +1437,9 @@ static Property spapr_phb_properties[] = {
> >                         SPAPR_PCI_IO_WIN_SIZE),
> >      DEFINE_PROP_BOOL("dynamic-reconfiguration", sPAPRPHBState, dr_enabled,
> >                       true),
> > +    /* Default DMA window is 0..1GB */
> > +    DEFINE_PROP_UINT64("dma_win_addr", sPAPRPHBState, dma_win_addr, 0),
> > +    DEFINE_PROP_UINT64("dma_win_size", sPAPRPHBState, dma_win_size, 0x40000000),
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >  
> > diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> > index 5322b56..7de5e02 100644
> > --- a/include/hw/pci-host/spapr.h
> > +++ b/include/hw/pci-host/spapr.h
> > @@ -78,6 +78,7 @@ struct sPAPRPHBState {
> >      MemoryRegion memwindow, iowindow, msiwindow;
> >  
> >      uint32_t dma_liobn;
> > +    hwaddr dma_win_addr, dma_win_size;
> 
> Maybe use dma_addr_t for dma_win_addr? And dma_win_size isn't an
> address, so I'd maybe use uint64_t here instead.

So dma_win_addr is being passed to memory_region_add_subregion() which
takes a hwaddr, so I think hwaddr is correct.

> >      AddressSpace iommu_as;
> >      MemoryRegion iommu_root;
> >  
> > @@ -115,8 +116,6 @@ struct sPAPRPHBVFIOState {
> >  
> >  #define SPAPR_PCI_MSI_WINDOW         0x40000000000ULL
> >  
> > -#define SPAPR_PCI_DMA32_SIZE         0x40000000
> > -
> >  static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin)
> >  {
> >      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> 
> Anyway, patch looks fine to me, so:
> 
> Reviewed-by: Thomas Huth <thuth@redhat.com>
> 
> 



* Re: [Qemu-devel] [RFC PATCH 09/10] spapr_iommu: Provide a function to switch a TCE table to allowing VFIO
  2015-09-23 11:24     ` Thomas Huth
@ 2015-09-24  0:35       ` David Gibson
  0 siblings, 0 replies; 54+ messages in thread
From: David Gibson @ 2015-09-24  0:35 UTC (permalink / raw)
  To: Thomas Huth
  Cc: lvivier, aik, gwshan, qemu-devel, Alex Williamson, qemu-ppc, pbonzini

On Wed, Sep 23, 2015 at 01:24:00PM +0200, Thomas Huth wrote:
> On 17/09/15 18:54, Alex Williamson wrote:
> > On Thu, 2015-09-17 at 23:09 +1000, David Gibson wrote:
> >> Because of the way non-VFIO guest IOMMU operations are KVM accelerated, not
> >> all TCE tables (guest IOMMU contexts) can support VFIO devices.  Currently,
> >> this is decided at creation time.
> >>
> >> To support hotplug of VFIO devices, we need to allow a TCE table which
> >> previously didn't allow VFIO devices to be switched so that it can.  This
> >> patch adds an spapr_tce_need_vfio() function to do this, by reallocating
> >> the table in userspace if necessary.
> >>
> >> Currently this doesn't allow the KVM acceleration to be re-enabled if all
> >> the VFIO devices are removed.  That's an optimization for another time.
> > 
> > 
> > Same comment as previous patch, spapr_tce_need_userspace_table() (or
> > something) makes the code much more self documenting.
> 
> May I also add that using the word "need" in a function name could be
> a little bit confusing here? Maybe it's just me, but if I read
> ..._need_...() I first think of a function that just checks something
> and returns a boolean for yes or no, not of a function that changes
> something.
> Could you maybe use something like "..._change_to_..." instead?

Yeah, I never did much like that name, just couldn't think of a better
one.

Hmm.. how about spapr_tce_set_need_vfio(), taking a bool parameter.  I
won't implement the reverse transition just yet, but that gives us an
obvious place to put it when/if we do.
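
A rough shape for that proposed interface (a hypothetical sketch: the real sPAPRTCETable and the userspace/KVM table reallocation are QEMU-internal, so only the flag transition is modelled here):

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the TCE table state relevant to this discussion */
typedef struct ToyTCETable {
    bool need_vfio;   /* table kept in userspace so VFIO sees updates */
    bool kvm_accel;   /* KVM-accelerated table, invisible to VFIO */
} ToyTCETable;

/*
 * Proposed spapr_tce_set_need_vfio() shape: taking a bool parameter
 * leaves an obvious slot for the (not yet implemented) reverse
 * transition back to KVM acceleration.
 */
static void toy_tce_set_need_vfio(ToyTCETable *tcet, bool need_vfio)
{
    if (tcet->need_vfio == need_vfio) {
        return;                      /* nothing to change */
    }
    tcet->need_vfio = need_vfio;
    tcet->kvm_accel = !need_vfio;    /* real code reallocates the table here */
}
```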



* Re: [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge
  2015-09-23 16:46 ` Laurent Vivier
@ 2015-09-24  1:02   ` David Gibson
  2015-09-24  7:02     ` Laurent Vivier
  0 siblings, 1 reply; 54+ messages in thread
From: David Gibson @ 2015-09-24  1:02 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: thuth, aik, gwshan, qemu-devel, alex.williamson, qemu-ppc, pbonzini

On Wed, Sep 23, 2015 at 06:46:17PM +0200, Laurent Vivier wrote:
> 
> 
> On 17/09/2015 15:09, David Gibson wrote:
> > Currently the pseries machine type uses two types of PCI Host Bridge
> > (PHB) devices: "spapr-pci-host-bridge" the 'normal' variant intended
> > for emulated PCI devices, and "spapr-pci-vfio-host-bridge" intended
> > for VFIO devices.
> > 
> > When using VFIO with pseries, a separate spapr-pci-vfio-host-bridge
> > device is needed for every host IOMMU group from which you're using
> > VFIO devices.  This is quite awkward for the user and/or management
> > tools.  It's especially awkward since the current code makes
> > essentially no attempt to detect and warn the user if the wrong sorts
> > of devices are connected to the wrong PHB.
> > 
> > It turns out that the VFIO core code is actually general enough that
> > VFIO devices almost work on the normal spapr-pci-host-bridge device.
> > In fact with the right combination of circumstances they *can* work
> > right now.
> > 
> > spapr-pci-vfio-host-bridge does 3 additional things:
> > 
> >     1) It disables KVM acceleration of the guest IOMMU.  That
> >        acceleration breaks VFIO because it means guest IOMMU updates
> >        bypass the VFIO infrastructure which keeps the host IOMMU in
> >        sync.
> > 
> >     2) It automatically configures the guest PHB's DMA window to match
> >        the capabilities of the host IOMMU, and advertises that to the
> >        guest.
> > 
> >     3) It provides additional handling of EEH (Enhanced Error
> >        Handling) functions.
> > 
> > This patch series:
> >     * Allows VFIO devices to be used on the spapr-pci-host-bridge by
> >       auto-switching the KVM TCE acceleration
> > 
> >     * Adds verification that the host IOMMU can handle the DMA windows
> >       used by guest PHBs
> > 
> >     * Allows the DMA window on the guest PHB to be configured with
> >       device properties.  This can be used to make sure it matches a
> >       host window, but in practice the default setting will already
> >       work with the host IOMMU on all current systems.
> > 
> >     * Adds support to the VFIO core to allow a VFIO device to be
> >       hotplugged onto a bus which doesn't yet have VFIO devices.  This
> >       already worked for systems without a guest-visible IOMMU
> >       (i.e. x86), this series makes it work even with a guest visible
> >       IOMMU.
> > 
> >     * Makes a few related cleanups along the way
> > 
> > This series does NOT allow EEH operations on VFIO devices on the
> > spapr-pci-host-bridge device, so the spapr-pci-vfio-host-bridge device
> > is left in for now.  It turns out there are some serious existing
> > problems in both the qemu EEH implementation and (worse) in the
> > EEH/VFIO kernel interface.  Fixing those is a problem for another day.
> > Maybe tomorrow.
> > 
> > 
> > I've tested basic assignment of an xHCI to a pseries guest, both at
> > startup and with hotplug.  I haven't (yet) tested VFIO on x86 with
> > this series.
> > 
> > This series probably needs to be merged via several different trees.
> > I'm intending to split up as necessary once it's had some review.
> > 
> > David Gibson (10):
> >   vfio: Remove unneeded union from VFIOContainer
> >   vfio: Generalize vfio_listener_region_add failure path
> >   vfio: Check guest IOVA ranges against host IOMMU capabilities
> >   vfio: Record host IOMMU's available IO page sizes
> >   memory: Allow replay of IOMMU mapping notifications
> >   vfio: Allow hotplug of containers onto existing guest IOMMU mappings
> >   spapr_pci: Allow PCI host bridge DMA window to be configured
> >   spapr_iommu: Rename vfio_accel parameter
> >   spapr_iommu: Provide a function to switch a TCE table to allowing VFIO
> >   spapr_pci: Allow VFIO devices to work on the normal PCI host bridge
> > 
> >  hw/ppc/spapr_iommu.c          |  25 ++++++-
> >  hw/ppc/spapr_pci.c            |  13 +++-
> >  hw/vfio/common.c              | 152 +++++++++++++++++++++++++++---------------
> >  include/exec/memory.h         |  16 +++++
> >  include/hw/pci-host/spapr.h   |   3 +-
> >  include/hw/ppc/spapr.h        |   6 +-
> >  include/hw/vfio/vfio-common.h |  21 +++---
> >  memory.c                      |  18 +++++
> >  target-ppc/kvm.c              |   4 +-
> >  target-ppc/kvm_ppc.h          |   2 +-
> >  10 files changed, 184 insertions(+), 76 deletions(-)
> > 
> 
> Just a comment on the form: checkpatch.pl vociferates because of "DOS
> line endings", "trailing whitespace" and "line over 80 characters".

Hmm.. I see the trailing whitespace and over 80 characters warnings
along with a few others.  Nothing about DOS line endings though.
Wondering if that could be something that's come from your mailer.

> 
> <troll>
> 	Please, use "vi" with "set ff=unix"
> </troll>
> 
> Stefan has a good tip to check automatically commits:
> 
> http://blog.vmsplice.net/2011/03/how-to-automatically-run-checkpatchpl.html

Nice, thanks.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]


* Re: [Qemu-devel] [RFC PATCH 07/10] spapr_pci: Allow PCI host bridge DMA window to be configured
  2015-09-23 23:54     ` David Gibson
@ 2015-09-24  6:59       ` Laurent Vivier
  2015-10-03  0:25         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 54+ messages in thread
From: Laurent Vivier @ 2015-09-24  6:59 UTC (permalink / raw)
  To: David Gibson
  Cc: thuth, aik, gwshan, qemu-devel, alex.williamson, qemu-ppc, pbonzini

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On 24/09/2015 01:54, David Gibson wrote:
> On Wed, Sep 23, 2015 at 08:55:01PM +0200, Laurent Vivier wrote:
>> 
>> 
>> On 17/09/2015 15:09, David Gibson wrote:
>>> At present the PCI host bridge (PHB) for the pseries machine
>>> type has a fixed DMA window from 0..1GB (in PCI address space)
>>> which is mapped to real memory via the PAPR paravirtualized
>>> IOMMU.
>>> 
>>> For better support of VFIO devices, we're going to want to
>>> allow for different configurations of the DMA window.
>>> 
>>> Eventually we'll want to allow the guest itself to reconfigure
>>> the window via the PAPR dynamic DMA window interface, but as a
>>> preliminary this patch allows the user to reconfigure the
>>> window with new properties on the PHB device.
>>> 
>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>> ---
>>>  hw/ppc/spapr_pci.c          | 7 +++++--
>>>  include/hw/pci-host/spapr.h | 3 +--
>>>  2 files changed, 6 insertions(+), 4 deletions(-)
>>> 
>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>> index b088491..622c4ac 100644
>>> --- a/hw/ppc/spapr_pci.c
>>> +++ b/hw/ppc/spapr_pci.c
>>> @@ -1394,7 +1394,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
>>>      sPAPRTCETable *tcet;
>>>      uint32_t nb_table;
>>> 
>>> -    nb_table = SPAPR_PCI_DMA32_SIZE >> SPAPR_TCE_PAGE_SHIFT;
>>> +    nb_table = sphb->dma_win_size >> SPAPR_TCE_PAGE_SHIFT;
>>>      tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
>>>                                 0, SPAPR_TCE_PAGE_SHIFT, nb_table, false);
>>>      if (!tcet) {
>>> @@ -1404,7 +1404,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
>>>      }
>>> 
>>>      /* Register default 32bit DMA window */
>>> -    memory_region_add_subregion(&sphb->iommu_root, 0,
>>> +    memory_region_add_subregion(&sphb->iommu_root, sphb->dma_win_addr,
>>>                                  spapr_tce_get_iommu(tcet));
>>>  }
>>> 
>>> @@ -1437,6 +1437,9 @@ static Property spapr_phb_properties[] = {
>>>                         SPAPR_PCI_IO_WIN_SIZE),
>>>      DEFINE_PROP_BOOL("dynamic-reconfiguration", sPAPRPHBState, dr_enabled,
>>>                       true),
>>> +    /* Default DMA window is 0..1GB */
>>> +    DEFINE_PROP_UINT64("dma_win_addr", sPAPRPHBState, dma_win_addr, 0),
>>> +    DEFINE_PROP_UINT64("dma_win_size", sPAPRPHBState, dma_win_size, 0x40000000),
>>>      DEFINE_PROP_END_OF_LIST(),
>>>  };
>> 
>> Add them too to the vmstate_spapr_pci ?
> 
> No.  At least, not just now.
> 
> This is the sort of configuration information that typically isn't 
> migrated, instead requiring the far end to be set up matching.
> Which

I think this is the aim of VMSTATE_UINT64_EQUAL() ?

> is a problem, but not one within the scope of this patch series.
> In any case just transferring the values wouldn't be enough - we'd
> need post_load handlers to actually rewire the TCE table
> accordingly.
> 
> We will need to do this once we add dynamic DMA window support,
> but again, not in scope just now.
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlYDn1UACgkQNKT2yavzbFM20ACfcUVIPLObcYh4y5U8CcoLVGoO
FR8AoOgigTMFu7FOh7wI8U+fGNtYv6Ji
=4opa
-----END PGP SIGNATURE-----


* Re: [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge
  2015-09-24  1:02   ` David Gibson
@ 2015-09-24  7:02     ` Laurent Vivier
  0 siblings, 0 replies; 54+ messages in thread
From: Laurent Vivier @ 2015-09-24  7:02 UTC (permalink / raw)
  To: David Gibson
  Cc: thuth, aik, gwshan, qemu-devel, alex.williamson, qemu-ppc, pbonzini

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On 24/09/2015 03:02, David Gibson wrote:
> On Wed, Sep 23, 2015 at 06:46:17PM +0200, Laurent Vivier wrote:
>> 
>> 
>> On 17/09/2015 15:09, David Gibson wrote:
>>> Currently the pseries machine type uses two types of PCI Host
>>> Bridge (PHB) devices: "spapr-pci-host-bridge" the 'normal'
>>> variant intended for emulated PCI devices, and
>>> "spapr-pci-vfio-host-bridge" intended for VFIO devices.
>>> 
>>> When using VFIO with pseries, a separate
>>> spapr-pci-vfio-host-bridge device is needed for every host
>>> IOMMU group from which you're using VFIO devices.  This is
>>> quite awkward for the user and/or management tools.  It's
>>> especially awkward since the current code makes essentially no
>>> attempt to detect and warn the user if the wrong sorts of
>>> devices are connected to the wrong PHB.
>>> 
>>> It turns out that the VFIO core code is actually general enough
>>> that VFIO devices almost work on the normal
>>> spapr-pci-host-bridge device. In fact with the right
>>> combination of circumstances they *can* work right now.
>>> 
>>> spapr-pci-vfio-host-bridge does 3 additional things:
>>> 
>>> 1) It disables KVM acceleration of the guest IOMMU.  That 
>>> acceleration breaks VFIO because it means guest IOMMU updates 
>>> bypass the VFIO infrastructure which keeps the host IOMMU in 
>>> sync.
>>> 
>>> 2) It automatically configures the guest PHB's DMA window to
>>> match the capabilities of the host IOMMU, and advertises that
>>> to the guest.
>>> 
>>> 3) It provides additional handling of EEH (Enhanced Error 
>>> Handling) functions.
>>> 
>>> This patch series:
>>> 
>>> * Allows VFIO devices to be used on the spapr-pci-host-bridge by
>>> auto-switching the KVM TCE acceleration
>>> 
>>> * Adds verification that the host IOMMU can handle the DMA
>>> windows used by guest PHBs
>>> 
>>> * Allows the DMA window on the guest PHB to be configured with 
>>> device properties.  This can be used to make sure it matches a 
>>> host window, but in practice the default setting will already 
>>> work with the host IOMMU on all current systems.
>>> 
>>> * Adds support to the VFIO core to allow a VFIO device to be 
>>> hotplugged onto a bus which doesn't yet have VFIO devices.
>>> This already worked for systems without a guest-visible IOMMU 
>>> (i.e. x86), this series makes it work even with a guest
>>> visible IOMMU.
>>> 
>>> * Makes a few related cleanups along the way
>>> 
>>> This series does NOT allow EEH operations on VFIO devices on
>>> the spapr-pci-host-bridge device, so the
>>> spapr-pci-vfio-host-bridge device is left in for now.  It turns
>>> out there are some serious existing problems in both the qemu
>>> EEH implementation and (worse) in the EEH/VFIO kernel
>>> interface.  Fixing those is a problem for another day. Maybe
>>> tomorrow.
>>> 
>>> 
>>> I've tested basic assignment of an xHCI to a pseries guest,
>>> both at startup and with hotplug.  I haven't (yet) tested VFIO
>>> on x86 with this series.
>>> 
>>> This series probably needs to be merged via several different
>>> trees. I'm intending to split up as necessary once it's had
>>> some review.
>>> 
>>> David Gibson (10):
>>>   vfio: Remove unneeded union from VFIOContainer
>>>   vfio: Generalize vfio_listener_region_add failure path
>>>   vfio: Check guest IOVA ranges against host IOMMU capabilities
>>>   vfio: Record host IOMMU's available IO page sizes
>>>   memory: Allow replay of IOMMU mapping notifications
>>>   vfio: Allow hotplug of containers onto existing guest IOMMU mappings
>>>   spapr_pci: Allow PCI host bridge DMA window to be configured
>>>   spapr_iommu: Rename vfio_accel parameter
>>>   spapr_iommu: Provide a function to switch a TCE table to allowing VFIO
>>>   spapr_pci: Allow VFIO devices to work on the normal PCI host bridge
>>> 
>>>  hw/ppc/spapr_iommu.c          |  25 ++++++-
>>>  hw/ppc/spapr_pci.c            |  13 +++-
>>>  hw/vfio/common.c              | 152 +++++++++++++++++++++++++++---------------
>>>  include/exec/memory.h         |  16 +++++
>>>  include/hw/pci-host/spapr.h   |   3 +-
>>>  include/hw/ppc/spapr.h        |   6 +-
>>>  include/hw/vfio/vfio-common.h |  21 +++---
>>>  memory.c                      |  18 +++++
>>>  target-ppc/kvm.c              |   4 +-
>>>  target-ppc/kvm_ppc.h          |   2 +-
>>>  10 files changed, 184 insertions(+), 76 deletions(-)
>>> 
>> 
>> Just a comment on the form: checkpatch.pl vociferates because of
>> "DOS line endings", "trailing whitespace" and "line over 80
>> characters".
> 
> Hmm.. I see the trailing whitespace and over 80 characters
> warnings along with a few others.  Nothing about DOS line endings
> though. Wondering if that could be something that's come from your
> mailer.

Possible, but very strange: it's the first time it happens (Thunderbird).

>> 
>> <troll> Please, use "vi" with "set ff=unix" </troll>
>> 
>> Stefan has a good tip to check automatically commits:
>> 
>> http://blog.vmsplice.net/2011/03/how-to-automatically-run-checkpatchpl.html
>> 
> Nice, thanks.
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlYDoA0ACgkQNKT2yavzbFMaDwCgx/b5gq4zARrEqot574NNIKTV
xGUAnjRPykSMtSMCvvwPKtTqotpYN2Jz
=4Chr
-----END PGP SIGNATURE-----


* Re: [Qemu-devel] [RFC PATCH 05/10] memory: Allow replay of IOMMU mapping notifications
  2015-09-23 23:50     ` David Gibson
@ 2015-09-24  7:09       ` Laurent Vivier
  0 siblings, 0 replies; 54+ messages in thread
From: Laurent Vivier @ 2015-09-24  7:09 UTC (permalink / raw)
  To: David Gibson
  Cc: thuth, aik, gwshan, qemu-devel, alex.williamson, qemu-ppc, pbonzini

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On 24/09/2015 01:50, David Gibson wrote:
> On Wed, Sep 23, 2015 at 07:04:55PM +0200, Laurent Vivier wrote:
>> 
>> 
>> On 17/09/2015 15:09, David Gibson wrote:
>>> When we have guest visible IOMMUs, we allow notifiers to be
>>> registered which will be informed of all changes to IOMMU
>>> mappings.  This is used by vfio to keep the host IOMMU mappings
>>> in sync with guest IOMMU mappings.
>>> 
>>> However, unlike with a memory region listener, an iommu
>>> notifier won't be told about any mappings which already exist
>>> in the (guest) IOMMU at the time it is registered.  This can
>>> cause problems if hotplugging a VFIO device onto a guest bus
>>> which had existing guest IOMMU mappings, but didn't previously
>>> have an VFIO devices (and hence no host IOMMU mappings).
>>> 
>>> This adds a memory_region_register_iommu_notifier_replay()
>>> function to handle this case.  As well as registering the new
>>> notifier it replays existing mappings.  Because the IOMMU
>>> memory region doesn't internally remember the granularity of
>>> the guest IOMMU it has a small hack where the caller must
>>> specify a granularity at which to replay mappings.
>>> 
>>> If there are finer mappings in the guest IOMMU these will be
>>> reported in the iotlb structures passed to the notifier which
>>> it must handle (probably causing it to flag an error).  This
>>> isn't new - the VFIO iommu notifier must already handle
>>> notifications about guest IOMMU mappings too short for it to
>>> represent in the host IOMMU.
>>> 
>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>> ---
>>>  include/exec/memory.h | 16 ++++++++++++++++
>>>  memory.c              | 18 ++++++++++++++++++
>>>  2 files changed, 34 insertions(+)
>>> 
>>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>>> index 5baaf48..3cf145b 100644
>>> --- a/include/exec/memory.h
>>> +++ b/include/exec/memory.h
>>> @@ -583,6 +583,22 @@ void memory_region_notify_iommu(MemoryRegion *mr,
>>>  void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
>>> 
>>>  /**
>>> + * memory_region_register_iommu_notifier_replay: register a notifier
>>> + * for changes to IOMMU translation entries, and replay existing IOMMU
>>> + * translations to the new notifier.
>>> + *
>>> + * @mr: the memory region to observe
>>> + * @n: the notifier to be added; the notifier receives a pointer to an
>>> + *     #IOMMUTLBEntry as the opaque value; the pointer ceases to be
>>> + *     valid on exit from the notifier.
>>> + * @granularity: Minimum page granularity to replay notifications for
>>> + * @is_write: Whether to treat the replay as a translate "write"
>>> + *     through the iommu
>>> + */
>>> +void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
>>> +                                                  hwaddr granularity, bool is_write);
>>> +
>>> +/**
>>>   * memory_region_unregister_iommu_notifier: unregister a notifier for
>>>   * changes to IOMMU translation entries.
>>>   *
>>> diff --git a/memory.c b/memory.c
>>> index 0d8b2d9..6b5a2f1 100644
>>> --- a/memory.c
>>> +++ b/memory.c
>>> @@ -1403,6 +1403,24 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
>>>      notifier_list_add(&mr->iommu_notify, n);
>>>  }
>>> 
>>> +void memory_region_register_iommu_notifier_replay(MemoryRegion *mr, Notifier *n,
>>> +                                                  hwaddr granularity, bool is_write)
>>> +{
>>> +    hwaddr addr;
>>> +    IOMMUTLBEntry iotlb;
>>> +
>>> +    memory_region_register_iommu_notifier(mr, n);
>>> +
>>> +    for (addr = 0;
>>> +         int128_lt(int128_make64(addr), mr->size);
>> 
>> "addr < memory_region_size(mr)" should be enough.
> 
> Ah, yes, much neater, thanks.

but rethinking about that, you can have an infinite loop (with int128
or with memory_region_size()) if mr->size is 2^64, i.e. the region
covers the whole 64-bit address space:

as hwaddr is 64-bit and addr stays a multiple of granularity (a power
of two), the last value of addr is UINT64_MAX + 1 - granularity, so the
next one is (uint64_t)(UINT64_MAX + 1), which is 0, so addr is never
>= mr->size.
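A minimal, self-contained C sketch of that wrap-around (the 4KiB granularity and the helper names are illustrative, not QEMU code; QEMU keeps the 2^64 region size in an Int128 precisely because it does not fit in a uint64_t):

```c
#include <assert.h>
#include <stdint.h>

/* Example replay granularity: 4KiB pages. */
#define GRANULARITY (1ULL << 12)

/* Address the loop visits on its final legitimate iteration when the
 * region covers the whole 64-bit space: 2^64 - granularity. */
static inline uint64_t last_addr(void)
{
    return UINT64_MAX - GRANULARITY + 1;
}

/* What "addr += granularity" produces after that iteration: the sum
 * wraps modulo 2^64 back to 0, so a condition of the form
 * "addr < region size" never becomes false and the loop never ends. */
static inline uint64_t next_addr(void)
{
    return last_addr() + GRANULARITY;   /* wraps to 0 */
}
```

Bounding the loop differently, e.g. breaking out when addr wraps below its previous value, avoids the problem.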

> 
>>> +         addr += granularity) {
>>> +
>>> +        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
>>> +        if (iotlb.perm != IOMMU_NONE)
>>> +            n->notify(n, &iotlb);
>>> +    }
>>> +}
>>> +
>>>  void memory_region_unregister_iommu_notifier(Notifier *n)
>>>  {
>>>      notifier_remove(n);
>>> 
>> 
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlYDob0ACgkQNKT2yavzbFOnFACcDk+2PHhX/WfCCkTdXKH4XhWi
UYcAoOpe+C+8tzX02VlGTsCAV9ZxiEwQ
=Cnfl
-----END PGP SIGNATURE-----


* Re: [Qemu-devel] [RFC PATCH 07/10] spapr_pci: Allow PCI host bridge DMA window to be configured
  2015-09-24  6:59       ` Laurent Vivier
@ 2015-10-03  0:25         ` Alexey Kardashevskiy
  2015-10-05 14:13           ` Paolo Bonzini
  0 siblings, 1 reply; 54+ messages in thread
From: Alexey Kardashevskiy @ 2015-10-03  0:25 UTC (permalink / raw)
  To: Laurent Vivier, David Gibson
  Cc: thuth, gwshan, qemu-devel, alex.williamson, qemu-ppc, pbonzini

On 09/24/2015 04:59 PM, Laurent Vivier wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
>
> On 24/09/2015 01:54, David Gibson wrote:
>> On Wed, Sep 23, 2015 at 08:55:01PM +0200, Laurent Vivier wrote:
>>>
>>>
>>> On 17/09/2015 15:09, David Gibson wrote:
>>>> At present the PCI host bridge (PHB) for the pseries machine
>>>> type has a fixed DMA window from 0..1GB (in PCI address space)
>>>> which is mapped to real memory via the PAPR paravirtualized
>>>> IOMMU.
>>>>
>>>> For better support of VFIO devices, we're going to want to
>>>> allow for different configurations of the DMA window.
>>>>
>>>> Eventually we'll want to allow the guest itself to reconfigure
>>>> the window via the PAPR dynamic DMA window interface, but as a
>>>> preliminary this patch allows the user to reconfigure the
>>>> window with new properties on the PHB device.
>>>>
>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>>> ---
>>>>  hw/ppc/spapr_pci.c          | 7 +++++--
>>>>  include/hw/pci-host/spapr.h | 3 +--
>>>>  2 files changed, 6 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>> index b088491..622c4ac 100644
>>>> --- a/hw/ppc/spapr_pci.c
>>>> +++ b/hw/ppc/spapr_pci.c
>>>> @@ -1394,7 +1394,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
>>>>      sPAPRTCETable *tcet;
>>>>      uint32_t nb_table;
>>>>
>>>> -    nb_table = SPAPR_PCI_DMA32_SIZE >> SPAPR_TCE_PAGE_SHIFT;
>>>> +    nb_table = sphb->dma_win_size >> SPAPR_TCE_PAGE_SHIFT;
>>>>      tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
>>>>                                 0, SPAPR_TCE_PAGE_SHIFT, nb_table, false);
>>>>      if (!tcet) {
>>>> @@ -1404,7 +1404,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
>>>>      }
>>>>
>>>>      /* Register default 32bit DMA window */
>>>> -    memory_region_add_subregion(&sphb->iommu_root, 0,
>>>> +    memory_region_add_subregion(&sphb->iommu_root, sphb->dma_win_addr,
>>>>                                  spapr_tce_get_iommu(tcet));
>>>>  }
>>>>
>>>> @@ -1437,6 +1437,9 @@ static Property spapr_phb_properties[] = {
>>>>                         SPAPR_PCI_IO_WIN_SIZE),
>>>>      DEFINE_PROP_BOOL("dynamic-reconfiguration", sPAPRPHBState, dr_enabled,
>>>>                       true),
>>>> +    /* Default DMA window is 0..1GB */
>>>> +    DEFINE_PROP_UINT64("dma_win_addr", sPAPRPHBState, dma_win_addr, 0),
>>>> +    DEFINE_PROP_UINT64("dma_win_size", sPAPRPHBState, dma_win_size, 0x40000000),
>>>>      DEFINE_PROP_END_OF_LIST(),
>>>>  };
>>>
>>> Add them too to the vmstate_spapr_pci ?
>>
>> No.  At least, not just now.
>>
>> This is the sort of configuration information that typically isn't
>> migrated, instead requiring the far end to be set up matching.
>> Which
>
> I think this is the aim of VMSTATE_UINT64_EQUAL() ?

We use it only for things which cannot be set via the command line and 
ideally there should be no VMSTATE_*_EQUAL. If something can be set via the 
command line, then the management software (read - libvirt) runs QEMU with 
explicit parameters to guarantee that these are equal. Checking it in 
VMSTATE is too late when migrating with full disk copy - it takes a lot of 
time before the destination QEMU discovers that it cannot actually run.


>
>> is a problem, but not one within the scope of this patch series.
>> In any case just transferring the values wouldn't be enough - we'd
>> need post_load handlers to actually rewire the TCE table
>> accordingly.
>>
>> We will need to do this once we add dynamic DMA window support,
>> but again, not in scope just now.
>>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2
>
> iEYEARECAAYFAlYDn1UACgkQNKT2yavzbFM20ACfcUVIPLObcYh4y5U8CcoLVGoO
> FR8AoOgigTMFu7FOh7wI8U+fGNtYv6Ji
> =4opa
> -----END PGP SIGNATURE-----
>


-- 
Alexey


* Re: [Qemu-devel] [RFC PATCH 07/10] spapr_pci: Allow PCI host bridge DMA window to be configured
  2015-10-03  0:25         ` Alexey Kardashevskiy
@ 2015-10-05 14:13           ` Paolo Bonzini
  2015-10-06  3:25             ` David Gibson
  0 siblings, 1 reply; 54+ messages in thread
From: Paolo Bonzini @ 2015-10-05 14:13 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Laurent Vivier, David Gibson
  Cc: thuth, Michael Roth, Juan Quintela, gwshan, qemu-devel,
	alex.williamson, qemu-ppc



On 03/10/2015 02:25, Alexey Kardashevskiy wrote:
>> I think this is the aim of VMSTATE_UINT64_EQUAL() ?
> 
> We use it only for things which cannot be set via the command line
> and ideally there should be no VMSTATE_*_EQUAL. If something can be
> set via the command line, then the management software (read -
> libvirt) runs QEMU with explicit parameters to guarantee that these
> are equal.

VMSTATE_*_EQUAL is used when a value is later used as e.g. the size of
an array.  It basically provides bounds checking for the subsequent
array, avoiding that an invalid migration file or an error issuing the
QEMU command on the destination transforms into a buffer overflow.

Michael Roth did most of this work, IIRC.  Documenting it in
docs/migration.txt would be nice.
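A toy, self-contained C sketch of that idea (hypothetical names; QEMU's real mechanism is the VMSTATE_*_EQUAL macros in its vmstate code, this only models the check): the destination compares the length field arriving in the stream against the value its own configuration implies before trusting it as an array bound.

```c
#include <assert.h>
#include <stdint.h>

/* What our own command-line / device options imply for the table size. */
enum { LOCAL_NB_TABLE = 4 };

/* Stream layout: stream[0] = sender's nb_table, stream[1..] = entries. */
static int load_table(const uint32_t *stream, uint32_t table[LOCAL_NB_TABLE])
{
    uint32_t incoming_nb = stream[0];

    /* The EQUAL check: refuse a mismatched size up front, rather than
     * letting a bogus length index past the end of 'table'. */
    if (incoming_nb != LOCAL_NB_TABLE) {
        return -1;
    }
    for (uint32_t i = 0; i < incoming_nb; i++) {
        table[i] = stream[1 + i];
    }
    return 0;
}
```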

Paolo


* Re: [Qemu-devel] [RFC PATCH 07/10] spapr_pci: Allow PCI host bridge DMA window to be configured
  2015-10-05 14:13           ` Paolo Bonzini
@ 2015-10-06  3:25             ` David Gibson
  2015-10-06  4:18               ` David Gibson
  0 siblings, 1 reply; 54+ messages in thread
From: David Gibson @ 2015-10-06  3:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Laurent Vivier, thuth, Michael Roth, Juan Quintela,
	Alexey Kardashevskiy, gwshan, qemu-devel, alex.williamson,
	qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 1239 bytes --]

On Mon, Oct 05, 2015 at 04:13:30PM +0200, Paolo Bonzini wrote:
> 
> 
> On 03/10/2015 02:25, Alexey Kardashevskiy wrote:
> >> I think this is the aim of VMSTATE_UINT64_EQUAL() ?
> > 
> > We use it only for things which cannot be set via the command line
> > and ideally there should be no VMSTATE_*_EQUAL. If something can be
> > set via the command line, then the management software (read -
> > libvirt) runs QEMU with explicit parameters to guarantee that these
> > are equal.
> 
> VMSTATE_*_EQUAL is used when a value is later used as e.g. the size of
> an array.  It basically provides bounds checking for the subsequent
> array, avoiding that an invalid migration file or an error issuing the
> QEMU command on the destination transforms into a buffer overflow.
> 
> Michael Roth did most of this work, IIRC.  Documenting it in
> docs/migration.txt would be nice.

Ah.. which means we probably should use VMSTATE_*_EQUAL here since the
window size determines the size of the array of actual TCEs to follow
shortly.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: [Qemu-devel] [RFC PATCH 07/10] spapr_pci: Allow PCI host bridge DMA window to be configured
  2015-10-06  3:25             ` David Gibson
@ 2015-10-06  4:18               ` David Gibson
  0 siblings, 0 replies; 54+ messages in thread
From: David Gibson @ 2015-10-06  4:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Laurent Vivier, thuth, Michael Roth, Juan Quintela,
	Alexey Kardashevskiy, gwshan, qemu-devel, alex.williamson,
	qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 1582 bytes --]

On Tue, Oct 06, 2015 at 02:25:07PM +1100, David Gibson wrote:
> On Mon, Oct 05, 2015 at 04:13:30PM +0200, Paolo Bonzini wrote:
> > 
> > 
> > On 03/10/2015 02:25, Alexey Kardashevskiy wrote:
> > >> I think this is the aim of VMSTATE_UINT64_EQUAL() ?
> > > 
> > > We use it only for things which cannot be set via the command line
> > > and ideally there should be no VMSTATE_*_EQUAL. If something can be
> > > set via the command line, then the management software (read -
> > > libvirt) runs QEMU with explicit parameters to guarantee that these
> > > are equal.
> > 
> > VMSTATE_*_EQUAL is used when a value is later used as e.g. the size of
> > an array.  It basically provides bounds checking for the subsequent
> > array, avoiding that an invalid migration file or an error issuing the
> > QEMU command on the destination transforms into a buffer overflow.
> > 
> > Michael Roth did most of this work, IIRC.  Documenting it in
> > docs/migration.txt would be nice.
> 
> Ah.. which means we probably should use VMSTATE_*_EQUAL here since the
> window size determines the size of the array of actual TCEs to follow
> shortly.

Wait.. no we don't.

The vmstate for the sPAPRTCETable object which actually holds the
IOMMU page table information already has a suitable VMSTATE_*_EQUAL to
protect the variable sized array, so we don't need another one here.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


end of thread, other threads:[~2015-10-06  4:18 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-17 13:09 [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge David Gibson
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 01/10] vfio: Remove unneeded union from VFIOContainer David Gibson
2015-09-18  6:15   ` Alexey Kardashevskiy
2015-09-23 10:31   ` Thomas Huth
2015-09-23 23:14     ` David Gibson
2015-09-23 13:18   ` Laurent Vivier
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 02/10] vfio: Generalize vfio_listener_region_add failure path David Gibson
2015-09-23  9:13   ` Thomas Huth
2015-09-23 13:31   ` Laurent Vivier
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 03/10] vfio: Check guest IOVA ranges against host IOMMU capabilities David Gibson
2015-09-18  6:38   ` Alexey Kardashevskiy
2015-09-23 10:10   ` Thomas Huth
2015-09-23 11:07     ` David Gibson
2015-09-23 23:43       ` David Gibson
2015-09-23 14:26   ` Laurent Vivier
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 04/10] vfio: Record host IOMMU's available IO page sizes David Gibson
2015-09-23 10:29   ` Thomas Huth
2015-09-23 14:30   ` Laurent Vivier
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 05/10] memory: Allow replay of IOMMU mapping notifications David Gibson
2015-09-23 10:40   ` Thomas Huth
2015-09-23 16:35     ` Laurent Vivier
2015-09-23 23:47     ` David Gibson
2015-09-23 17:04   ` Laurent Vivier
2015-09-23 23:50     ` David Gibson
2015-09-24  7:09       ` Laurent Vivier
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 06/10] vfio: Allow hotplug of containers onto existing guest IOMMU mappings David Gibson
2015-09-17 16:54   ` Alex Williamson
2015-09-17 23:31     ` David Gibson
2015-09-23 11:02       ` Thomas Huth
2015-09-23 23:50         ` David Gibson
2015-09-23 18:44   ` Laurent Vivier
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 07/10] spapr_pci: Allow PCI host bridge DMA window to be configured David Gibson
2015-09-23 11:08   ` Thomas Huth
2015-09-23 23:56     ` David Gibson
2015-09-23 18:55   ` Laurent Vivier
2015-09-23 23:54     ` David Gibson
2015-09-24  6:59       ` Laurent Vivier
2015-10-03  0:25         ` Alexey Kardashevskiy
2015-10-05 14:13           ` Paolo Bonzini
2015-10-06  3:25             ` David Gibson
2015-10-06  4:18               ` David Gibson
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 08/10] spapr_iommu: Rename vfio_accel parameter David Gibson
2015-09-17 16:54   ` Alex Williamson
2015-09-17 23:34     ` David Gibson
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 09/10] spapr_iommu: Provide a function to switch a TCE table to allowing VFIO David Gibson
2015-09-17 16:54   ` Alex Williamson
2015-09-23 11:24     ` Thomas Huth
2015-09-24  0:35       ` David Gibson
2015-09-17 13:09 ` [Qemu-devel] [RFC PATCH 10/10] spapr_pci: Allow VFIO devices to work on the normal PCI host bridge David Gibson
2015-09-17 16:54 ` [Qemu-devel] [RFC PATCH 00/10] pseries: Allow VFIO devices on spapr-pci-host-bridge Alex Williamson
2015-09-23 11:26 ` Thomas Huth
2015-09-23 16:46 ` Laurent Vivier
2015-09-24  1:02   ` David Gibson
2015-09-24  7:02     ` Laurent Vivier
