All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC PATCH qemu v3 0/4] vfio-pci, spapr: Allow in-kernel acceleration
@ 2017-04-01 12:37 Alexey Kardashevskiy
  2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 1/4] memory/iommu: QOM'fy IOMMU MemoryRegion Alexey Kardashevskiy
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-01 12:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, qemu-ppc, David Gibson, Alex Williamson,
	Paolo Bonzini

This is my current working tree to support kernel's
"powerpc/kvm/vfio: Enable in-kernel acceleration".

Changes:
v3:
* fixed multiple architectures with respect to IOMMU MR
* removed sPAPRIOMMUMemoryRegion

v2:
* QOM'fy of IOMMUMemoryRegion
* fix comments from v1 review

Please comment. Thanks.



Alexey Kardashevskiy (4):
  memory/iommu: QOM'fy IOMMU MemoryRegion
  vfio-pci: Reorder group-to-container attaching
  vfio/spapr: Add a notifier for PPC64 HV/PR KVM about new group
    attached to LIOBN
  spapr/iommu: Enable in-kernel TCE acceleration via VFIO KVM device

 hw/s390x/s390-pci-bus.h       |   2 +-
 include/exec/memory.h         |  64 ++++++++++++++++++------
 include/hw/i386/intel_iommu.h |   2 +-
 include/hw/ppc/spapr.h        |   4 +-
 include/hw/vfio/vfio-common.h |   4 +-
 include/qemu/typedefs.h       |   1 +
 target/ppc/kvm_ppc.h          |   6 +++
 exec.c                        |  16 +++---
 hw/dma/sun4m_iommu.c          |   2 +-
 hw/i386/amd_iommu.c           |  11 ++--
 hw/i386/intel_iommu.c         |  14 +++---
 hw/ppc/spapr_iommu.c          |  29 ++++++++---
 hw/s390x/s390-pci-bus.c       |   6 +--
 hw/s390x/s390-pci-inst.c      |   8 +--
 hw/vfio/common.c              |  41 ++++++++++-----
 hw/vfio/spapr.c               |  34 ++++++++++++-
 memory.c                      | 114 +++++++++++++++++++++++++++++-------------
 target/ppc/kvm.c              |   7 ++-
 hw/vfio/trace-events          |   1 +
 19 files changed, 265 insertions(+), 101 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] [RFC PATCH qemu v3 1/4] memory/iommu: QOM'fy IOMMU MemoryRegion
  2017-04-01 12:37 [Qemu-devel] [RFC PATCH qemu v3 0/4] vfio-pci, spapr: Allow in-kernel acceleration Alexey Kardashevskiy
@ 2017-04-01 12:37 ` Alexey Kardashevskiy
  2017-04-03  2:26   ` David Gibson
  2017-04-03 12:53   ` Philippe Mathieu-Daudé
  2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 2/4] vfio-pci: Reorder group-to-container attaching Alexey Kardashevskiy
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-01 12:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, qemu-ppc, David Gibson, Alex Williamson,
	Paolo Bonzini

This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion
as a parent.

This moves IOMMU-related fields from MR to IOMMU MR. However to avoid
dymanic QOM casting in fast path (address_space_translate, etc),
this adds an @is_iommu boolean flag to MR and provides new helper to
do simple cast to IOMMU MR - memory_region_get_iommu. The flag
is set in the instance init callback. This defines
memory_region_is_iommu as memory_region_get_iommu()!=NULL.

This switches MemoryRegion to IOMMUMemoryRegion in most places except
the ones where MemoryRegion may be an alias.

This defines memory_region_init_iommu_type() to allow creating
IOMMUMemoryRegion subclasses. In order to support custom QOM type,
this splits memory_region_init() to object_initialize() +
memory_region_do_init.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v3:
* added mr->is_iommu
* updated i386/x86_64/s390/sun
---
 hw/s390x/s390-pci-bus.h       |   2 +-
 include/exec/memory.h         |  64 ++++++++++++++++++------
 include/hw/i386/intel_iommu.h |   2 +-
 include/hw/ppc/spapr.h        |   3 +-
 include/hw/vfio/vfio-common.h |   2 +-
 include/qemu/typedefs.h       |   1 +
 exec.c                        |  16 +++---
 hw/dma/sun4m_iommu.c          |   2 +-
 hw/i386/amd_iommu.c           |  11 ++--
 hw/i386/intel_iommu.c         |  14 +++---
 hw/ppc/spapr_iommu.c          |  20 +++++---
 hw/s390x/s390-pci-bus.c       |   6 +--
 hw/s390x/s390-pci-inst.c      |   8 +--
 hw/vfio/common.c              |  12 +++--
 hw/vfio/spapr.c               |   3 +-
 memory.c                      | 114 +++++++++++++++++++++++++++++-------------
 16 files changed, 188 insertions(+), 92 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
index dcbf4820c9..441ddbedcb 100644
--- a/hw/s390x/s390-pci-bus.h
+++ b/hw/s390x/s390-pci-bus.h
@@ -267,7 +267,7 @@ typedef struct S390PCIIOMMU {
     S390PCIBusDevice *pbdev;
     AddressSpace as;
     MemoryRegion mr;
-    MemoryRegion iommu_mr;
+    IOMMUMemoryRegion iommu_mr;
     bool enabled;
     uint64_t g_iota;
     uint64_t pba;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index e39256ad03..29d59f4f7f 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -37,6 +37,10 @@
 #define MEMORY_REGION(obj) \
         OBJECT_CHECK(MemoryRegion, (obj), TYPE_MEMORY_REGION)
 
+#define TYPE_IOMMU_MEMORY_REGION "qemu:iommu-memory-region"
+#define IOMMU_MEMORY_REGION(obj) \
+        OBJECT_CHECK(IOMMUMemoryRegion, (obj), TYPE_IOMMU_MEMORY_REGION)
+
 typedef struct MemoryRegionOps MemoryRegionOps;
 typedef struct MemoryRegionMmio MemoryRegionMmio;
 
@@ -167,11 +171,12 @@ typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
 
 struct MemoryRegionIOMMUOps {
     /* Return a TLB entry that contains a given address. */
-    IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool is_write);
+    IOMMUTLBEntry (*translate)(IOMMUMemoryRegion *iommu, hwaddr addr,
+                               bool is_write);
     /* Returns minimum supported page size */
-    uint64_t (*get_min_page_size)(MemoryRegion *iommu);
+    uint64_t (*get_min_page_size)(IOMMUMemoryRegion *iommu);
     /* Called when IOMMU Notifier flag changed */
-    void (*notify_flag_changed)(MemoryRegion *iommu,
+    void (*notify_flag_changed)(IOMMUMemoryRegion *iommu,
                                 IOMMUNotifierFlag old_flags,
                                 IOMMUNotifierFlag new_flags);
 };
@@ -195,7 +200,6 @@ struct MemoryRegion {
     uint8_t dirty_log_mask;
     RAMBlock *ram_block;
     Object *owner;
-    const MemoryRegionIOMMUOps *iommu_ops;
 
     const MemoryRegionOps *ops;
     void *opaque;
@@ -218,6 +222,13 @@ struct MemoryRegion {
     const char *name;
     unsigned ioeventfd_nb;
     MemoryRegionIoeventfd *ioeventfds;
+    bool is_iommu;
+};
+
+struct IOMMUMemoryRegion {
+    MemoryRegion parent_obj;
+
+    const MemoryRegionIOMMUOps *iommu_ops;
     QLIST_HEAD(, IOMMUNotifier) iommu_notify;
     IOMMUNotifierFlag iommu_notify_flags;
 };
@@ -555,19 +566,39 @@ static inline void memory_region_init_reservation(MemoryRegion *mr,
 }
 
 /**
+ * memory_region_init_iommu_type: Initialize a memory region of a custom type
+ * that translates addresses
+ *
+ * An IOMMU region translates addresses and forwards accesses to a target
+ * memory region.
+ *
+ * @typename: QOM class name
+ * @mr: the #IOMMUMemoryRegion to be initialized
+ * @owner: the object that tracks the region's reference count
+ * @ops: a function that translates addresses into the @target region
+ * @name: used for debugging; not visible to the user or ABI
+ * @size: size of the region.
+ */
+void memory_region_init_iommu_type(const char *mrtypename,
+                                   IOMMUMemoryRegion *iommumr,
+                                   Object *owner,
+                                   const MemoryRegionIOMMUOps *ops,
+                                   const char *name,
+                                   uint64_t size);
+/**
  * memory_region_init_iommu: Initialize a memory region that translates
  * addresses
  *
  * An IOMMU region translates addresses and forwards accesses to a target
  * memory region.
  *
- * @mr: the #MemoryRegion to be initialized
+ * @mr: the #IOMMUMemoryRegion to be initialized
  * @owner: the object that tracks the region's reference count
  * @ops: a function that translates addresses into the @target region
  * @name: used for debugging; not visible to the user or ABI
  * @size: size of the region.
  */
-void memory_region_init_iommu(MemoryRegion *mr,
+void memory_region_init_iommu(IOMMUMemoryRegion *iommumr,
                               struct Object *owner,
                               const MemoryRegionIOMMUOps *ops,
                               const char *name,
@@ -622,20 +653,25 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
 }
 
 /**
- * memory_region_is_iommu: check whether a memory region is an iommu
+ * memory_region_get_iommu: check whether a memory region is an iommu
  *
- * Returns %true is a memory region is an iommu.
+ * Returns pointer to IOMMUMemoryRegion if a memory region is an iommu,
+ * otherwise NULL.
  *
  * @mr: the memory region being queried
  */
-static inline bool memory_region_is_iommu(MemoryRegion *mr)
+static inline IOMMUMemoryRegion *memory_region_get_iommu(MemoryRegion *mr)
 {
     if (mr->alias) {
-        return memory_region_is_iommu(mr->alias);
+        return memory_region_get_iommu(mr->alias);
     }
-    return mr->iommu_ops;
+    if (mr->is_iommu) {
+        return (IOMMUMemoryRegion *) mr;
+    }
+    return NULL;
 }
 
+#define memory_region_is_iommu(mr) (memory_region_get_iommu(mr) != NULL)
 
 /**
  * memory_region_iommu_get_min_page_size: get minimum supported page size
@@ -645,7 +681,7 @@ static inline bool memory_region_is_iommu(MemoryRegion *mr)
  *
  * @mr: the memory region being queried
  */
-uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr);
+uint64_t memory_region_iommu_get_min_page_size(IOMMUMemoryRegion *mr);
 
 /**
  * memory_region_notify_iommu: notify a change in an IOMMU translation entry.
@@ -664,7 +700,7 @@ uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr);
  *         replaces all old entries for the same virtual I/O address range.
  *         Deleted entries have .@perm == 0.
  */
-void memory_region_notify_iommu(MemoryRegion *mr,
+void memory_region_notify_iommu(IOMMUMemoryRegion *mr,
                                 IOMMUTLBEntry entry);
 
 /**
@@ -689,7 +725,7 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr,
  * @is_write: Whether to treat the replay as a translate "write"
  *     through the iommu
  */
-void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
+void memory_region_iommu_replay(IOMMUMemoryRegion *mr, IOMMUNotifier *n,
                                 bool is_write);
 
 /**
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index fe645aa93a..34f1b61957 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -82,7 +82,7 @@ struct VTDAddressSpace {
     PCIBus *bus;
     uint8_t devfn;
     AddressSpace as;
-    MemoryRegion iommu;
+    IOMMUMemoryRegion iommu;
     MemoryRegion iommu_ir;      /* Interrupt region: 0xfeeXXXXX */
     IntelIOMMUState *iommu_state;
     VTDContextCacheEntry context_cache_entry;
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index e27de64d31..6997ed7e98 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -590,7 +590,8 @@ struct sPAPRTCETable {
     bool bypass;
     bool need_vfio;
     int fd;
-    MemoryRegion root, iommu;
+    MemoryRegion root;
+    IOMMUMemoryRegion iommu;
     struct VIOsPAPRDevice *vdev; /* for @bypass migration compatibility only */
     QLIST_ENTRY(sPAPRTCETable) list;
 };
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index c582de18c9..7a4135ae6f 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -94,7 +94,7 @@ typedef struct VFIOContainer {
 
 typedef struct VFIOGuestIOMMU {
     VFIOContainer *container;
-    MemoryRegion *iommu;
+    IOMMUMemoryRegion *iommu;
     hwaddr iommu_offset;
     IOMMUNotifier n;
     QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index e95f28cfec..b45f71ec11 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -45,6 +45,7 @@ typedef struct MachineState MachineState;
 typedef struct MemoryListener MemoryListener;
 typedef struct MemoryMappingList MemoryMappingList;
 typedef struct MemoryRegion MemoryRegion;
+typedef struct IOMMUMemoryRegion IOMMUMemoryRegion;
 typedef struct MemoryRegionCache MemoryRegionCache;
 typedef struct MemoryRegionSection MemoryRegionSection;
 typedef struct MigrationIncomingState MigrationIncomingState;
diff --git a/exec.c b/exec.c
index e57a8a2178..bbd8df7a9d 100644
--- a/exec.c
+++ b/exec.c
@@ -462,20 +462,20 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
 {
     IOMMUTLBEntry iotlb = {0};
     MemoryRegionSection *section;
-    MemoryRegion *mr;
+    IOMMUMemoryRegion *iommumr;
 
     for (;;) {
         AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
         section = address_space_lookup_region(d, addr, false);
         addr = addr - section->offset_within_address_space
                + section->offset_within_region;
-        mr = section->mr;
 
-        if (!mr->iommu_ops) {
+        iommumr = memory_region_get_iommu(section->mr);
+        if (!iommumr) {
             break;
         }
 
-        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+        iotlb = iommumr->iommu_ops->translate(iommumr, addr, is_write);
         if (!(iotlb.perm & (1 << is_write))) {
             iotlb.target_as = NULL;
             break;
@@ -497,17 +497,19 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
     IOMMUTLBEntry iotlb;
     MemoryRegionSection *section;
     MemoryRegion *mr;
+    IOMMUMemoryRegion *iommumr;
 
     for (;;) {
         AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
         section = address_space_translate_internal(d, addr, &addr, plen, true);
         mr = section->mr;
 
-        if (!mr->iommu_ops) {
+        iommumr = memory_region_get_iommu(mr);
+        if (!iommumr) {
             break;
         }
 
-        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+        iotlb = iommumr->iommu_ops->translate(iommumr, addr, is_write);
         addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
                 | (addr & iotlb.addr_mask));
         *plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1);
@@ -538,7 +540,7 @@ address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr addr,
 
     section = address_space_translate_internal(d, addr, xlat, plen, false);
 
-    assert(!section->mr->iommu_ops);
+    assert(!memory_region_is_iommu(section->mr));
     return section;
 }
 #endif
diff --git a/hw/dma/sun4m_iommu.c b/hw/dma/sun4m_iommu.c
index b3cbc54c23..539115b629 100644
--- a/hw/dma/sun4m_iommu.c
+++ b/hw/dma/sun4m_iommu.c
@@ -134,7 +134,7 @@
 typedef struct IOMMUState {
     SysBusDevice parent_obj;
 
-    MemoryRegion iomem;
+    IOMMUMemoryRegion iomem;
     uint32_t regs[IOMMU_NREGS];
     hwaddr iostart;
     qemu_irq irq;
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index f86a40aa30..e76c29aec6 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -51,7 +51,7 @@ struct AMDVIAddressSpace {
     uint8_t bus_num;            /* bus number                           */
     uint8_t devfn;              /* device function                      */
     AMDVIState *iommu_state;    /* AMDVI - one per machine              */
-    MemoryRegion iommu;         /* Device's address translation region  */
+    IOMMUMemoryRegion iommu;    /* Device's address translation region  */
     MemoryRegion iommu_ir;      /* Device's interrupt remapping region  */
     AddressSpace as;            /* device's corresponding address space */
 };
@@ -986,7 +986,7 @@ static inline bool amdvi_is_interrupt_addr(hwaddr addr)
     return addr >= AMDVI_INT_ADDR_FIRST && addr <= AMDVI_INT_ADDR_LAST;
 }
 
-static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
+static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
                                      bool is_write)
 {
     AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
@@ -1045,8 +1045,9 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 
         memory_region_init_iommu(&iommu_as[devfn]->iommu, OBJECT(s),
                                  &s->iommu_ops, "amd-iommu", UINT64_MAX);
-        address_space_init(&iommu_as[devfn]->as, &iommu_as[devfn]->iommu,
-                           "amd-iommu");
+	address_space_init(&iommu_as[devfn]->as,
+			   MEMORY_REGION(&iommu_as[devfn]->iommu),
+			   "amd-iommu");
     }
     return &iommu_as[devfn]->as;
 }
@@ -1066,7 +1067,7 @@ static const MemoryRegionOps mmio_mem_ops = {
     }
 };
 
-static void amdvi_iommu_notify_flag_changed(MemoryRegion *iommu,
+static void amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
                                             IOMMUNotifierFlag old,
                                             IOMMUNotifierFlag new)
 {
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 22d8226e43..233fa75b64 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1457,7 +1457,8 @@ static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
     entry.iova = addr;
     entry.perm = IOMMU_NONE;
     entry.translated_addr = 0;
-    memory_region_notify_iommu(entry.target_as->root, entry);
+    memory_region_notify_iommu(memory_region_get_iommu(entry.target_as->root),
+			       entry);
 
 done:
     return true;
@@ -1968,7 +1969,7 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
     }
 }
 
-static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
+static IOMMUTLBEntry vtd_iommu_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
                                          bool is_write)
 {
     VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
@@ -2000,7 +2001,7 @@ static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
     return ret;
 }
 
-static void vtd_iommu_notify_flag_changed(MemoryRegion *iommu,
+static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
                                           IOMMUNotifierFlag old,
                                           IOMMUNotifierFlag new)
 {
@@ -2394,10 +2395,11 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
         memory_region_init_io(&vtd_dev_as->iommu_ir, OBJECT(s),
                               &vtd_mem_ir_ops, s, "intel_iommu_ir",
                               VTD_INTERRUPT_ADDR_SIZE);
-        memory_region_add_subregion(&vtd_dev_as->iommu, VTD_INTERRUPT_ADDR_FIRST,
-                                    &vtd_dev_as->iommu_ir);
+	memory_region_add_subregion(MEMORY_REGION(&vtd_dev_as->iommu),
+				    VTD_INTERRUPT_ADDR_FIRST,
+				    &vtd_dev_as->iommu_ir);
         address_space_init(&vtd_dev_as->as,
-                           &vtd_dev_as->iommu, name);
+                           MEMORY_REGION(&vtd_dev_as->iommu), name);
     }
     return vtd_dev_as;
 }
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 9e30e148d6..5051110b9d 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -110,7 +110,8 @@ static void spapr_tce_free_table(uint64_t *table, int fd, uint32_t nb_table)
 }
 
 /* Called from RCU critical section */
-static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr,
+static IOMMUTLBEntry spapr_tce_translate_iommu(IOMMUMemoryRegion *iommu,
+                                               hwaddr addr,
                                                bool is_write)
 {
     sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
@@ -150,14 +151,14 @@ static void spapr_tce_table_pre_save(void *opaque)
                                tcet->bus_offset, tcet->page_shift);
 }
 
-static uint64_t spapr_tce_get_min_page_size(MemoryRegion *iommu)
+static uint64_t spapr_tce_get_min_page_size(IOMMUMemoryRegion *iommu)
 {
     sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
 
     return 1ULL << tcet->page_shift;
 }
 
-static void spapr_tce_notify_flag_changed(MemoryRegion *iommu,
+static void spapr_tce_notify_flag_changed(IOMMUMemoryRegion *iommu,
                                           IOMMUNotifierFlag old,
                                           IOMMUNotifierFlag new)
 {
@@ -265,7 +266,9 @@ static int spapr_tce_table_realize(DeviceState *dev)
     memory_region_init(&tcet->root, tcetobj, tmp, UINT64_MAX);
 
     snprintf(tmp, sizeof(tmp), "tce-iommu-%x", tcet->liobn);
-    memory_region_init_iommu(&tcet->iommu, tcetobj, &spapr_iommu_ops, tmp, 0);
+    memory_region_init_iommu_type(TYPE_IOMMU_MEMORY_REGION,
+                                  &tcet->iommu, tcetobj, &spapr_iommu_ops,
+                                  tmp, 0);
 
     QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
 
@@ -341,9 +344,10 @@ void spapr_tce_table_enable(sPAPRTCETable *tcet,
                                         &tcet->fd,
                                         tcet->need_vfio);
 
-    memory_region_set_size(&tcet->iommu,
+    memory_region_set_size(MEMORY_REGION(&tcet->iommu),
                            (uint64_t)tcet->nb_table << tcet->page_shift);
-    memory_region_add_subregion(&tcet->root, tcet->bus_offset, &tcet->iommu);
+    memory_region_add_subregion(&tcet->root, tcet->bus_offset,
+                                MEMORY_REGION(&tcet->iommu));
 }
 
 void spapr_tce_table_disable(sPAPRTCETable *tcet)
@@ -352,8 +356,8 @@ void spapr_tce_table_disable(sPAPRTCETable *tcet)
         return;
     }
 
-    memory_region_del_subregion(&tcet->root, &tcet->iommu);
-    memory_region_set_size(&tcet->iommu, 0);
+    memory_region_del_subregion(&tcet->root, MEMORY_REGION(&tcet->iommu));
+    memory_region_set_size(MEMORY_REGION(&tcet->iommu), 0);
 
     spapr_tce_free_table(tcet->table, tcet->fd, tcet->nb_table);
     tcet->fd = -1;
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 69b0291e8a..27e7336a76 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -354,7 +354,7 @@ out:
     return pte;
 }
 
-static IOMMUTLBEntry s390_translate_iommu(MemoryRegion *mr, hwaddr addr,
+static IOMMUTLBEntry s390_translate_iommu(IOMMUMemoryRegion *mr, hwaddr addr,
                                           bool is_write)
 {
     uint64_t pte;
@@ -523,14 +523,14 @@ void s390_pci_iommu_enable(S390PCIIOMMU *iommu)
     memory_region_init_iommu(&iommu->iommu_mr, OBJECT(&iommu->mr),
                              &s390_iommu_ops, name, iommu->pal + 1);
     iommu->enabled = true;
-    memory_region_add_subregion(&iommu->mr, 0, &iommu->iommu_mr);
+    memory_region_add_subregion(&iommu->mr, 0, MEMORY_REGION(&iommu->iommu_mr));
     g_free(name);
 }
 
 void s390_pci_iommu_disable(S390PCIIOMMU *iommu)
 {
     iommu->enabled = false;
-    memory_region_del_subregion(&iommu->mr, &iommu->iommu_mr);
+    memory_region_del_subregion(&iommu->mr, MEMORY_REGION(&iommu->iommu_mr));
     object_unparent(OBJECT(&iommu->iommu_mr));
 }
 
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index d2a8c0a083..b4fe4da798 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -561,7 +561,7 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
     S390PCIIOMMU *iommu;
     hwaddr start, end;
     IOMMUTLBEntry entry;
-    MemoryRegion *mr;
+    IOMMUMemoryRegion *iommumr;
 
     cpu_synchronize_state(CPU(cpu));
 
@@ -620,9 +620,9 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
         goto out;
     }
 
-    mr = &iommu->iommu_mr;
+    iommumr = &iommu->iommu_mr;
     while (start < end) {
-        entry = mr->iommu_ops->translate(mr, start, 0);
+        entry = iommumr->iommu_ops->translate(iommumr, start, 0);
 
         if (!entry.translated_addr) {
             pbdev->state = ZPCI_FS_ERROR;
@@ -633,7 +633,7 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
             goto out;
         }
 
-        memory_region_notify_iommu(mr, entry);
+        memory_region_notify_iommu(iommumr, entry);
         start += entry.addr_mask + 1;
     }
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f3ba9b9007..ab95db689c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -465,6 +465,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
 
     if (memory_region_is_iommu(section->mr)) {
         VFIOGuestIOMMU *giommu;
+        IOMMUMemoryRegion *iommumr = IOMMU_MEMORY_REGION(section->mr);
 
         trace_vfio_listener_region_add_iommu(iova, end);
         /*
@@ -474,7 +475,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
          * device emulation the VFIO iommu handles to use).
          */
         giommu = g_malloc0(sizeof(*giommu));
-        giommu->iommu = section->mr;
+        giommu->iommu = iommumr;
         giommu->iommu_offset = section->offset_within_address_space -
                                section->offset_within_region;
         giommu->container = container;
@@ -482,7 +483,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
         giommu->n.notifier_flags = IOMMU_NOTIFIER_ALL;
         QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
 
-        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
+        memory_region_register_iommu_notifier(section->mr, &giommu->n);
         memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
 
         return;
@@ -550,8 +551,8 @@ static void vfio_listener_region_del(MemoryListener *listener,
         VFIOGuestIOMMU *giommu;
 
         QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
-            if (giommu->iommu == section->mr) {
-                memory_region_unregister_iommu_notifier(giommu->iommu,
+            if (MEMORY_REGION(giommu->iommu) == section->mr) {
+                memory_region_unregister_iommu_notifier(section->mr,
                                                         &giommu->n);
                 QLIST_REMOVE(giommu, giommu_next);
                 g_free(giommu);
@@ -1141,7 +1142,8 @@ static void vfio_disconnect_container(VFIOGroup *group)
         QLIST_REMOVE(container, next);
 
         QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
-            memory_region_unregister_iommu_notifier(giommu->iommu, &giommu->n);
+            memory_region_unregister_iommu_notifier(
+                    MEMORY_REGION(giommu->iommu), &giommu->n);
             QLIST_REMOVE(giommu, giommu_next);
             g_free(giommu);
         }
diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index 4409bcc0d7..551870d46b 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -143,7 +143,8 @@ int vfio_spapr_create_window(VFIOContainer *container,
                              hwaddr *pgsize)
 {
     int ret;
-    unsigned pagesize = memory_region_iommu_get_min_page_size(section->mr);
+    IOMMUMemoryRegion *iommumr = IOMMU_MEMORY_REGION(section->mr);
+    unsigned pagesize = memory_region_iommu_get_min_page_size(iommumr);
     unsigned entries, pages;
     struct vfio_iommu_spapr_tce_create create = { .argsz = sizeof(create) };
 
diff --git a/memory.c b/memory.c
index 4c95aaf39c..62796536dc 100644
--- a/memory.c
+++ b/memory.c
@@ -975,12 +975,11 @@ static char *memory_region_escape_name(const char *name)
     return escaped;
 }
 
-void memory_region_init(MemoryRegion *mr,
-                        Object *owner,
-                        const char *name,
-                        uint64_t size)
+static void memory_region_do_init(MemoryRegion *mr,
+                                  Object *owner,
+                                  const char *name,
+                                  uint64_t size)
 {
-    object_initialize(mr, sizeof(*mr), TYPE_MEMORY_REGION);
     mr->size = int128_make64(size);
     if (size == UINT64_MAX) {
         mr->size = int128_2_64();
@@ -1004,6 +1003,15 @@ void memory_region_init(MemoryRegion *mr,
     }
 }
 
+void memory_region_init(MemoryRegion *mr,
+                        Object *owner,
+                        const char *name,
+                        uint64_t size)
+{
+    object_initialize(mr, sizeof(*mr), TYPE_MEMORY_REGION);
+    memory_region_do_init(mr, owner, name, size);
+}
+
 static void memory_region_get_addr(Object *obj, Visitor *v, const char *name,
                                    void *opaque, Error **errp)
 {
@@ -1090,6 +1098,13 @@ static void memory_region_initfn(Object *obj)
                         NULL, NULL, &error_abort);
 }
 
+static void iommu_memory_region_initfn(Object *obj)
+{
+    MemoryRegion *mr = MEMORY_REGION(obj);
+
+    mr->is_iommu = true;
+}
+
 static uint64_t unassigned_mem_read(void *opaque, hwaddr addr,
                                     unsigned size)
 {
@@ -1473,17 +1488,33 @@ void memory_region_init_rom_device(MemoryRegion *mr,
     mr->ram_block = qemu_ram_alloc(size, mr, errp);
 }
 
-void memory_region_init_iommu(MemoryRegion *mr,
-                              Object *owner,
-                              const MemoryRegionIOMMUOps *ops,
-                              const char *name,
-                              uint64_t size)
+void memory_region_init_iommu_type(const char *mrtypename,
+                                   IOMMUMemoryRegion *iommumr,
+                                   Object *owner,
+                                   const MemoryRegionIOMMUOps *ops,
+                                   const char *name,
+                                   uint64_t size)
 {
-    memory_region_init(mr, owner, name, size);
-    mr->iommu_ops = ops,
+    struct MemoryRegion *mr;
+    size_t instance_size = object_type_get_instance_size(mrtypename);
+
+    object_initialize(iommumr, instance_size, mrtypename);
+    mr = MEMORY_REGION(iommumr);
+    memory_region_do_init(mr, owner, name, size);
+    iommumr->iommu_ops = ops,
     mr->terminates = true;  /* then re-forwards */
-    QLIST_INIT(&mr->iommu_notify);
-    mr->iommu_notify_flags = IOMMU_NOTIFIER_NONE;
+    QLIST_INIT(&iommumr->iommu_notify);
+    iommumr->iommu_notify_flags = IOMMU_NOTIFIER_NONE;
+}
+
+void memory_region_init_iommu(IOMMUMemoryRegion *iommumr,
+                              Object *owner,
+                              const MemoryRegionIOMMUOps *ops,
+                              const char *name,
+                              uint64_t size)
+{
+    memory_region_init_iommu_type(TYPE_IOMMU_MEMORY_REGION, iommumr,
+                                  owner, ops, name, size);
 }
 
 static void memory_region_finalize(Object *obj)
@@ -1578,57 +1609,61 @@ bool memory_region_is_logging(MemoryRegion *mr, uint8_t client)
     return memory_region_get_dirty_log_mask(mr) & (1 << client);
 }
 
-static void memory_region_update_iommu_notify_flags(MemoryRegion *mr)
+static void memory_region_update_iommu_notify_flags(IOMMUMemoryRegion *iommumr)
 {
     IOMMUNotifierFlag flags = IOMMU_NOTIFIER_NONE;
     IOMMUNotifier *iommu_notifier;
 
-    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
+    QLIST_FOREACH(iommu_notifier, &iommumr->iommu_notify, node) {
         flags |= iommu_notifier->notifier_flags;
     }
 
-    if (flags != mr->iommu_notify_flags &&
-        mr->iommu_ops->notify_flag_changed) {
-        mr->iommu_ops->notify_flag_changed(mr, mr->iommu_notify_flags,
-                                           flags);
+    if (flags != iommumr->iommu_notify_flags &&
+        iommumr->iommu_ops->notify_flag_changed) {
+        iommumr->iommu_ops->notify_flag_changed(iommumr,
+                                                iommumr->iommu_notify_flags,
+                                                flags);
     }
 
-    mr->iommu_notify_flags = flags;
+    iommumr->iommu_notify_flags = flags;
 }
 
 void memory_region_register_iommu_notifier(MemoryRegion *mr,
                                            IOMMUNotifier *n)
 {
+    IOMMUMemoryRegion *iommumr;
+
     if (mr->alias) {
         memory_region_register_iommu_notifier(mr->alias, n);
         return;
     }
 
     /* We need to register for at least one bitfield */
+    iommumr = IOMMU_MEMORY_REGION(mr);
     assert(n->notifier_flags != IOMMU_NOTIFIER_NONE);
-    QLIST_INSERT_HEAD(&mr->iommu_notify, n, node);
-    memory_region_update_iommu_notify_flags(mr);
+    QLIST_INSERT_HEAD(&iommumr->iommu_notify, n, node);
+    memory_region_update_iommu_notify_flags(iommumr);
 }
 
-uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr)
+uint64_t memory_region_iommu_get_min_page_size(IOMMUMemoryRegion *iommumr)
 {
-    assert(memory_region_is_iommu(mr));
-    if (mr->iommu_ops && mr->iommu_ops->get_min_page_size) {
-        return mr->iommu_ops->get_min_page_size(mr);
+    if (iommumr->iommu_ops && iommumr->iommu_ops->get_min_page_size) {
+        return iommumr->iommu_ops->get_min_page_size(iommumr);
     }
     return TARGET_PAGE_SIZE;
 }
 
-void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
+void memory_region_iommu_replay(IOMMUMemoryRegion *iommumr, IOMMUNotifier *n,
                                 bool is_write)
 {
+    MemoryRegion *mr = MEMORY_REGION(iommumr);
     hwaddr addr, granularity;
     IOMMUTLBEntry iotlb;
 
-    granularity = memory_region_iommu_get_min_page_size(mr);
+    granularity = memory_region_iommu_get_min_page_size(iommumr);
 
     for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
-        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+        iotlb = iommumr->iommu_ops->translate(iommumr, addr, is_write);
         if (iotlb.perm != IOMMU_NONE) {
             n->notify(n, &iotlb);
         }
@@ -1644,21 +1679,24 @@ void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
 void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
                                              IOMMUNotifier *n)
 {
+    IOMMUMemoryRegion *iommumr;
+
     if (mr->alias) {
         memory_region_unregister_iommu_notifier(mr->alias, n);
         return;
     }
     QLIST_REMOVE(n, node);
-    memory_region_update_iommu_notify_flags(mr);
+    iommumr = IOMMU_MEMORY_REGION(mr);
+    memory_region_update_iommu_notify_flags(iommumr);
 }
 
-void memory_region_notify_iommu(MemoryRegion *mr,
+void memory_region_notify_iommu(IOMMUMemoryRegion *iommumr,
                                 IOMMUTLBEntry entry)
 {
     IOMMUNotifier *iommu_notifier;
     IOMMUNotifierFlag request_flags;
 
-    assert(memory_region_is_iommu(mr));
+    assert(memory_region_is_iommu(MEMORY_REGION(iommumr)));
 
     if (entry.perm & IOMMU_RW) {
         request_flags = IOMMU_NOTIFIER_MAP;
@@ -1666,7 +1704,7 @@ void memory_region_notify_iommu(MemoryRegion *mr,
         request_flags = IOMMU_NOTIFIER_UNMAP;
     }
 
-    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
+    QLIST_FOREACH(iommu_notifier, &iommumr->iommu_notify, node) {
         if (iommu_notifier->notifier_flags & request_flags) {
             iommu_notifier->notify(iommu_notifier, &entry);
         }
@@ -2660,9 +2698,17 @@ static const TypeInfo memory_region_info = {
     .instance_finalize  = memory_region_finalize,
 };
 
+static const TypeInfo iommu_memory_region_info = {
+    .parent             = TYPE_MEMORY_REGION,
+    .name               = TYPE_IOMMU_MEMORY_REGION,
+    .instance_size      = sizeof(IOMMUMemoryRegion),
+    .instance_init      = iommu_memory_region_initfn,
+};
+
 static void memory_register_types(void)
 {
     type_register_static(&memory_region_info);
+    type_register_static(&iommu_memory_region_info);
 }
 
 type_init(memory_register_types)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [RFC PATCH qemu v3 2/4] vfio-pci: Reorder group-to-container attaching
  2017-04-01 12:37 [Qemu-devel] [RFC PATCH qemu v3 0/4] vfio-pci, spapr: Allow in-kernel acceleration Alexey Kardashevskiy
  2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 1/4] memory/iommu: QOM'fy IOMMU MemoryRegion Alexey Kardashevskiy
@ 2017-04-01 12:37 ` Alexey Kardashevskiy
  2017-04-03  2:27   ` David Gibson
  2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 3/4] vfio/spapr: Add a notifier for PPC64 HV/PR KVM about new group attached to LIOBN Alexey Kardashevskiy
  2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 4/4] spapr/iommu: Enable in-kernel TCE acceleration via VFIO KVM device Alexey Kardashevskiy
  3 siblings, 1 reply; 12+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-01 12:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, qemu-ppc, David Gibson, Alex Williamson,
	Paolo Bonzini

At the moment VFIO PCI device initialization works as follows:
vfio_realize
	vfio_get_group
		vfio_connect_container
			register memory listeners (1)
			update QEMU groups lists
		vfio_kvm_device_add_group

Then (example for pseries) the machine reset hook triggers region_add()
for all regions where listeners from (1) are listening:

ppc_spapr_reset
	spapr_phb_reset
		spapr_tce_table_enable
			memory_region_add_subregion
				vfio_listener_region_add
					vfio_spapr_create_window

This scheme works fine until we need to handle VFIO PCI device hotplug
_and_ we want to enable in-kernel acceleration on, i.e. after PCI hotplug
we need a place to call
ioctl(vfio_kvm_device_fd, KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE).
Since the ioctl needs a LIOBN fd (from sPAPRTCETable) and a IOMMU group fd
(from VFIOGroup), vfio_listener_region_add() seems to be the only place
for this ioctl().

However this only works during boot time because the machine reset
happens strictly after all devices are finalized. When hotplug happens,
vfio_listener_region_add() is called when a memory listener is registered
but when this happens:
1. new group is not added to the container->group_list yet;
2. VFIO KVM device is unaware of the new IOMMU group.

This moves bits around to have all necessary VFIO infrastructure
in place for both initial startup and hotplug cases.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* moved container->initialized back to its correct location
* added missing QLIST_REMOVE()
---
 hw/vfio/common.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ab95db689c..e8188eb3d5 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1087,6 +1087,14 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         goto free_container_exit;
     }
 
+    vfio_kvm_device_add_group(group);
+
+    QLIST_INIT(&container->group_list);
+    QLIST_INSERT_HEAD(&space->containers, container, next);
+
+    group->container = container;
+    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+
     container->listener = vfio_memory_listener;
 
     memory_listener_register(&container->listener, container->space->as);
@@ -1100,14 +1108,11 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 
     container->initialized = true;
 
-    QLIST_INIT(&container->group_list);
-    QLIST_INSERT_HEAD(&space->containers, container, next);
-
-    group->container = container;
-    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
-
     return 0;
 listener_release_exit:
+    QLIST_REMOVE(group, container_next);
+    QLIST_REMOVE(container, next);
+    vfio_kvm_device_del_group(group);
     vfio_listener_release(container);
 
 free_container_exit:
@@ -1212,8 +1217,6 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
 
     QLIST_INSERT_HEAD(&vfio_group_list, group, next);
 
-    vfio_kvm_device_add_group(group);
-
     return group;
 
 close_fd_exit:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [RFC PATCH qemu v3 3/4] vfio/spapr: Add a notifier for PPC64 HV/PR KVM about new group attached to LIOBN
  2017-04-01 12:37 [Qemu-devel] [RFC PATCH qemu v3 0/4] vfio-pci, spapr: Allow in-kernel acceleration Alexey Kardashevskiy
  2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 1/4] memory/iommu: QOM'fy IOMMU MemoryRegion Alexey Kardashevskiy
  2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 2/4] vfio-pci: Reorder group-to-container attaching Alexey Kardashevskiy
@ 2017-04-01 12:37 ` Alexey Kardashevskiy
  2017-04-03  3:01   ` David Gibson
  2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 4/4] spapr/iommu: Enable in-kernel TCE acceleration via VFIO KVM device Alexey Kardashevskiy
  3 siblings, 1 reply; 12+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-01 12:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, qemu-ppc, David Gibson, Alex Williamson,
	Paolo Bonzini

This implements a notification for a new IOMMU group attached to
sPAPR's logical IO bus (LIOBN) to enable in-kernel TCE acceleration.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/hw/ppc/spapr.h        |  1 +
 include/hw/vfio/vfio-common.h |  2 ++
 hw/ppc/spapr_iommu.c          |  5 +++++
 hw/vfio/common.c              | 10 ++++++++++
 hw/vfio/spapr.c               | 31 +++++++++++++++++++++++++++++++
 hw/vfio/trace-events          |  1 +
 6 files changed, 50 insertions(+)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 6997ed7e98..8a1b32f89a 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -617,6 +617,7 @@ void spapr_tce_table_enable(sPAPRTCETable *tcet,
                             uint32_t page_shift, uint64_t bus_offset,
                             uint32_t nb_table);
 void spapr_tce_table_disable(sPAPRTCETable *tcet);
+int spapr_tce_get_fd(sPAPRTCETable *tcet);
 void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio);
 
 MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 7a4135ae6f..b99f4af96e 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -175,6 +175,8 @@ extern const MemoryListener vfio_prereg_listener;
 int vfio_spapr_create_window(VFIOContainer *container,
                              MemoryRegionSection *section,
                              hwaddr *pgsize);
+int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd,
+                          IOMMUMemoryRegion *iommumr);
 int vfio_spapr_remove_window(VFIOContainer *container,
                              hwaddr offset_within_address_space);
 
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 5051110b9d..f7531a6408 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -171,6 +171,11 @@ static void spapr_tce_notify_flag_changed(IOMMUMemoryRegion *iommu,
     }
 }
 
+int spapr_tce_get_fd(sPAPRTCETable *tcet)
+{
+    return tcet->fd;
+}
+
 static int spapr_tce_table_post_load(void *opaque, int version_id)
 {
     sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index e8188eb3d5..b94b29be15 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -440,6 +440,16 @@ static void vfio_listener_region_add(MemoryListener *listener,
             goto fail;
         }
 
+#ifdef CONFIG_KVM
+        if (kvm_enabled()) {
+            VFIOGroup *group;
+
+            QLIST_FOREACH(group, &container->group_list, container_next) {
+                vfio_spapr_notify_kvm(vfio_kvm_device_fd, group->fd,
+                                      IOMMU_MEMORY_REGION(section->mr));
+            }
+        }
+#endif
         vfio_host_win_add(container, section->offset_within_address_space,
                           section->offset_within_address_space +
                           int128_get64(section->size) - 1, pgsize);
diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index 551870d46b..6410438e62 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -15,8 +15,12 @@
 
 #include "hw/vfio/vfio-common.h"
 #include "hw/hw.h"
+#include "hw/ppc/spapr.h"
 #include "qemu/error-report.h"
 #include "trace.h"
+#ifdef CONFIG_KVM
+#include "linux/kvm.h"
+#endif
 
 static bool vfio_prereg_listener_skipped_section(MemoryRegionSection *section)
 {
@@ -188,6 +192,33 @@ int vfio_spapr_create_window(VFIOContainer *container,
     return 0;
 }
 
+int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd,
+                          IOMMUMemoryRegion *iommumr)
+{
+#ifdef CONFIG_KVM
+    struct kvm_vfio_spapr_tce param = {
+        .groupfd = groupfd,
+    };
+    struct kvm_device_attr attr = {
+        .group = KVM_DEV_VFIO_GROUP,
+        .attr = KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE,
+        .addr = (uint64_t)(unsigned long)&param,
+    };
+    sPAPRTCETable *tcet = container_of(iommumr, sPAPRTCETable, iommu);
+
+    param.tablefd = spapr_tce_get_fd(tcet);
+    if (param.tablefd != -1) {
+        if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+            error_report("vfio: failed to setup fd %d for a group with fd %d: %s",
+                         param.tablefd, param.groupfd, strerror(errno));
+            return -errno;
+        }
+    }
+    trace_vfio_spapr_notify_kvm(groupfd, param.tablefd);
+#endif
+    return 0;
+}
+
 int vfio_spapr_remove_window(VFIOContainer *container,
                              hwaddr offset_within_address_space)
 {
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 2561c6d31a..084a92f7c2 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -123,3 +123,4 @@ vfio_prereg_register(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"P
 vfio_prereg_unregister(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"PRIx64" ret=%d"
 vfio_spapr_create_window(int ps, uint64_t ws, uint64_t off) "pageshift=0x%x winsize=0x%"PRIx64" offset=0x%"PRIx64
 vfio_spapr_remove_window(uint64_t off) "offset=%"PRIx64
+vfio_spapr_notify_kvm(int groupfd, int tablefd) "Attached groupfd %d to liobn fd %d"
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [RFC PATCH qemu v3 4/4] spapr/iommu: Enable in-kernel TCE acceleration via VFIO KVM device
  2017-04-01 12:37 [Qemu-devel] [RFC PATCH qemu v3 0/4] vfio-pci, spapr: Allow in-kernel acceleration Alexey Kardashevskiy
                   ` (2 preceding siblings ...)
  2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 3/4] vfio/spapr: Add a notifier for PPC64 HV/PR KVM about new group attached to LIOBN Alexey Kardashevskiy
@ 2017-04-01 12:37 ` Alexey Kardashevskiy
  2017-04-03  3:01   ` David Gibson
  3 siblings, 1 reply; 12+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-01 12:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, qemu-ppc, David Gibson, Alex Williamson,
	Paolo Bonzini

This uses new kernel KVM_CAP_SPAPR_TCE_VFIO capability to enable
in-kernel acceleration of TCE update requests which will go via
the VFIO KVM device.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 target/ppc/kvm_ppc.h | 6 ++++++
 hw/ppc/spapr_iommu.c | 4 ++++
 target/ppc/kvm.c     | 7 ++++++-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index f48243d13f..ce7327a4e0 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -46,6 +46,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t page_shift,
 int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
 int kvmppc_reset_htab(int shift_hint);
 uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
+bool kvmppc_has_cap_spapr_vfio(void);
 #endif /* !CONFIG_USER_ONLY */
 bool kvmppc_has_cap_epr(void);
 int kvmppc_define_rtas_kernel_token(uint32_t token, const char *function);
@@ -216,6 +217,11 @@ static inline bool kvmppc_is_mem_backend_page_size_ok(char *obj_path)
     return true;
 }
 
+static inline bool kvmppc_has_cap_spapr_vfio(void)
+{
+    return false;
+}
+
 #endif /* !CONFIG_USER_ONLY */
 
 static inline bool kvmppc_has_cap_epr(void)
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index f7531a6408..35c2516fcb 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -293,6 +293,10 @@ void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio)
 
     tcet->need_vfio = need_vfio;
 
+    if (!need_vfio || (tcet->fd != -1 && kvmppc_has_cap_spapr_vfio())) {
+        return;
+    }
+
     oldtable = tcet->table;
 
     tcet->table = spapr_tce_alloc_table(tcet->liobn,
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 20f07ff320..76bd92c663 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -132,7 +132,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
     cap_spapr_tce_64 = kvm_check_extension(s, KVM_CAP_SPAPR_TCE_64);
     cap_spapr_multitce = kvm_check_extension(s, KVM_CAP_SPAPR_MULTITCE);
-    cap_spapr_vfio = false;
+    cap_spapr_vfio = kvm_check_extension(s, KVM_CAP_SPAPR_TCE_VFIO);
     cap_one_reg = kvm_check_extension(s, KVM_CAP_ONE_REG);
     cap_hior = kvm_check_extension(s, KVM_CAP_PPC_HIOR);
     cap_epr = kvm_check_extension(s, KVM_CAP_PPC_EPR);
@@ -2422,6 +2422,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
     return cap_mmu_hash_v3;
 }
 
+bool kvmppc_has_cap_spapr_vfio(void)
+{
+    return cap_spapr_vfio;
+}
+
 static PowerPCCPUClass *ppc_cpu_get_family_class(PowerPCCPUClass *pcc)
 {
     ObjectClass *oc = OBJECT_CLASS(pcc);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu v3 1/4] memory/iommu: QOM'fy IOMMU MemoryRegion
  2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 1/4] memory/iommu: QOM'fy IOMMU MemoryRegion Alexey Kardashevskiy
@ 2017-04-03  2:26   ` David Gibson
  2017-04-03 12:53   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 12+ messages in thread
From: David Gibson @ 2017-04-03  2:26 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel, qemu-ppc, Alex Williamson, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 35861 bytes --]

On Sat, Apr 01, 2017 at 11:37:38PM +1100, Alexey Kardashevskiy wrote:
> This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion
> as a parent.
> 
> This moves IOMMU-related fields from MR to IOMMU MR. However to avoid
> dymanic QOM casting in fast path (address_space_translate, etc),
> this adds an @is_iommu boolean flag to MR and provides new helper to
> do simple cast to IOMMU MR - memory_region_get_iommu. The flag
> is set in the instance init callback. This defines
> memory_region_is_iommu as memory_region_get_iommu()!=NULL.
> 
> This switches MemoryRegion to IOMMUMemoryRegion in most places except
> the ones where MemoryRegion may be an alias.
> 
> This defines memory_region_init_iommu_type() to allow creating
> IOMMUMemoryRegion subclasses. In order to support custom QOM type,
> this splits memory_region_init() to object_initialize() +
> memory_region_do_init.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> Changes:
> v3:
> * added mr->is_iommu
> * updated i386/x86_64/s390/sun
> ---
>  hw/s390x/s390-pci-bus.h       |   2 +-
>  include/exec/memory.h         |  64 ++++++++++++++++++------
>  include/hw/i386/intel_iommu.h |   2 +-
>  include/hw/ppc/spapr.h        |   3 +-
>  include/hw/vfio/vfio-common.h |   2 +-
>  include/qemu/typedefs.h       |   1 +
>  exec.c                        |  16 +++---
>  hw/dma/sun4m_iommu.c          |   2 +-
>  hw/i386/amd_iommu.c           |  11 ++--
>  hw/i386/intel_iommu.c         |  14 +++---
>  hw/ppc/spapr_iommu.c          |  20 +++++---
>  hw/s390x/s390-pci-bus.c       |   6 +--
>  hw/s390x/s390-pci-inst.c      |   8 +--
>  hw/vfio/common.c              |  12 +++--
>  hw/vfio/spapr.c               |   3 +-
>  memory.c                      | 114 +++++++++++++++++++++++++++++-------------
>  16 files changed, 188 insertions(+), 92 deletions(-)
> 
> diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
> index dcbf4820c9..441ddbedcb 100644
> --- a/hw/s390x/s390-pci-bus.h
> +++ b/hw/s390x/s390-pci-bus.h
> @@ -267,7 +267,7 @@ typedef struct S390PCIIOMMU {
>      S390PCIBusDevice *pbdev;
>      AddressSpace as;
>      MemoryRegion mr;
> -    MemoryRegion iommu_mr;
> +    IOMMUMemoryRegion iommu_mr;
>      bool enabled;
>      uint64_t g_iota;
>      uint64_t pba;
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index e39256ad03..29d59f4f7f 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -37,6 +37,10 @@
>  #define MEMORY_REGION(obj) \
>          OBJECT_CHECK(MemoryRegion, (obj), TYPE_MEMORY_REGION)
>  
> +#define TYPE_IOMMU_MEMORY_REGION "qemu:iommu-memory-region"
> +#define IOMMU_MEMORY_REGION(obj) \
> +        OBJECT_CHECK(IOMMUMemoryRegion, (obj), TYPE_IOMMU_MEMORY_REGION)
> +
>  typedef struct MemoryRegionOps MemoryRegionOps;
>  typedef struct MemoryRegionMmio MemoryRegionMmio;
>  
> @@ -167,11 +171,12 @@ typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
>  
>  struct MemoryRegionIOMMUOps {
>      /* Return a TLB entry that contains a given address. */
> -    IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool is_write);
> +    IOMMUTLBEntry (*translate)(IOMMUMemoryRegion *iommu, hwaddr addr,
> +                               bool is_write);
>      /* Returns minimum supported page size */
> -    uint64_t (*get_min_page_size)(MemoryRegion *iommu);
> +    uint64_t (*get_min_page_size)(IOMMUMemoryRegion *iommu);
>      /* Called when IOMMU Notifier flag changed */
> -    void (*notify_flag_changed)(MemoryRegion *iommu,
> +    void (*notify_flag_changed)(IOMMUMemoryRegion *iommu,
>                                  IOMMUNotifierFlag old_flags,
>                                  IOMMUNotifierFlag new_flags);
>  };
> @@ -195,7 +200,6 @@ struct MemoryRegion {
>      uint8_t dirty_log_mask;
>      RAMBlock *ram_block;
>      Object *owner;
> -    const MemoryRegionIOMMUOps *iommu_ops;
>  
>      const MemoryRegionOps *ops;
>      void *opaque;
> @@ -218,6 +222,13 @@ struct MemoryRegion {
>      const char *name;
>      unsigned ioeventfd_nb;
>      MemoryRegionIoeventfd *ioeventfds;
> +    bool is_iommu;
> +};
> +
> +struct IOMMUMemoryRegion {
> +    MemoryRegion parent_obj;
> +
> +    const MemoryRegionIOMMUOps *iommu_ops;
>      QLIST_HEAD(, IOMMUNotifier) iommu_notify;
>      IOMMUNotifierFlag iommu_notify_flags;
>  };
> @@ -555,19 +566,39 @@ static inline void memory_region_init_reservation(MemoryRegion *mr,
>  }
>  
>  /**
> + * memory_region_init_iommu_type: Initialize a memory region of a custom type
> + * that translates addresses
> + *
> + * An IOMMU region translates addresses and forwards accesses to a target
> + * memory region.
> + *
> + * @typename: QOM class name
> + * @mr: the #IOMMUMemoryRegion to be initialized
> + * @owner: the object that tracks the region's reference count
> + * @ops: a function that translates addresses into the @target region
> + * @name: used for debugging; not visible to the user or ABI
> + * @size: size of the region.
> + */
> +void memory_region_init_iommu_type(const char *mrtypename,
> +                                   IOMMUMemoryRegion *iommumr,
> +                                   Object *owner,
> +                                   const MemoryRegionIOMMUOps *ops,
> +                                   const char *name,
> +                                   uint64_t size);
> +/**
>   * memory_region_init_iommu: Initialize a memory region that translates
>   * addresses
>   *
>   * An IOMMU region translates addresses and forwards accesses to a target
>   * memory region.
>   *
> - * @mr: the #MemoryRegion to be initialized
> + * @mr: the #IOMMUMemoryRegion to be initialized
>   * @owner: the object that tracks the region's reference count
>   * @ops: a function that translates addresses into the @target region
>   * @name: used for debugging; not visible to the user or ABI
>   * @size: size of the region.
>   */
> -void memory_region_init_iommu(MemoryRegion *mr,
> +void memory_region_init_iommu(IOMMUMemoryRegion *iommumr,
>                                struct Object *owner,
>                                const MemoryRegionIOMMUOps *ops,
>                                const char *name,
> @@ -622,20 +653,25 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
>  }
>  
>  /**
> - * memory_region_is_iommu: check whether a memory region is an iommu
> + * memory_region_get_iommu: check whether a memory region is an iommu
>   *
> - * Returns %true is a memory region is an iommu.
> + * Returns pointer to IOMMUMemoryRegion if a memory region is an iommu,
> + * otherwise NULL.
>   *
>   * @mr: the memory region being queried
>   */
> -static inline bool memory_region_is_iommu(MemoryRegion *mr)
> +static inline IOMMUMemoryRegion *memory_region_get_iommu(MemoryRegion *mr)
>  {
>      if (mr->alias) {
> -        return memory_region_is_iommu(mr->alias);
> +        return memory_region_get_iommu(mr->alias);
>      }
> -    return mr->iommu_ops;
> +    if (mr->is_iommu) {
> +        return (IOMMUMemoryRegion *) mr;
> +    }
> +    return NULL;
>  }
>  
> +#define memory_region_is_iommu(mr) (memory_region_get_iommu(mr) != NULL)
>  
>  /**
>   * memory_region_iommu_get_min_page_size: get minimum supported page size
> @@ -645,7 +681,7 @@ static inline bool memory_region_is_iommu(MemoryRegion *mr)
>   *
>   * @mr: the memory region being queried
>   */
> -uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr);
> +uint64_t memory_region_iommu_get_min_page_size(IOMMUMemoryRegion *mr);
>  
>  /**
>   * memory_region_notify_iommu: notify a change in an IOMMU translation entry.
> @@ -664,7 +700,7 @@ uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr);
>   *         replaces all old entries for the same virtual I/O address range.
>   *         Deleted entries have .@perm == 0.
>   */
> -void memory_region_notify_iommu(MemoryRegion *mr,
> +void memory_region_notify_iommu(IOMMUMemoryRegion *mr,
>                                  IOMMUTLBEntry entry);
>  
>  /**
> @@ -689,7 +725,7 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr,
>   * @is_write: Whether to treat the replay as a translate "write"
>   *     through the iommu
>   */
> -void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
> +void memory_region_iommu_replay(IOMMUMemoryRegion *mr, IOMMUNotifier *n,
>                                  bool is_write);
>  
>  /**
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index fe645aa93a..34f1b61957 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -82,7 +82,7 @@ struct VTDAddressSpace {
>      PCIBus *bus;
>      uint8_t devfn;
>      AddressSpace as;
> -    MemoryRegion iommu;
> +    IOMMUMemoryRegion iommu;
>      MemoryRegion iommu_ir;      /* Interrupt region: 0xfeeXXXXX */
>      IntelIOMMUState *iommu_state;
>      VTDContextCacheEntry context_cache_entry;
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index e27de64d31..6997ed7e98 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -590,7 +590,8 @@ struct sPAPRTCETable {
>      bool bypass;
>      bool need_vfio;
>      int fd;
> -    MemoryRegion root, iommu;
> +    MemoryRegion root;
> +    IOMMUMemoryRegion iommu;
>      struct VIOsPAPRDevice *vdev; /* for @bypass migration compatibility only */
>      QLIST_ENTRY(sPAPRTCETable) list;
>  };
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index c582de18c9..7a4135ae6f 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -94,7 +94,7 @@ typedef struct VFIOContainer {
>  
>  typedef struct VFIOGuestIOMMU {
>      VFIOContainer *container;
> -    MemoryRegion *iommu;
> +    IOMMUMemoryRegion *iommu;
>      hwaddr iommu_offset;
>      IOMMUNotifier n;
>      QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> index e95f28cfec..b45f71ec11 100644
> --- a/include/qemu/typedefs.h
> +++ b/include/qemu/typedefs.h
> @@ -45,6 +45,7 @@ typedef struct MachineState MachineState;
>  typedef struct MemoryListener MemoryListener;
>  typedef struct MemoryMappingList MemoryMappingList;
>  typedef struct MemoryRegion MemoryRegion;
> +typedef struct IOMMUMemoryRegion IOMMUMemoryRegion;
>  typedef struct MemoryRegionCache MemoryRegionCache;
>  typedef struct MemoryRegionSection MemoryRegionSection;
>  typedef struct MigrationIncomingState MigrationIncomingState;
> diff --git a/exec.c b/exec.c
> index e57a8a2178..bbd8df7a9d 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -462,20 +462,20 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
>  {
>      IOMMUTLBEntry iotlb = {0};
>      MemoryRegionSection *section;
> -    MemoryRegion *mr;
> +    IOMMUMemoryRegion *iommumr;
>  
>      for (;;) {
>          AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
>          section = address_space_lookup_region(d, addr, false);
>          addr = addr - section->offset_within_address_space
>                 + section->offset_within_region;
> -        mr = section->mr;
>  
> -        if (!mr->iommu_ops) {
> +        iommumr = memory_region_get_iommu(section->mr);
> +        if (!iommumr) {
>              break;
>          }
>  
> -        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +        iotlb = iommumr->iommu_ops->translate(iommumr, addr, is_write);
>          if (!(iotlb.perm & (1 << is_write))) {
>              iotlb.target_as = NULL;
>              break;
> @@ -497,17 +497,19 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
>      IOMMUTLBEntry iotlb;
>      MemoryRegionSection *section;
>      MemoryRegion *mr;
> +    IOMMUMemoryRegion *iommumr;
>  
>      for (;;) {
>          AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
>          section = address_space_translate_internal(d, addr, &addr, plen, true);
>          mr = section->mr;
>  
> -        if (!mr->iommu_ops) {
> +        iommumr = memory_region_get_iommu(mr);
> +        if (!iommumr) {
>              break;
>          }
>  
> -        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +        iotlb = iommumr->iommu_ops->translate(iommumr, addr, is_write);
>          addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
>                  | (addr & iotlb.addr_mask));
>          *plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1);
> @@ -538,7 +540,7 @@ address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr addr,
>  
>      section = address_space_translate_internal(d, addr, xlat, plen, false);
>  
> -    assert(!section->mr->iommu_ops);
> +    assert(!memory_region_is_iommu(section->mr));
>      return section;
>  }
>  #endif
> diff --git a/hw/dma/sun4m_iommu.c b/hw/dma/sun4m_iommu.c
> index b3cbc54c23..539115b629 100644
> --- a/hw/dma/sun4m_iommu.c
> +++ b/hw/dma/sun4m_iommu.c
> @@ -134,7 +134,7 @@
>  typedef struct IOMMUState {
>      SysBusDevice parent_obj;
>  
> -    MemoryRegion iomem;
> +    IOMMUMemoryRegion iomem;
>      uint32_t regs[IOMMU_NREGS];
>      hwaddr iostart;
>      qemu_irq irq;
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index f86a40aa30..e76c29aec6 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -51,7 +51,7 @@ struct AMDVIAddressSpace {
>      uint8_t bus_num;            /* bus number                           */
>      uint8_t devfn;              /* device function                      */
>      AMDVIState *iommu_state;    /* AMDVI - one per machine              */
> -    MemoryRegion iommu;         /* Device's address translation region  */
> +    IOMMUMemoryRegion iommu;    /* Device's address translation region  */
>      MemoryRegion iommu_ir;      /* Device's interrupt remapping region  */
>      AddressSpace as;            /* device's corresponding address space */
>  };
> @@ -986,7 +986,7 @@ static inline bool amdvi_is_interrupt_addr(hwaddr addr)
>      return addr >= AMDVI_INT_ADDR_FIRST && addr <= AMDVI_INT_ADDR_LAST;
>  }
>  
> -static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
> +static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
>                                       bool is_write)
>  {
>      AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
> @@ -1045,8 +1045,9 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>  
>          memory_region_init_iommu(&iommu_as[devfn]->iommu, OBJECT(s),
>                                   &s->iommu_ops, "amd-iommu", UINT64_MAX);
> -        address_space_init(&iommu_as[devfn]->as, &iommu_as[devfn]->iommu,
> -                           "amd-iommu");
> +	address_space_init(&iommu_as[devfn]->as,
> +			   MEMORY_REGION(&iommu_as[devfn]->iommu),
> +			   "amd-iommu");
>      }
>      return &iommu_as[devfn]->as;
>  }
> @@ -1066,7 +1067,7 @@ static const MemoryRegionOps mmio_mem_ops = {
>      }
>  };
>  
> -static void amdvi_iommu_notify_flag_changed(MemoryRegion *iommu,
> +static void amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
>                                              IOMMUNotifierFlag old,
>                                              IOMMUNotifierFlag new)
>  {
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 22d8226e43..233fa75b64 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1457,7 +1457,8 @@ static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
>      entry.iova = addr;
>      entry.perm = IOMMU_NONE;
>      entry.translated_addr = 0;
> -    memory_region_notify_iommu(entry.target_as->root, entry);
> +    memory_region_notify_iommu(memory_region_get_iommu(entry.target_as->root),
> +			       entry);
>  
>  done:
>      return true;
> @@ -1968,7 +1969,7 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
>      }
>  }
>  
> -static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
> +static IOMMUTLBEntry vtd_iommu_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
>                                           bool is_write)
>  {
>      VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
> @@ -2000,7 +2001,7 @@ static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
>      return ret;
>  }
>  
> -static void vtd_iommu_notify_flag_changed(MemoryRegion *iommu,
> +static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
>                                            IOMMUNotifierFlag old,
>                                            IOMMUNotifierFlag new)
>  {
> @@ -2394,10 +2395,11 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>          memory_region_init_io(&vtd_dev_as->iommu_ir, OBJECT(s),
>                                &vtd_mem_ir_ops, s, "intel_iommu_ir",
>                                VTD_INTERRUPT_ADDR_SIZE);
> -        memory_region_add_subregion(&vtd_dev_as->iommu, VTD_INTERRUPT_ADDR_FIRST,
> -                                    &vtd_dev_as->iommu_ir);
> +	memory_region_add_subregion(MEMORY_REGION(&vtd_dev_as->iommu),
> +				    VTD_INTERRUPT_ADDR_FIRST,
> +				    &vtd_dev_as->iommu_ir);
>          address_space_init(&vtd_dev_as->as,
> -                           &vtd_dev_as->iommu, name);
> +                           MEMORY_REGION(&vtd_dev_as->iommu), name);
>      }
>      return vtd_dev_as;
>  }
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 9e30e148d6..5051110b9d 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -110,7 +110,8 @@ static void spapr_tce_free_table(uint64_t *table, int fd, uint32_t nb_table)
>  }
>  
>  /* Called from RCU critical section */
> -static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr,
> +static IOMMUTLBEntry spapr_tce_translate_iommu(IOMMUMemoryRegion *iommu,
> +                                               hwaddr addr,
>                                                 bool is_write)
>  {
>      sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
> @@ -150,14 +151,14 @@ static void spapr_tce_table_pre_save(void *opaque)
>                                 tcet->bus_offset, tcet->page_shift);
>  }
>  
> -static uint64_t spapr_tce_get_min_page_size(MemoryRegion *iommu)
> +static uint64_t spapr_tce_get_min_page_size(IOMMUMemoryRegion *iommu)
>  {
>      sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
>  
>      return 1ULL << tcet->page_shift;
>  }
>  
> -static void spapr_tce_notify_flag_changed(MemoryRegion *iommu,
> +static void spapr_tce_notify_flag_changed(IOMMUMemoryRegion *iommu,
>                                            IOMMUNotifierFlag old,
>                                            IOMMUNotifierFlag new)
>  {
> @@ -265,7 +266,9 @@ static int spapr_tce_table_realize(DeviceState *dev)
>      memory_region_init(&tcet->root, tcetobj, tmp, UINT64_MAX);
>  
>      snprintf(tmp, sizeof(tmp), "tce-iommu-%x", tcet->liobn);
> -    memory_region_init_iommu(&tcet->iommu, tcetobj, &spapr_iommu_ops, tmp, 0);
> +    memory_region_init_iommu_type(TYPE_IOMMU_MEMORY_REGION,
> +                                  &tcet->iommu, tcetobj, &spapr_iommu_ops,
> +                                  tmp, 0);
>  
>      QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
>  
> @@ -341,9 +344,10 @@ void spapr_tce_table_enable(sPAPRTCETable *tcet,
>                                          &tcet->fd,
>                                          tcet->need_vfio);
>  
> -    memory_region_set_size(&tcet->iommu,
> +    memory_region_set_size(MEMORY_REGION(&tcet->iommu),
>                             (uint64_t)tcet->nb_table << tcet->page_shift);
> -    memory_region_add_subregion(&tcet->root, tcet->bus_offset, &tcet->iommu);
> +    memory_region_add_subregion(&tcet->root, tcet->bus_offset,
> +                                MEMORY_REGION(&tcet->iommu));
>  }
>  
>  void spapr_tce_table_disable(sPAPRTCETable *tcet)
> @@ -352,8 +356,8 @@ void spapr_tce_table_disable(sPAPRTCETable *tcet)
>          return;
>      }
>  
> -    memory_region_del_subregion(&tcet->root, &tcet->iommu);
> -    memory_region_set_size(&tcet->iommu, 0);
> +    memory_region_del_subregion(&tcet->root, MEMORY_REGION(&tcet->iommu));
> +    memory_region_set_size(MEMORY_REGION(&tcet->iommu), 0);
>  
>      spapr_tce_free_table(tcet->table, tcet->fd, tcet->nb_table);
>      tcet->fd = -1;
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index 69b0291e8a..27e7336a76 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -354,7 +354,7 @@ out:
>      return pte;
>  }
>  
> -static IOMMUTLBEntry s390_translate_iommu(MemoryRegion *mr, hwaddr addr,
> +static IOMMUTLBEntry s390_translate_iommu(IOMMUMemoryRegion *mr, hwaddr addr,
>                                            bool is_write)
>  {
>      uint64_t pte;
> @@ -523,14 +523,14 @@ void s390_pci_iommu_enable(S390PCIIOMMU *iommu)
>      memory_region_init_iommu(&iommu->iommu_mr, OBJECT(&iommu->mr),
>                               &s390_iommu_ops, name, iommu->pal + 1);
>      iommu->enabled = true;
> -    memory_region_add_subregion(&iommu->mr, 0, &iommu->iommu_mr);
> +    memory_region_add_subregion(&iommu->mr, 0, MEMORY_REGION(&iommu->iommu_mr));
>      g_free(name);
>  }
>  
>  void s390_pci_iommu_disable(S390PCIIOMMU *iommu)
>  {
>      iommu->enabled = false;
> -    memory_region_del_subregion(&iommu->mr, &iommu->iommu_mr);
> +    memory_region_del_subregion(&iommu->mr, MEMORY_REGION(&iommu->iommu_mr));
>      object_unparent(OBJECT(&iommu->iommu_mr));
>  }
>  
> diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
> index d2a8c0a083..b4fe4da798 100644
> --- a/hw/s390x/s390-pci-inst.c
> +++ b/hw/s390x/s390-pci-inst.c
> @@ -561,7 +561,7 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
>      S390PCIIOMMU *iommu;
>      hwaddr start, end;
>      IOMMUTLBEntry entry;
> -    MemoryRegion *mr;
> +    IOMMUMemoryRegion *iommumr;
>  
>      cpu_synchronize_state(CPU(cpu));
>  
> @@ -620,9 +620,9 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
>          goto out;
>      }
>  
> -    mr = &iommu->iommu_mr;
> +    iommumr = &iommu->iommu_mr;
>      while (start < end) {
> -        entry = mr->iommu_ops->translate(mr, start, 0);
> +        entry = iommumr->iommu_ops->translate(iommumr, start, 0);
>  
>          if (!entry.translated_addr) {
>              pbdev->state = ZPCI_FS_ERROR;
> @@ -633,7 +633,7 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
>              goto out;
>          }
>  
> -        memory_region_notify_iommu(mr, entry);
> +        memory_region_notify_iommu(iommumr, entry);
>          start += entry.addr_mask + 1;
>      }
>  
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index f3ba9b9007..ab95db689c 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -465,6 +465,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>  
>      if (memory_region_is_iommu(section->mr)) {
>          VFIOGuestIOMMU *giommu;
> +        IOMMUMemoryRegion *iommumr = IOMMU_MEMORY_REGION(section->mr);
>  
>          trace_vfio_listener_region_add_iommu(iova, end);
>          /*
> @@ -474,7 +475,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>           * device emulation the VFIO iommu handles to use).
>           */
>          giommu = g_malloc0(sizeof(*giommu));
> -        giommu->iommu = section->mr;
> +        giommu->iommu = iommumr;
>          giommu->iommu_offset = section->offset_within_address_space -
>                                 section->offset_within_region;
>          giommu->container = container;
> @@ -482,7 +483,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>          giommu->n.notifier_flags = IOMMU_NOTIFIER_ALL;
>          QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
>  
> -        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> +        memory_region_register_iommu_notifier(section->mr, &giommu->n);
>          memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
>  
>          return;
> @@ -550,8 +551,8 @@ static void vfio_listener_region_del(MemoryListener *listener,
>          VFIOGuestIOMMU *giommu;
>  
>          QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
> -            if (giommu->iommu == section->mr) {
> -                memory_region_unregister_iommu_notifier(giommu->iommu,
> +            if (MEMORY_REGION(giommu->iommu) == section->mr) {
> +                memory_region_unregister_iommu_notifier(section->mr,
>                                                          &giommu->n);
>                  QLIST_REMOVE(giommu, giommu_next);
>                  g_free(giommu);
> @@ -1141,7 +1142,8 @@ static void vfio_disconnect_container(VFIOGroup *group)
>          QLIST_REMOVE(container, next);
>  
>          QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
> -            memory_region_unregister_iommu_notifier(giommu->iommu, &giommu->n);
> +            memory_region_unregister_iommu_notifier(
> +                    MEMORY_REGION(giommu->iommu), &giommu->n);
>              QLIST_REMOVE(giommu, giommu_next);
>              g_free(giommu);
>          }
> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
> index 4409bcc0d7..551870d46b 100644
> --- a/hw/vfio/spapr.c
> +++ b/hw/vfio/spapr.c
> @@ -143,7 +143,8 @@ int vfio_spapr_create_window(VFIOContainer *container,
>                               hwaddr *pgsize)
>  {
>      int ret;
> -    unsigned pagesize = memory_region_iommu_get_min_page_size(section->mr);
> +    IOMMUMemoryRegion *iommumr = IOMMU_MEMORY_REGION(section->mr);
> +    unsigned pagesize = memory_region_iommu_get_min_page_size(iommumr);
>      unsigned entries, pages;
>      struct vfio_iommu_spapr_tce_create create = { .argsz = sizeof(create) };
>  
> diff --git a/memory.c b/memory.c
> index 4c95aaf39c..62796536dc 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -975,12 +975,11 @@ static char *memory_region_escape_name(const char *name)
>      return escaped;
>  }
>  
> -void memory_region_init(MemoryRegion *mr,
> -                        Object *owner,
> -                        const char *name,
> -                        uint64_t size)
> +static void memory_region_do_init(MemoryRegion *mr,
> +                                  Object *owner,
> +                                  const char *name,
> +                                  uint64_t size)
>  {
> -    object_initialize(mr, sizeof(*mr), TYPE_MEMORY_REGION);
>      mr->size = int128_make64(size);
>      if (size == UINT64_MAX) {
>          mr->size = int128_2_64();
> @@ -1004,6 +1003,15 @@ void memory_region_init(MemoryRegion *mr,
>      }
>  }
>  
> +void memory_region_init(MemoryRegion *mr,
> +                        Object *owner,
> +                        const char *name,
> +                        uint64_t size)
> +{
> +    object_initialize(mr, sizeof(*mr), TYPE_MEMORY_REGION);
> +    memory_region_do_init(mr, owner, name, size);
> +}
> +
>  static void memory_region_get_addr(Object *obj, Visitor *v, const char *name,
>                                     void *opaque, Error **errp)
>  {
> @@ -1090,6 +1098,13 @@ static void memory_region_initfn(Object *obj)
>                          NULL, NULL, &error_abort);
>  }
>  
> +static void iommu_memory_region_initfn(Object *obj)
> +{
> +    MemoryRegion *mr = MEMORY_REGION(obj);
> +
> +    mr->is_iommu = true;
> +}
> +
>  static uint64_t unassigned_mem_read(void *opaque, hwaddr addr,
>                                      unsigned size)
>  {
> @@ -1473,17 +1488,33 @@ void memory_region_init_rom_device(MemoryRegion *mr,
>      mr->ram_block = qemu_ram_alloc(size, mr, errp);
>  }
>  
> -void memory_region_init_iommu(MemoryRegion *mr,
> -                              Object *owner,
> -                              const MemoryRegionIOMMUOps *ops,
> -                              const char *name,
> -                              uint64_t size)
> +void memory_region_init_iommu_type(const char *mrtypename,
> +                                   IOMMUMemoryRegion *iommumr,
> +                                   Object *owner,
> +                                   const MemoryRegionIOMMUOps *ops,
> +                                   const char *name,
> +                                   uint64_t size)
>  {
> -    memory_region_init(mr, owner, name, size);
> -    mr->iommu_ops = ops,
> +    struct MemoryRegion *mr;
> +    size_t instance_size = object_type_get_instance_size(mrtypename);
> +
> +    object_initialize(iommumr, instance_size, mrtypename);
> +    mr = MEMORY_REGION(iommumr);
> +    memory_region_do_init(mr, owner, name, size);
> +    iommumr->iommu_ops = ops,
>      mr->terminates = true;  /* then re-forwards */
> -    QLIST_INIT(&mr->iommu_notify);
> -    mr->iommu_notify_flags = IOMMU_NOTIFIER_NONE;
> +    QLIST_INIT(&iommumr->iommu_notify);
> +    iommumr->iommu_notify_flags = IOMMU_NOTIFIER_NONE;
> +}
> +
> +void memory_region_init_iommu(IOMMUMemoryRegion *iommumr,
> +                              Object *owner,
> +                              const MemoryRegionIOMMUOps *ops,
> +                              const char *name,
> +                              uint64_t size)
> +{
> +    memory_region_init_iommu_type(TYPE_IOMMU_MEMORY_REGION, iommumr,
> +                                  owner, ops, name, size);
>  }
>  
>  static void memory_region_finalize(Object *obj)
> @@ -1578,57 +1609,61 @@ bool memory_region_is_logging(MemoryRegion *mr, uint8_t client)
>      return memory_region_get_dirty_log_mask(mr) & (1 << client);
>  }
>  
> -static void memory_region_update_iommu_notify_flags(MemoryRegion *mr)
> +static void memory_region_update_iommu_notify_flags(IOMMUMemoryRegion *iommumr)
>  {
>      IOMMUNotifierFlag flags = IOMMU_NOTIFIER_NONE;
>      IOMMUNotifier *iommu_notifier;
>  
> -    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
> +    QLIST_FOREACH(iommu_notifier, &iommumr->iommu_notify, node) {
>          flags |= iommu_notifier->notifier_flags;
>      }
>  
> -    if (flags != mr->iommu_notify_flags &&
> -        mr->iommu_ops->notify_flag_changed) {
> -        mr->iommu_ops->notify_flag_changed(mr, mr->iommu_notify_flags,
> -                                           flags);
> +    if (flags != iommumr->iommu_notify_flags &&
> +        iommumr->iommu_ops->notify_flag_changed) {
> +        iommumr->iommu_ops->notify_flag_changed(iommumr,
> +                                                iommumr->iommu_notify_flags,
> +                                                flags);
>      }
>  
> -    mr->iommu_notify_flags = flags;
> +    iommumr->iommu_notify_flags = flags;
>  }
>  
>  void memory_region_register_iommu_notifier(MemoryRegion *mr,
>                                             IOMMUNotifier *n)
>  {
> +    IOMMUMemoryRegion *iommumr;
> +
>      if (mr->alias) {
>          memory_region_register_iommu_notifier(mr->alias, n);
>          return;
>      }
>  
>      /* We need to register for at least one bitfield */
> +    iommumr = IOMMU_MEMORY_REGION(mr);
>      assert(n->notifier_flags != IOMMU_NOTIFIER_NONE);
> -    QLIST_INSERT_HEAD(&mr->iommu_notify, n, node);
> -    memory_region_update_iommu_notify_flags(mr);
> +    QLIST_INSERT_HEAD(&iommumr->iommu_notify, n, node);
> +    memory_region_update_iommu_notify_flags(iommumr);
>  }
>  
> -uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr)
> +uint64_t memory_region_iommu_get_min_page_size(IOMMUMemoryRegion *iommumr)
>  {
> -    assert(memory_region_is_iommu(mr));
> -    if (mr->iommu_ops && mr->iommu_ops->get_min_page_size) {
> -        return mr->iommu_ops->get_min_page_size(mr);
> +    if (iommumr->iommu_ops && iommumr->iommu_ops->get_min_page_size) {
> +        return iommumr->iommu_ops->get_min_page_size(iommumr);
>      }
>      return TARGET_PAGE_SIZE;
>  }
>  
> -void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
> +void memory_region_iommu_replay(IOMMUMemoryRegion *iommumr, IOMMUNotifier *n,
>                                  bool is_write)
>  {
> +    MemoryRegion *mr = MEMORY_REGION(iommumr);
>      hwaddr addr, granularity;
>      IOMMUTLBEntry iotlb;
>  
> -    granularity = memory_region_iommu_get_min_page_size(mr);
> +    granularity = memory_region_iommu_get_min_page_size(iommumr);
>  
>      for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> -        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +        iotlb = iommumr->iommu_ops->translate(iommumr, addr, is_write);
>          if (iotlb.perm != IOMMU_NONE) {
>              n->notify(n, &iotlb);
>          }
> @@ -1644,21 +1679,24 @@ void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
>  void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
>                                               IOMMUNotifier *n)
>  {
> +    IOMMUMemoryRegion *iommumr;
> +
>      if (mr->alias) {
>          memory_region_unregister_iommu_notifier(mr->alias, n);
>          return;
>      }
>      QLIST_REMOVE(n, node);
> -    memory_region_update_iommu_notify_flags(mr);
> +    iommumr = IOMMU_MEMORY_REGION(mr);
> +    memory_region_update_iommu_notify_flags(iommumr);
>  }
>  
> -void memory_region_notify_iommu(MemoryRegion *mr,
> +void memory_region_notify_iommu(IOMMUMemoryRegion *iommumr,
>                                  IOMMUTLBEntry entry)
>  {
>      IOMMUNotifier *iommu_notifier;
>      IOMMUNotifierFlag request_flags;
>  
> -    assert(memory_region_is_iommu(mr));
> +    assert(memory_region_is_iommu(MEMORY_REGION(iommumr)));
>  
>      if (entry.perm & IOMMU_RW) {
>          request_flags = IOMMU_NOTIFIER_MAP;
> @@ -1666,7 +1704,7 @@ void memory_region_notify_iommu(MemoryRegion *mr,
>          request_flags = IOMMU_NOTIFIER_UNMAP;
>      }
>  
> -    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
> +    QLIST_FOREACH(iommu_notifier, &iommumr->iommu_notify, node) {
>          if (iommu_notifier->notifier_flags & request_flags) {
>              iommu_notifier->notify(iommu_notifier, &entry);
>          }
> @@ -2660,9 +2698,17 @@ static const TypeInfo memory_region_info = {
>      .instance_finalize  = memory_region_finalize,
>  };
>  
> +static const TypeInfo iommu_memory_region_info = {
> +    .parent             = TYPE_MEMORY_REGION,
> +    .name               = TYPE_IOMMU_MEMORY_REGION,
> +    .instance_size      = sizeof(IOMMUMemoryRegion),
> +    .instance_init      = iommu_memory_region_initfn,
> +};
> +
>  static void memory_register_types(void)
>  {
>      type_register_static(&memory_region_info);
> +    type_register_static(&iommu_memory_region_info);
>  }
>  
>  type_init(memory_register_types)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu v3 2/4] vfio-pci: Reorder group-to-container attaching
  2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 2/4] vfio-pci: Reorder group-to-container attaching Alexey Kardashevskiy
@ 2017-04-03  2:27   ` David Gibson
  0 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2017-04-03  2:27 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel, qemu-ppc, Alex Williamson, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 3655 bytes --]

On Sat, Apr 01, 2017 at 11:37:39PM +1100, Alexey Kardashevskiy wrote:
> At the moment VFIO PCI device initialization works as follows:
> vfio_realize
> 	vfio_get_group
> 		vfio_connect_container
> 			register memory listeners (1)
> 			update QEMU groups lists
> 		vfio_kvm_device_add_group
> 
> Then (example for pseries) the machine reset hook triggers region_add()
> for all regions where listeners from (1) are listening:
> 
> ppc_spapr_reset
> 	spapr_phb_reset
> 		spapr_tce_table_enable
> 			memory_region_add_subregion
> 				vfio_listener_region_add
> 					vfio_spapr_create_window
> 
> This scheme works fine until we need to handle VFIO PCI device hotplug
> _and_ we want to enable in-kernel acceleration on, i.e. after PCI hotplug
> we need a place to call
> ioctl(vfio_kvm_device_fd, KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE).
> Since the ioctl needs a LIOBN fd (from sPAPRTCETable) and a IOMMU group fd
> (from VFIOGroup), vfio_listener_region_add() seems to be the only place
> for this ioctl().
> 
> However this only works during boot time because the machine reset
> happens strictly after all devices are finalized. When hotplug happens,
> vfio_listener_region_add() is called when a memory listener is registered
> but when this happens:
> 1. new group is not added to the container->group_list yet;
> 2. VFIO KVM device is unaware of the new IOMMU group.
> 
> This moves bits around to have all necessary VFIO infrastructure
> in place for both initial startup and hotplug cases.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> Changes:
> v2:
> * moved container->initialized back to its correct location
> * added missing QLIST_REMOVE()
> ---
>  hw/vfio/common.c | 19 +++++++++++--------
>  1 file changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ab95db689c..e8188eb3d5 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1087,6 +1087,14 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>          goto free_container_exit;
>      }
>  
> +    vfio_kvm_device_add_group(group);
> +
> +    QLIST_INIT(&container->group_list);
> +    QLIST_INSERT_HEAD(&space->containers, container, next);
> +
> +    group->container = container;
> +    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> +
>      container->listener = vfio_memory_listener;
>  
>      memory_listener_register(&container->listener, container->space->as);
> @@ -1100,14 +1108,11 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>  
>      container->initialized = true;
>  
> -    QLIST_INIT(&container->group_list);
> -    QLIST_INSERT_HEAD(&space->containers, container, next);
> -
> -    group->container = container;
> -    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> -
>      return 0;
>  listener_release_exit:
> +    QLIST_REMOVE(group, container_next);
> +    QLIST_REMOVE(container, next);
> +    vfio_kvm_device_del_group(group);
>      vfio_listener_release(container);
>  
>  free_container_exit:
> @@ -1212,8 +1217,6 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
>  
>      QLIST_INSERT_HEAD(&vfio_group_list, group, next);
>  
> -    vfio_kvm_device_add_group(group);
> -
>      return group;
>  
>  close_fd_exit:

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu v3 3/4] vfio/spapr: Add a notifier for PPC64 HV/PR KVM about new group attached to LIOBN
  2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 3/4] vfio/spapr: Add a notifier for PPC64 HV/PR KVM about new group attached to LIOBN Alexey Kardashevskiy
@ 2017-04-03  3:01   ` David Gibson
  2017-05-01 12:21     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 12+ messages in thread
From: David Gibson @ 2017-04-03  3:01 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel, qemu-ppc, Alex Williamson, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 6549 bytes --]

On Sat, Apr 01, 2017 at 11:37:40PM +1100, Alexey Kardashevskiy wrote:
> This implements a notification for a new IOMMU group attached to
> sPAPR's logical IO bus (LIOBN) to enable in-kernel TCE acceleration.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  include/hw/ppc/spapr.h        |  1 +
>  include/hw/vfio/vfio-common.h |  2 ++
>  hw/ppc/spapr_iommu.c          |  5 +++++
>  hw/vfio/common.c              | 10 ++++++++++
>  hw/vfio/spapr.c               | 31 +++++++++++++++++++++++++++++++
>  hw/vfio/trace-events          |  1 +
>  6 files changed, 50 insertions(+)
> 
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 6997ed7e98..8a1b32f89a 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -617,6 +617,7 @@ void spapr_tce_table_enable(sPAPRTCETable *tcet,
>                              uint32_t page_shift, uint64_t bus_offset,
>                              uint32_t nb_table);
>  void spapr_tce_table_disable(sPAPRTCETable *tcet);
> +int spapr_tce_get_fd(sPAPRTCETable *tcet);
>  void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio);
>  
>  MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 7a4135ae6f..b99f4af96e 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -175,6 +175,8 @@ extern const MemoryListener vfio_prereg_listener;
>  int vfio_spapr_create_window(VFIOContainer *container,
>                               MemoryRegionSection *section,
>                               hwaddr *pgsize);
> +int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd,
> +                          IOMMUMemoryRegion *iommumr);
>  int vfio_spapr_remove_window(VFIOContainer *container,
>                               hwaddr offset_within_address_space);
>  
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 5051110b9d..f7531a6408 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -171,6 +171,11 @@ static void spapr_tce_notify_flag_changed(IOMMUMemoryRegion *iommu,
>      }
>  }
>  
> +int spapr_tce_get_fd(sPAPRTCETable *tcet)
> +{
> +    return tcet->fd;
> +}
> +

I don't think this actually abstracts anything worthwhile.  The caller
needs the sPAPRTCETable definition anyway to use container_of(), so it
might as well just grab the field directly.

>  static int spapr_tce_table_post_load(void *opaque, int version_id)
>  {
>      sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index e8188eb3d5..b94b29be15 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -440,6 +440,16 @@ static void vfio_listener_region_add(MemoryListener *listener,
>              goto fail;
>          }
>  
> +#ifdef CONFIG_KVM
> +        if (kvm_enabled()) {
> +            VFIOGroup *group;
> +
> +            QLIST_FOREACH(group, &container->group_list, container_next) {
> +                vfio_spapr_notify_kvm(vfio_kvm_device_fd, group->fd,
> +                                      IOMMU_MEMORY_REGION(section->mr));
> +            }
> +        }
> +#endif
>          vfio_host_win_add(container, section->offset_within_address_space,
>                            section->offset_within_address_space +
>                            int128_get64(section->size) - 1, pgsize);
> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
> index 551870d46b..6410438e62 100644
> --- a/hw/vfio/spapr.c
> +++ b/hw/vfio/spapr.c
> @@ -15,8 +15,12 @@
>  
>  #include "hw/vfio/vfio-common.h"
>  #include "hw/hw.h"
> +#include "hw/ppc/spapr.h"
>  #include "qemu/error-report.h"
>  #include "trace.h"
> +#ifdef CONFIG_KVM
> +#include "linux/kvm.h"
> +#endif
>  
>  static bool vfio_prereg_listener_skipped_section(MemoryRegionSection *section)
>  {
> @@ -188,6 +192,33 @@ int vfio_spapr_create_window(VFIOContainer *container,
>      return 0;
>  }
>  
> +int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd,
> +                          IOMMUMemoryRegion *iommumr)
> +{
> +#ifdef CONFIG_KVM
> +    struct kvm_vfio_spapr_tce param = {
> +        .groupfd = groupfd,
> +    };
> +    struct kvm_device_attr attr = {
> +        .group = KVM_DEV_VFIO_GROUP,
> +        .attr = KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE,
> +        .addr = (uint64_t)(unsigned long)&param,
> +    };
> +    sPAPRTCETable *tcet = container_of(iommumr, sPAPRTCETable, iommu);

This isn't safe.  The caller has verified that the host backend IOMMU
is sPAPR TCE, but you haven't verified that the *guest* IOMMU is TCE
based.  I suspect other details would prevent a TCG x86 machine with
VT-d running on a Power host from getting this far, but it's not good
to rely on that.

So, you need to explicitly verify that the guest IOMMU region really
is a PAPR TCE region.  The obvious way would be to continue your
QOMification and make sPAPRTCETable a subtype of IOMMUMemoryRegion,
rather than just including it by composition.

> +
> +    param.tablefd = spapr_tce_get_fd(tcet);
> +    if (param.tablefd != -1) {
> +        if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
> +            error_report("vfio: failed to setup fd %d for a group with fd %d: %s",
> +                         param.tablefd, param.groupfd, strerror(errno));
> +            return -errno;
> +        }
> +    }
> +    trace_vfio_spapr_notify_kvm(groupfd, param.tablefd);
> +#endif
> +    return 0;
> +}
> +
>  int vfio_spapr_remove_window(VFIOContainer *container,
>                               hwaddr offset_within_address_space)
>  {
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 2561c6d31a..084a92f7c2 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -123,3 +123,4 @@ vfio_prereg_register(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"P
>  vfio_prereg_unregister(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"PRIx64" ret=%d"
>  vfio_spapr_create_window(int ps, uint64_t ws, uint64_t off) "pageshift=0x%x winsize=0x%"PRIx64" offset=0x%"PRIx64
>  vfio_spapr_remove_window(uint64_t off) "offset=%"PRIx64
> +vfio_spapr_notify_kvm(int groupfd, int tablefd) "Attached groupfd %d to liobn fd %d"

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu v3 4/4] spapr/iommu: Enable in-kernel TCE acceleration via VFIO KVM device
  2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 4/4] spapr/iommu: Enable in-kernel TCE acceleration via VFIO KVM device Alexey Kardashevskiy
@ 2017-04-03  3:01   ` David Gibson
  0 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2017-04-03  3:01 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel, qemu-ppc, Alex Williamson, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 3205 bytes --]

On Sat, Apr 01, 2017 at 11:37:41PM +1100, Alexey Kardashevskiy wrote:
> This uses new kernel KVM_CAP_SPAPR_TCE_VFIO capability to enable
> in-kernel acceleration of TCE update requests which will go via
> the VFIO KVM device.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/kvm_ppc.h | 6 ++++++
>  hw/ppc/spapr_iommu.c | 4 ++++
>  target/ppc/kvm.c     | 7 ++++++-
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index f48243d13f..ce7327a4e0 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -46,6 +46,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t page_shift,
>  int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
>  int kvmppc_reset_htab(int shift_hint);
>  uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
> +bool kvmppc_has_cap_spapr_vfio(void);
>  #endif /* !CONFIG_USER_ONLY */
>  bool kvmppc_has_cap_epr(void);
>  int kvmppc_define_rtas_kernel_token(uint32_t token, const char *function);
> @@ -216,6 +217,11 @@ static inline bool kvmppc_is_mem_backend_page_size_ok(char *obj_path)
>      return true;
>  }
>  
> +static inline bool kvmppc_has_cap_spapr_vfio(void)
> +{
> +    return false;
> +}
> +
>  #endif /* !CONFIG_USER_ONLY */
>  
>  static inline bool kvmppc_has_cap_epr(void)
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index f7531a6408..35c2516fcb 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -293,6 +293,10 @@ void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio)
>  
>      tcet->need_vfio = need_vfio;
>  
> +    if (!need_vfio || (tcet->fd != -1 && kvmppc_has_cap_spapr_vfio())) {
> +        return;
> +    }
> +
>      oldtable = tcet->table;
>  
>      tcet->table = spapr_tce_alloc_table(tcet->liobn,
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 20f07ff320..76bd92c663 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -132,7 +132,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>      cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
>      cap_spapr_tce_64 = kvm_check_extension(s, KVM_CAP_SPAPR_TCE_64);
>      cap_spapr_multitce = kvm_check_extension(s, KVM_CAP_SPAPR_MULTITCE);
> -    cap_spapr_vfio = false;
> +    cap_spapr_vfio = kvm_check_extension(s, KVM_CAP_SPAPR_TCE_VFIO);
>      cap_one_reg = kvm_check_extension(s, KVM_CAP_ONE_REG);
>      cap_hior = kvm_check_extension(s, KVM_CAP_PPC_HIOR);
>      cap_epr = kvm_check_extension(s, KVM_CAP_PPC_EPR);
> @@ -2422,6 +2422,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
>      return cap_mmu_hash_v3;
>  }
>  
> +bool kvmppc_has_cap_spapr_vfio(void)
> +{
> +    return cap_spapr_vfio;
> +}
> +
>  static PowerPCCPUClass *ppc_cpu_get_family_class(PowerPCCPUClass *pcc)
>  {
>      ObjectClass *oc = OBJECT_CLASS(pcc);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu v3 1/4] memory/iommu: QOM'fy IOMMU MemoryRegion
  2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 1/4] memory/iommu: QOM'fy IOMMU MemoryRegion Alexey Kardashevskiy
  2017-04-03  2:26   ` David Gibson
@ 2017-04-03 12:53   ` Philippe Mathieu-Daudé
  2017-04-11  8:35     ` Alexey Kardashevskiy
  1 sibling, 1 reply; 12+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-04-03 12:53 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Paolo Bonzini, Alex Williamson, qemu-ppc, David Gibson

On 04/01/2017 09:37 AM, Alexey Kardashevskiy wrote:
> This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion
> as a parent.
>
> This moves IOMMU-related fields from MR to IOMMU MR. However to avoid
> dymanic QOM casting in fast path (address_space_translate, etc),
> this adds an @is_iommu boolean flag to MR and provides new helper to
> do simple cast to IOMMU MR - memory_region_get_iommu. The flag
> is set in the instance init callback. This defines
> memory_region_is_iommu as memory_region_get_iommu()!=NULL.
>
> This switches MemoryRegion to IOMMUMemoryRegion in most places except
> the ones where MemoryRegion may be an alias.
>
> This defines memory_region_init_iommu_type() to allow creating
> IOMMUMemoryRegion subclasses. In order to support custom QOM type,
> this splits memory_region_init() to object_initialize() +
> memory_region_do_init.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> ---
> Changes:
> v3:
> * added mr->is_iommu
> * updated i386/x86_64/s390/sun
> ---
>  hw/s390x/s390-pci-bus.h       |   2 +-
>  include/exec/memory.h         |  64 ++++++++++++++++++------
>  include/hw/i386/intel_iommu.h |   2 +-
>  include/hw/ppc/spapr.h        |   3 +-
>  include/hw/vfio/vfio-common.h |   2 +-
>  include/qemu/typedefs.h       |   1 +
>  exec.c                        |  16 +++---
>  hw/dma/sun4m_iommu.c          |   2 +-
>  hw/i386/amd_iommu.c           |  11 ++--
>  hw/i386/intel_iommu.c         |  14 +++---
>  hw/ppc/spapr_iommu.c          |  20 +++++---
>  hw/s390x/s390-pci-bus.c       |   6 +--
>  hw/s390x/s390-pci-inst.c      |   8 +--
>  hw/vfio/common.c              |  12 +++--
>  hw/vfio/spapr.c               |   3 +-
>  memory.c                      | 114 +++++++++++++++++++++++++++++-------------
>  16 files changed, 188 insertions(+), 92 deletions(-)
>
> diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
> index dcbf4820c9..441ddbedcb 100644
> --- a/hw/s390x/s390-pci-bus.h
> +++ b/hw/s390x/s390-pci-bus.h
> @@ -267,7 +267,7 @@ typedef struct S390PCIIOMMU {
>      S390PCIBusDevice *pbdev;
>      AddressSpace as;
>      MemoryRegion mr;
> -    MemoryRegion iommu_mr;
> +    IOMMUMemoryRegion iommu_mr;
>      bool enabled;
>      uint64_t g_iota;
>      uint64_t pba;
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index e39256ad03..29d59f4f7f 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -37,6 +37,10 @@
>  #define MEMORY_REGION(obj) \
>          OBJECT_CHECK(MemoryRegion, (obj), TYPE_MEMORY_REGION)
>
> +#define TYPE_IOMMU_MEMORY_REGION "qemu:iommu-memory-region"
> +#define IOMMU_MEMORY_REGION(obj) \
> +        OBJECT_CHECK(IOMMUMemoryRegion, (obj), TYPE_IOMMU_MEMORY_REGION)
> +
>  typedef struct MemoryRegionOps MemoryRegionOps;
>  typedef struct MemoryRegionMmio MemoryRegionMmio;
>
> @@ -167,11 +171,12 @@ typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
>
>  struct MemoryRegionIOMMUOps {
>      /* Return a TLB entry that contains a given address. */
> -    IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool is_write);
> +    IOMMUTLBEntry (*translate)(IOMMUMemoryRegion *iommu, hwaddr addr,
> +                               bool is_write);
>      /* Returns minimum supported page size */
> -    uint64_t (*get_min_page_size)(MemoryRegion *iommu);
> +    uint64_t (*get_min_page_size)(IOMMUMemoryRegion *iommu);
>      /* Called when IOMMU Notifier flag changed */
> -    void (*notify_flag_changed)(MemoryRegion *iommu,
> +    void (*notify_flag_changed)(IOMMUMemoryRegion *iommu,
>                                  IOMMUNotifierFlag old_flags,
>                                  IOMMUNotifierFlag new_flags);
>  };
> @@ -195,7 +200,6 @@ struct MemoryRegion {
>      uint8_t dirty_log_mask;
>      RAMBlock *ram_block;
>      Object *owner;
> -    const MemoryRegionIOMMUOps *iommu_ops;
>
>      const MemoryRegionOps *ops;
>      void *opaque;
> @@ -218,6 +222,13 @@ struct MemoryRegion {
>      const char *name;
>      unsigned ioeventfd_nb;
>      MemoryRegionIoeventfd *ioeventfds;
> +    bool is_iommu;
> +};
> +
> +struct IOMMUMemoryRegion {
> +    MemoryRegion parent_obj;
> +
> +    const MemoryRegionIOMMUOps *iommu_ops;
>      QLIST_HEAD(, IOMMUNotifier) iommu_notify;
>      IOMMUNotifierFlag iommu_notify_flags;
>  };
> @@ -555,19 +566,39 @@ static inline void memory_region_init_reservation(MemoryRegion *mr,
>  }
>
>  /**
> + * memory_region_init_iommu_type: Initialize a memory region of a custom type
> + * that translates addresses
> + *
> + * An IOMMU region translates addresses and forwards accesses to a target
> + * memory region.
> + *
> + * @typename: QOM class name
> + * @mr: the #IOMMUMemoryRegion to be initialized
> + * @owner: the object that tracks the region's reference count
> + * @ops: a function that translates addresses into the @target region
> + * @name: used for debugging; not visible to the user or ABI
> + * @size: size of the region.
> + */
> +void memory_region_init_iommu_type(const char *mrtypename,
> +                                   IOMMUMemoryRegion *iommumr,
> +                                   Object *owner,
> +                                   const MemoryRegionIOMMUOps *ops,
> +                                   const char *name,
> +                                   uint64_t size);
> +/**
>   * memory_region_init_iommu: Initialize a memory region that translates
>   * addresses
>   *
>   * An IOMMU region translates addresses and forwards accesses to a target
>   * memory region.
>   *
> - * @mr: the #MemoryRegion to be initialized
> + * @mr: the #IOMMUMemoryRegion to be initialized
>   * @owner: the object that tracks the region's reference count
>   * @ops: a function that translates addresses into the @target region
>   * @name: used for debugging; not visible to the user or ABI
>   * @size: size of the region.
>   */
> -void memory_region_init_iommu(MemoryRegion *mr,
> +void memory_region_init_iommu(IOMMUMemoryRegion *iommumr,
>                                struct Object *owner,
>                                const MemoryRegionIOMMUOps *ops,
>                                const char *name,
> @@ -622,20 +653,25 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
>  }
>
>  /**
> - * memory_region_is_iommu: check whether a memory region is an iommu
> + * memory_region_get_iommu: check whether a memory region is an iommu
>   *
> - * Returns %true is a memory region is an iommu.
> + * Returns pointer to IOMMUMemoryRegion if a memory region is an iommu,
> + * otherwise NULL.
>   *
>   * @mr: the memory region being queried
>   */
> -static inline bool memory_region_is_iommu(MemoryRegion *mr)
> +static inline IOMMUMemoryRegion *memory_region_get_iommu(MemoryRegion *mr)
>  {
>      if (mr->alias) {
> -        return memory_region_is_iommu(mr->alias);
> +        return memory_region_get_iommu(mr->alias);
>      }
> -    return mr->iommu_ops;
> +    if (mr->is_iommu) {
> +        return (IOMMUMemoryRegion *) mr;
> +    }
> +    return NULL;
>  }
>
> +#define memory_region_is_iommu(mr) (memory_region_get_iommu(mr) != NULL)
>
>  /**
>   * memory_region_iommu_get_min_page_size: get minimum supported page size
> @@ -645,7 +681,7 @@ static inline bool memory_region_is_iommu(MemoryRegion *mr)
>   *
>   * @mr: the memory region being queried
>   */
> -uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr);
> +uint64_t memory_region_iommu_get_min_page_size(IOMMUMemoryRegion *mr);
>
>  /**
>   * memory_region_notify_iommu: notify a change in an IOMMU translation entry.
> @@ -664,7 +700,7 @@ uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr);
>   *         replaces all old entries for the same virtual I/O address range.
>   *         Deleted entries have .@perm == 0.
>   */
> -void memory_region_notify_iommu(MemoryRegion *mr,
> +void memory_region_notify_iommu(IOMMUMemoryRegion *mr,
>                                  IOMMUTLBEntry entry);
>
>  /**
> @@ -689,7 +725,7 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr,
>   * @is_write: Whether to treat the replay as a translate "write"
>   *     through the iommu
>   */
> -void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
> +void memory_region_iommu_replay(IOMMUMemoryRegion *mr, IOMMUNotifier *n,
>                                  bool is_write);
>
>  /**
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index fe645aa93a..34f1b61957 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -82,7 +82,7 @@ struct VTDAddressSpace {
>      PCIBus *bus;
>      uint8_t devfn;
>      AddressSpace as;
> -    MemoryRegion iommu;
> +    IOMMUMemoryRegion iommu;
>      MemoryRegion iommu_ir;      /* Interrupt region: 0xfeeXXXXX */
>      IntelIOMMUState *iommu_state;
>      VTDContextCacheEntry context_cache_entry;
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index e27de64d31..6997ed7e98 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -590,7 +590,8 @@ struct sPAPRTCETable {
>      bool bypass;
>      bool need_vfio;
>      int fd;
> -    MemoryRegion root, iommu;
> +    MemoryRegion root;
> +    IOMMUMemoryRegion iommu;
>      struct VIOsPAPRDevice *vdev; /* for @bypass migration compatibility only */
>      QLIST_ENTRY(sPAPRTCETable) list;
>  };
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index c582de18c9..7a4135ae6f 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -94,7 +94,7 @@ typedef struct VFIOContainer {
>
>  typedef struct VFIOGuestIOMMU {
>      VFIOContainer *container;
> -    MemoryRegion *iommu;
> +    IOMMUMemoryRegion *iommu;
>      hwaddr iommu_offset;
>      IOMMUNotifier n;
>      QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> index e95f28cfec..b45f71ec11 100644
> --- a/include/qemu/typedefs.h
> +++ b/include/qemu/typedefs.h
> @@ -45,6 +45,7 @@ typedef struct MachineState MachineState;
>  typedef struct MemoryListener MemoryListener;
>  typedef struct MemoryMappingList MemoryMappingList;
>  typedef struct MemoryRegion MemoryRegion;
> +typedef struct IOMMUMemoryRegion IOMMUMemoryRegion;
>  typedef struct MemoryRegionCache MemoryRegionCache;
>  typedef struct MemoryRegionSection MemoryRegionSection;
>  typedef struct MigrationIncomingState MigrationIncomingState;
> diff --git a/exec.c b/exec.c
> index e57a8a2178..bbd8df7a9d 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -462,20 +462,20 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
>  {
>      IOMMUTLBEntry iotlb = {0};
>      MemoryRegionSection *section;
> -    MemoryRegion *mr;
> +    IOMMUMemoryRegion *iommumr;
>
>      for (;;) {
>          AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
>          section = address_space_lookup_region(d, addr, false);
>          addr = addr - section->offset_within_address_space
>                 + section->offset_within_region;
> -        mr = section->mr;
>
> -        if (!mr->iommu_ops) {
> +        iommumr = memory_region_get_iommu(section->mr);
> +        if (!iommumr) {
>              break;
>          }
>
> -        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +        iotlb = iommumr->iommu_ops->translate(iommumr, addr, is_write);
>          if (!(iotlb.perm & (1 << is_write))) {
>              iotlb.target_as = NULL;
>              break;
> @@ -497,17 +497,19 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
>      IOMMUTLBEntry iotlb;
>      MemoryRegionSection *section;
>      MemoryRegion *mr;
> +    IOMMUMemoryRegion *iommumr;
>
>      for (;;) {
>          AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
>          section = address_space_translate_internal(d, addr, &addr, plen, true);
>          mr = section->mr;
>
> -        if (!mr->iommu_ops) {
> +        iommumr = memory_region_get_iommu(mr);
> +        if (!iommumr) {
>              break;
>          }
>
> -        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +        iotlb = iommumr->iommu_ops->translate(iommumr, addr, is_write);
>          addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
>                  | (addr & iotlb.addr_mask));
>          *plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1);
> @@ -538,7 +540,7 @@ address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr addr,
>
>      section = address_space_translate_internal(d, addr, xlat, plen, false);
>
> -    assert(!section->mr->iommu_ops);
> +    assert(!memory_region_is_iommu(section->mr));
>      return section;
>  }
>  #endif
> diff --git a/hw/dma/sun4m_iommu.c b/hw/dma/sun4m_iommu.c
> index b3cbc54c23..539115b629 100644
> --- a/hw/dma/sun4m_iommu.c
> +++ b/hw/dma/sun4m_iommu.c
> @@ -134,7 +134,7 @@
>  typedef struct IOMMUState {
>      SysBusDevice parent_obj;
>
> -    MemoryRegion iomem;
> +    IOMMUMemoryRegion iomem;
>      uint32_t regs[IOMMU_NREGS];
>      hwaddr iostart;
>      qemu_irq irq;
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index f86a40aa30..e76c29aec6 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -51,7 +51,7 @@ struct AMDVIAddressSpace {
>      uint8_t bus_num;            /* bus number                           */
>      uint8_t devfn;              /* device function                      */
>      AMDVIState *iommu_state;    /* AMDVI - one per machine              */
> -    MemoryRegion iommu;         /* Device's address translation region  */
> +    IOMMUMemoryRegion iommu;    /* Device's address translation region  */
>      MemoryRegion iommu_ir;      /* Device's interrupt remapping region  */
>      AddressSpace as;            /* device's corresponding address space */
>  };
> @@ -986,7 +986,7 @@ static inline bool amdvi_is_interrupt_addr(hwaddr addr)
>      return addr >= AMDVI_INT_ADDR_FIRST && addr <= AMDVI_INT_ADDR_LAST;
>  }
>
> -static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
> +static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
>                                       bool is_write)
>  {
>      AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
> @@ -1045,8 +1045,9 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>
>          memory_region_init_iommu(&iommu_as[devfn]->iommu, OBJECT(s),
>                                   &s->iommu_ops, "amd-iommu", UINT64_MAX);
> -        address_space_init(&iommu_as[devfn]->as, &iommu_as[devfn]->iommu,
> -                           "amd-iommu");
> +	address_space_init(&iommu_as[devfn]->as,
> +			   MEMORY_REGION(&iommu_as[devfn]->iommu),
> +			   "amd-iommu");
>      }
>      return &iommu_as[devfn]->as;
>  }
> @@ -1066,7 +1067,7 @@ static const MemoryRegionOps mmio_mem_ops = {
>      }
>  };
>
> -static void amdvi_iommu_notify_flag_changed(MemoryRegion *iommu,
> +static void amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
>                                              IOMMUNotifierFlag old,
>                                              IOMMUNotifierFlag new)
>  {
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 22d8226e43..233fa75b64 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1457,7 +1457,8 @@ static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
>      entry.iova = addr;
>      entry.perm = IOMMU_NONE;
>      entry.translated_addr = 0;
> -    memory_region_notify_iommu(entry.target_as->root, entry);
> +    memory_region_notify_iommu(memory_region_get_iommu(entry.target_as->root),
> +			       entry);
>
>  done:
>      return true;
> @@ -1968,7 +1969,7 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
>      }
>  }
>
> -static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
> +static IOMMUTLBEntry vtd_iommu_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
>                                           bool is_write)
>  {
>      VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
> @@ -2000,7 +2001,7 @@ static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
>      return ret;
>  }
>
> -static void vtd_iommu_notify_flag_changed(MemoryRegion *iommu,
> +static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
>                                            IOMMUNotifierFlag old,
>                                            IOMMUNotifierFlag new)
>  {
> @@ -2394,10 +2395,11 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>          memory_region_init_io(&vtd_dev_as->iommu_ir, OBJECT(s),
>                                &vtd_mem_ir_ops, s, "intel_iommu_ir",
>                                VTD_INTERRUPT_ADDR_SIZE);
> -        memory_region_add_subregion(&vtd_dev_as->iommu, VTD_INTERRUPT_ADDR_FIRST,
> -                                    &vtd_dev_as->iommu_ir);
> +	memory_region_add_subregion(MEMORY_REGION(&vtd_dev_as->iommu),
> +				    VTD_INTERRUPT_ADDR_FIRST,
> +				    &vtd_dev_as->iommu_ir);
>          address_space_init(&vtd_dev_as->as,
> -                           &vtd_dev_as->iommu, name);
> +                           MEMORY_REGION(&vtd_dev_as->iommu), name);
>      }
>      return vtd_dev_as;
>  }
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 9e30e148d6..5051110b9d 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -110,7 +110,8 @@ static void spapr_tce_free_table(uint64_t *table, int fd, uint32_t nb_table)
>  }
>
>  /* Called from RCU critical section */
> -static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr,
> +static IOMMUTLBEntry spapr_tce_translate_iommu(IOMMUMemoryRegion *iommu,
> +                                               hwaddr addr,
>                                                 bool is_write)
>  {
>      sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
> @@ -150,14 +151,14 @@ static void spapr_tce_table_pre_save(void *opaque)
>                                 tcet->bus_offset, tcet->page_shift);
>  }
>
> -static uint64_t spapr_tce_get_min_page_size(MemoryRegion *iommu)
> +static uint64_t spapr_tce_get_min_page_size(IOMMUMemoryRegion *iommu)
>  {
>      sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
>
>      return 1ULL << tcet->page_shift;
>  }
>
> -static void spapr_tce_notify_flag_changed(MemoryRegion *iommu,
> +static void spapr_tce_notify_flag_changed(IOMMUMemoryRegion *iommu,
>                                            IOMMUNotifierFlag old,
>                                            IOMMUNotifierFlag new)
>  {
> @@ -265,7 +266,9 @@ static int spapr_tce_table_realize(DeviceState *dev)
>      memory_region_init(&tcet->root, tcetobj, tmp, UINT64_MAX);
>
>      snprintf(tmp, sizeof(tmp), "tce-iommu-%x", tcet->liobn);
> -    memory_region_init_iommu(&tcet->iommu, tcetobj, &spapr_iommu_ops, tmp, 0);
> +    memory_region_init_iommu_type(TYPE_IOMMU_MEMORY_REGION,
> +                                  &tcet->iommu, tcetobj, &spapr_iommu_ops,
> +                                  tmp, 0);
>
>      QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
>
> @@ -341,9 +344,10 @@ void spapr_tce_table_enable(sPAPRTCETable *tcet,
>                                          &tcet->fd,
>                                          tcet->need_vfio);
>
> -    memory_region_set_size(&tcet->iommu,
> +    memory_region_set_size(MEMORY_REGION(&tcet->iommu),
>                             (uint64_t)tcet->nb_table << tcet->page_shift);
> -    memory_region_add_subregion(&tcet->root, tcet->bus_offset, &tcet->iommu);
> +    memory_region_add_subregion(&tcet->root, tcet->bus_offset,
> +                                MEMORY_REGION(&tcet->iommu));
>  }
>
>  void spapr_tce_table_disable(sPAPRTCETable *tcet)
> @@ -352,8 +356,8 @@ void spapr_tce_table_disable(sPAPRTCETable *tcet)
>          return;
>      }
>
> -    memory_region_del_subregion(&tcet->root, &tcet->iommu);
> -    memory_region_set_size(&tcet->iommu, 0);
> +    memory_region_del_subregion(&tcet->root, MEMORY_REGION(&tcet->iommu));
> +    memory_region_set_size(MEMORY_REGION(&tcet->iommu), 0);
>
>      spapr_tce_free_table(tcet->table, tcet->fd, tcet->nb_table);
>      tcet->fd = -1;
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index 69b0291e8a..27e7336a76 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -354,7 +354,7 @@ out:
>      return pte;
>  }
>
> -static IOMMUTLBEntry s390_translate_iommu(MemoryRegion *mr, hwaddr addr,
> +static IOMMUTLBEntry s390_translate_iommu(IOMMUMemoryRegion *mr, hwaddr addr,
>                                            bool is_write)
>  {
>      uint64_t pte;
> @@ -523,14 +523,14 @@ void s390_pci_iommu_enable(S390PCIIOMMU *iommu)
>      memory_region_init_iommu(&iommu->iommu_mr, OBJECT(&iommu->mr),
>                               &s390_iommu_ops, name, iommu->pal + 1);
>      iommu->enabled = true;
> -    memory_region_add_subregion(&iommu->mr, 0, &iommu->iommu_mr);
> +    memory_region_add_subregion(&iommu->mr, 0, MEMORY_REGION(&iommu->iommu_mr));
>      g_free(name);
>  }
>
>  void s390_pci_iommu_disable(S390PCIIOMMU *iommu)
>  {
>      iommu->enabled = false;
> -    memory_region_del_subregion(&iommu->mr, &iommu->iommu_mr);
> +    memory_region_del_subregion(&iommu->mr, MEMORY_REGION(&iommu->iommu_mr));
>      object_unparent(OBJECT(&iommu->iommu_mr));
>  }
>
> diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
> index d2a8c0a083..b4fe4da798 100644
> --- a/hw/s390x/s390-pci-inst.c
> +++ b/hw/s390x/s390-pci-inst.c
> @@ -561,7 +561,7 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
>      S390PCIIOMMU *iommu;
>      hwaddr start, end;
>      IOMMUTLBEntry entry;
> -    MemoryRegion *mr;
> +    IOMMUMemoryRegion *iommumr;
>
>      cpu_synchronize_state(CPU(cpu));
>
> @@ -620,9 +620,9 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
>          goto out;
>      }
>
> -    mr = &iommu->iommu_mr;
> +    iommumr = &iommu->iommu_mr;
>      while (start < end) {
> -        entry = mr->iommu_ops->translate(mr, start, 0);
> +        entry = iommumr->iommu_ops->translate(iommumr, start, 0);
>
>          if (!entry.translated_addr) {
>              pbdev->state = ZPCI_FS_ERROR;
> @@ -633,7 +633,7 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
>              goto out;
>          }
>
> -        memory_region_notify_iommu(mr, entry);
> +        memory_region_notify_iommu(iommumr, entry);
>          start += entry.addr_mask + 1;
>      }
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index f3ba9b9007..ab95db689c 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -465,6 +465,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>
>      if (memory_region_is_iommu(section->mr)) {
>          VFIOGuestIOMMU *giommu;
> +        IOMMUMemoryRegion *iommumr = IOMMU_MEMORY_REGION(section->mr);
>
>          trace_vfio_listener_region_add_iommu(iova, end);
>          /*
> @@ -474,7 +475,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>           * device emulation the VFIO iommu handles to use).
>           */
>          giommu = g_malloc0(sizeof(*giommu));
> -        giommu->iommu = section->mr;
> +        giommu->iommu = iommumr;
>          giommu->iommu_offset = section->offset_within_address_space -
>                                 section->offset_within_region;
>          giommu->container = container;
> @@ -482,7 +483,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>          giommu->n.notifier_flags = IOMMU_NOTIFIER_ALL;
>          QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
>
> -        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> +        memory_region_register_iommu_notifier(section->mr, &giommu->n);
>          memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
>
>          return;
> @@ -550,8 +551,8 @@ static void vfio_listener_region_del(MemoryListener *listener,
>          VFIOGuestIOMMU *giommu;
>
>          QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
> -            if (giommu->iommu == section->mr) {
> -                memory_region_unregister_iommu_notifier(giommu->iommu,
> +            if (MEMORY_REGION(giommu->iommu) == section->mr) {
> +                memory_region_unregister_iommu_notifier(section->mr,
>                                                          &giommu->n);
>                  QLIST_REMOVE(giommu, giommu_next);
>                  g_free(giommu);
> @@ -1141,7 +1142,8 @@ static void vfio_disconnect_container(VFIOGroup *group)
>          QLIST_REMOVE(container, next);
>
>          QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
> -            memory_region_unregister_iommu_notifier(giommu->iommu, &giommu->n);
> +            memory_region_unregister_iommu_notifier(
> +                    MEMORY_REGION(giommu->iommu), &giommu->n);
>              QLIST_REMOVE(giommu, giommu_next);
>              g_free(giommu);
>          }
> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
> index 4409bcc0d7..551870d46b 100644
> --- a/hw/vfio/spapr.c
> +++ b/hw/vfio/spapr.c
> @@ -143,7 +143,8 @@ int vfio_spapr_create_window(VFIOContainer *container,
>                               hwaddr *pgsize)
>  {
>      int ret;
> -    unsigned pagesize = memory_region_iommu_get_min_page_size(section->mr);
> +    IOMMUMemoryRegion *iommumr = IOMMU_MEMORY_REGION(section->mr);
> +    unsigned pagesize = memory_region_iommu_get_min_page_size(iommumr);
>      unsigned entries, pages;
>      struct vfio_iommu_spapr_tce_create create = { .argsz = sizeof(create) };
>
> diff --git a/memory.c b/memory.c
> index 4c95aaf39c..62796536dc 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -975,12 +975,11 @@ static char *memory_region_escape_name(const char *name)
>      return escaped;
>  }
>
> -void memory_region_init(MemoryRegion *mr,
> -                        Object *owner,
> -                        const char *name,
> -                        uint64_t size)
> +static void memory_region_do_init(MemoryRegion *mr,
> +                                  Object *owner,
> +                                  const char *name,
> +                                  uint64_t size)
>  {
> -    object_initialize(mr, sizeof(*mr), TYPE_MEMORY_REGION);
>      mr->size = int128_make64(size);
>      if (size == UINT64_MAX) {
>          mr->size = int128_2_64();
> @@ -1004,6 +1003,15 @@ void memory_region_init(MemoryRegion *mr,
>      }
>  }
>
> +void memory_region_init(MemoryRegion *mr,
> +                        Object *owner,
> +                        const char *name,
> +                        uint64_t size)
> +{
> +    object_initialize(mr, sizeof(*mr), TYPE_MEMORY_REGION);
> +    memory_region_do_init(mr, owner, name, size);
> +}
> +
>  static void memory_region_get_addr(Object *obj, Visitor *v, const char *name,
>                                     void *opaque, Error **errp)
>  {
> @@ -1090,6 +1098,13 @@ static void memory_region_initfn(Object *obj)
>                          NULL, NULL, &error_abort);
>  }
>
> +static void iommu_memory_region_initfn(Object *obj)
> +{
> +    MemoryRegion *mr = MEMORY_REGION(obj);
> +
> +    mr->is_iommu = true;
> +}
> +
>  static uint64_t unassigned_mem_read(void *opaque, hwaddr addr,
>                                      unsigned size)
>  {
> @@ -1473,17 +1488,33 @@ void memory_region_init_rom_device(MemoryRegion *mr,
>      mr->ram_block = qemu_ram_alloc(size, mr, errp);
>  }
>
> -void memory_region_init_iommu(MemoryRegion *mr,
> -                              Object *owner,
> -                              const MemoryRegionIOMMUOps *ops,
> -                              const char *name,
> -                              uint64_t size)
> +void memory_region_init_iommu_type(const char *mrtypename,
> +                                   IOMMUMemoryRegion *iommumr,
> +                                   Object *owner,
> +                                   const MemoryRegionIOMMUOps *ops,
> +                                   const char *name,
> +                                   uint64_t size)
>  {
> -    memory_region_init(mr, owner, name, size);
> -    mr->iommu_ops = ops,
> +    struct MemoryRegion *mr;
> +    size_t instance_size = object_type_get_instance_size(mrtypename);
> +
> +    object_initialize(iommumr, instance_size, mrtypename);
> +    mr = MEMORY_REGION(iommumr);
> +    memory_region_do_init(mr, owner, name, size);
> +    iommumr->iommu_ops = ops,
>      mr->terminates = true;  /* then re-forwards */
> -    QLIST_INIT(&mr->iommu_notify);
> -    mr->iommu_notify_flags = IOMMU_NOTIFIER_NONE;
> +    QLIST_INIT(&iommumr->iommu_notify);
> +    iommumr->iommu_notify_flags = IOMMU_NOTIFIER_NONE;
> +}
> +
> +void memory_region_init_iommu(IOMMUMemoryRegion *iommumr,
> +                              Object *owner,
> +                              const MemoryRegionIOMMUOps *ops,
> +                              const char *name,
> +                              uint64_t size)
> +{
> +    memory_region_init_iommu_type(TYPE_IOMMU_MEMORY_REGION, iommumr,
> +                                  owner, ops, name, size);
>  }
>
>  static void memory_region_finalize(Object *obj)
> @@ -1578,57 +1609,61 @@ bool memory_region_is_logging(MemoryRegion *mr, uint8_t client)
>      return memory_region_get_dirty_log_mask(mr) & (1 << client);
>  }
>
> -static void memory_region_update_iommu_notify_flags(MemoryRegion *mr)
> +static void memory_region_update_iommu_notify_flags(IOMMUMemoryRegion *iommumr)
>  {
>      IOMMUNotifierFlag flags = IOMMU_NOTIFIER_NONE;
>      IOMMUNotifier *iommu_notifier;
>
> -    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
> +    QLIST_FOREACH(iommu_notifier, &iommumr->iommu_notify, node) {
>          flags |= iommu_notifier->notifier_flags;
>      }
>
> -    if (flags != mr->iommu_notify_flags &&
> -        mr->iommu_ops->notify_flag_changed) {
> -        mr->iommu_ops->notify_flag_changed(mr, mr->iommu_notify_flags,
> -                                           flags);
> +    if (flags != iommumr->iommu_notify_flags &&
> +        iommumr->iommu_ops->notify_flag_changed) {
> +        iommumr->iommu_ops->notify_flag_changed(iommumr,
> +                                                iommumr->iommu_notify_flags,
> +                                                flags);
>      }
>
> -    mr->iommu_notify_flags = flags;
> +    iommumr->iommu_notify_flags = flags;
>  }
>
>  void memory_region_register_iommu_notifier(MemoryRegion *mr,
>                                             IOMMUNotifier *n)
>  {
> +    IOMMUMemoryRegion *iommumr;
> +
>      if (mr->alias) {
>          memory_region_register_iommu_notifier(mr->alias, n);
>          return;
>      }
>
>      /* We need to register for at least one bitfield */
> +    iommumr = IOMMU_MEMORY_REGION(mr);
>      assert(n->notifier_flags != IOMMU_NOTIFIER_NONE);
> -    QLIST_INSERT_HEAD(&mr->iommu_notify, n, node);
> -    memory_region_update_iommu_notify_flags(mr);
> +    QLIST_INSERT_HEAD(&iommumr->iommu_notify, n, node);
> +    memory_region_update_iommu_notify_flags(iommumr);
>  }
>
> -uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr)
> +uint64_t memory_region_iommu_get_min_page_size(IOMMUMemoryRegion *iommumr)
>  {
> -    assert(memory_region_is_iommu(mr));
> -    if (mr->iommu_ops && mr->iommu_ops->get_min_page_size) {
> -        return mr->iommu_ops->get_min_page_size(mr);
> +    if (iommumr->iommu_ops && iommumr->iommu_ops->get_min_page_size) {
> +        return iommumr->iommu_ops->get_min_page_size(iommumr);
>      }
>      return TARGET_PAGE_SIZE;
>  }
>
> -void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
> +void memory_region_iommu_replay(IOMMUMemoryRegion *iommumr, IOMMUNotifier *n,
>                                  bool is_write)
>  {
> +    MemoryRegion *mr = MEMORY_REGION(iommumr);
>      hwaddr addr, granularity;
>      IOMMUTLBEntry iotlb;
>
> -    granularity = memory_region_iommu_get_min_page_size(mr);
> +    granularity = memory_region_iommu_get_min_page_size(iommumr);
>
>      for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> -        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +        iotlb = iommumr->iommu_ops->translate(iommumr, addr, is_write);
>          if (iotlb.perm != IOMMU_NONE) {
>              n->notify(n, &iotlb);
>          }
> @@ -1644,21 +1679,24 @@ void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
>  void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
>                                               IOMMUNotifier *n)
>  {
> +    IOMMUMemoryRegion *iommumr;
> +
>      if (mr->alias) {
>          memory_region_unregister_iommu_notifier(mr->alias, n);
>          return;
>      }
>      QLIST_REMOVE(n, node);
> -    memory_region_update_iommu_notify_flags(mr);
> +    iommumr = IOMMU_MEMORY_REGION(mr);
> +    memory_region_update_iommu_notify_flags(iommumr);
>  }
>
> -void memory_region_notify_iommu(MemoryRegion *mr,
> +void memory_region_notify_iommu(IOMMUMemoryRegion *iommumr,
>                                  IOMMUTLBEntry entry)
>  {
>      IOMMUNotifier *iommu_notifier;
>      IOMMUNotifierFlag request_flags;
>
> -    assert(memory_region_is_iommu(mr));
> +    assert(memory_region_is_iommu(MEMORY_REGION(iommumr)));
>
>      if (entry.perm & IOMMU_RW) {
>          request_flags = IOMMU_NOTIFIER_MAP;
> @@ -1666,7 +1704,7 @@ void memory_region_notify_iommu(MemoryRegion *mr,
>          request_flags = IOMMU_NOTIFIER_UNMAP;
>      }
>
> -    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
> +    QLIST_FOREACH(iommu_notifier, &iommumr->iommu_notify, node) {
>          if (iommu_notifier->notifier_flags & request_flags) {
>              iommu_notifier->notify(iommu_notifier, &entry);
>          }
> @@ -2660,9 +2698,17 @@ static const TypeInfo memory_region_info = {
>      .instance_finalize  = memory_region_finalize,
>  };
>
> +static const TypeInfo iommu_memory_region_info = {
> +    .parent             = TYPE_MEMORY_REGION,
> +    .name               = TYPE_IOMMU_MEMORY_REGION,
> +    .instance_size      = sizeof(IOMMUMemoryRegion),
> +    .instance_init      = iommu_memory_region_initfn,
> +};
> +
>  static void memory_register_types(void)
>  {
>      type_register_static(&memory_region_info);
> +    type_register_static(&iommu_memory_region_info);
>  }
>
>  type_init(memory_register_types)
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu v3 1/4] memory/iommu: QOM'fy IOMMU MemoryRegion
  2017-04-03 12:53   ` Philippe Mathieu-Daudé
@ 2017-04-11  8:35     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-11  8:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, qemu-ppc, David Gibson

On 03/04/17 22:53, Philippe Mathieu-Daudé wrote:
> On 04/01/2017 09:37 AM, Alexey Kardashevskiy wrote:
>> This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion
>> as a parent.
>>
>> This moves IOMMU-related fields from MR to IOMMU MR. However to avoid
>> dymanic QOM casting in fast path (address_space_translate, etc),
>> this adds an @is_iommu boolean flag to MR and provides new helper to
>> do simple cast to IOMMU MR - memory_region_get_iommu. The flag
>> is set in the instance init callback. This defines
>> memory_region_is_iommu as memory_region_get_iommu()!=NULL.
>>
>> This switches MemoryRegion to IOMMUMemoryRegion in most places except
>> the ones where MemoryRegion may be an alias.
>>
>> This defines memory_region_init_iommu_type() to allow creating
>> IOMMUMemoryRegion subclasses. In order to support custom QOM type,
>> this splits memory_region_init() to object_initialize() +
>> memory_region_do_init.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


This could go first as it is an independent thing and the rest of the
series depends on kernel headers update or does not depends on this patch.
Paolo?


> 
>> ---
>> Changes:
>> v3:
>> * added mr->is_iommu
>> * updated i386/x86_64/s390/sun
>> ---
>>  hw/s390x/s390-pci-bus.h       |   2 +-
>>  include/exec/memory.h         |  64 ++++++++++++++++++------
>>  include/hw/i386/intel_iommu.h |   2 +-
>>  include/hw/ppc/spapr.h        |   3 +-
>>  include/hw/vfio/vfio-common.h |   2 +-
>>  include/qemu/typedefs.h       |   1 +
>>  exec.c                        |  16 +++---
>>  hw/dma/sun4m_iommu.c          |   2 +-
>>  hw/i386/amd_iommu.c           |  11 ++--
>>  hw/i386/intel_iommu.c         |  14 +++---
>>  hw/ppc/spapr_iommu.c          |  20 +++++---
>>  hw/s390x/s390-pci-bus.c       |   6 +--
>>  hw/s390x/s390-pci-inst.c      |   8 +--
>>  hw/vfio/common.c              |  12 +++--
>>  hw/vfio/spapr.c               |   3 +-
>>  memory.c                      | 114
>> +++++++++++++++++++++++++++++-------------
>>  16 files changed, 188 insertions(+), 92 deletions(-)
>>
>> diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
>> index dcbf4820c9..441ddbedcb 100644
>> --- a/hw/s390x/s390-pci-bus.h
>> +++ b/hw/s390x/s390-pci-bus.h
>> @@ -267,7 +267,7 @@ typedef struct S390PCIIOMMU {
>>      S390PCIBusDevice *pbdev;
>>      AddressSpace as;
>>      MemoryRegion mr;
>> -    MemoryRegion iommu_mr;
>> +    IOMMUMemoryRegion iommu_mr;
>>      bool enabled;
>>      uint64_t g_iota;
>>      uint64_t pba;
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index e39256ad03..29d59f4f7f 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -37,6 +37,10 @@
>>  #define MEMORY_REGION(obj) \
>>          OBJECT_CHECK(MemoryRegion, (obj), TYPE_MEMORY_REGION)
>>
>> +#define TYPE_IOMMU_MEMORY_REGION "qemu:iommu-memory-region"
>> +#define IOMMU_MEMORY_REGION(obj) \
>> +        OBJECT_CHECK(IOMMUMemoryRegion, (obj), TYPE_IOMMU_MEMORY_REGION)
>> +
>>  typedef struct MemoryRegionOps MemoryRegionOps;
>>  typedef struct MemoryRegionMmio MemoryRegionMmio;
>>
>> @@ -167,11 +171,12 @@ typedef struct MemoryRegionIOMMUOps
>> MemoryRegionIOMMUOps;
>>
>>  struct MemoryRegionIOMMUOps {
>>      /* Return a TLB entry that contains a given address. */
>> -    IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool
>> is_write);
>> +    IOMMUTLBEntry (*translate)(IOMMUMemoryRegion *iommu, hwaddr addr,
>> +                               bool is_write);
>>      /* Returns minimum supported page size */
>> -    uint64_t (*get_min_page_size)(MemoryRegion *iommu);
>> +    uint64_t (*get_min_page_size)(IOMMUMemoryRegion *iommu);
>>      /* Called when IOMMU Notifier flag changed */
>> -    void (*notify_flag_changed)(MemoryRegion *iommu,
>> +    void (*notify_flag_changed)(IOMMUMemoryRegion *iommu,
>>                                  IOMMUNotifierFlag old_flags,
>>                                  IOMMUNotifierFlag new_flags);
>>  };
>> @@ -195,7 +200,6 @@ struct MemoryRegion {
>>      uint8_t dirty_log_mask;
>>      RAMBlock *ram_block;
>>      Object *owner;
>> -    const MemoryRegionIOMMUOps *iommu_ops;
>>
>>      const MemoryRegionOps *ops;
>>      void *opaque;
>> @@ -218,6 +222,13 @@ struct MemoryRegion {
>>      const char *name;
>>      unsigned ioeventfd_nb;
>>      MemoryRegionIoeventfd *ioeventfds;
>> +    bool is_iommu;
>> +};
>> +
>> +struct IOMMUMemoryRegion {
>> +    MemoryRegion parent_obj;
>> +
>> +    const MemoryRegionIOMMUOps *iommu_ops;
>>      QLIST_HEAD(, IOMMUNotifier) iommu_notify;
>>      IOMMUNotifierFlag iommu_notify_flags;
>>  };
>> @@ -555,19 +566,39 @@ static inline void
>> memory_region_init_reservation(MemoryRegion *mr,
>>  }
>>
>>  /**
>> + * memory_region_init_iommu_type: Initialize a memory region of a custom
>> type
>> + * that translates addresses
>> + *
>> + * An IOMMU region translates addresses and forwards accesses to a target
>> + * memory region.
>> + *
>> + * @typename: QOM class name
>> + * @mr: the #IOMMUMemoryRegion to be initialized
>> + * @owner: the object that tracks the region's reference count
>> + * @ops: a function that translates addresses into the @target region
>> + * @name: used for debugging; not visible to the user or ABI
>> + * @size: size of the region.
>> + */
>> +void memory_region_init_iommu_type(const char *mrtypename,
>> +                                   IOMMUMemoryRegion *iommumr,
>> +                                   Object *owner,
>> +                                   const MemoryRegionIOMMUOps *ops,
>> +                                   const char *name,
>> +                                   uint64_t size);
>> +/**
>>   * memory_region_init_iommu: Initialize a memory region that translates
>>   * addresses
>>   *
>>   * An IOMMU region translates addresses and forwards accesses to a target
>>   * memory region.
>>   *
>> - * @mr: the #MemoryRegion to be initialized
>> + * @mr: the #IOMMUMemoryRegion to be initialized
>>   * @owner: the object that tracks the region's reference count
>>   * @ops: a function that translates addresses into the @target region
>>   * @name: used for debugging; not visible to the user or ABI
>>   * @size: size of the region.
>>   */
>> -void memory_region_init_iommu(MemoryRegion *mr,
>> +void memory_region_init_iommu(IOMMUMemoryRegion *iommumr,
>>                                struct Object *owner,
>>                                const MemoryRegionIOMMUOps *ops,
>>                                const char *name,
>> @@ -622,20 +653,25 @@ static inline bool
>> memory_region_is_romd(MemoryRegion *mr)
>>  }
>>
>>  /**
>> - * memory_region_is_iommu: check whether a memory region is an iommu
>> + * memory_region_get_iommu: check whether a memory region is an iommu
>>   *
>> - * Returns %true is a memory region is an iommu.
>> + * Returns pointer to IOMMUMemoryRegion if a memory region is an iommu,
>> + * otherwise NULL.
>>   *
>>   * @mr: the memory region being queried
>>   */
>> -static inline bool memory_region_is_iommu(MemoryRegion *mr)
>> +static inline IOMMUMemoryRegion *memory_region_get_iommu(MemoryRegion *mr)
>>  {
>>      if (mr->alias) {
>> -        return memory_region_is_iommu(mr->alias);
>> +        return memory_region_get_iommu(mr->alias);
>>      }
>> -    return mr->iommu_ops;
>> +    if (mr->is_iommu) {
>> +        return (IOMMUMemoryRegion *) mr;
>> +    }
>> +    return NULL;
>>  }
>>
>> +#define memory_region_is_iommu(mr) (memory_region_get_iommu(mr) != NULL)
>>
>>  /**
>>   * memory_region_iommu_get_min_page_size: get minimum supported page size
>> @@ -645,7 +681,7 @@ static inline bool
>> memory_region_is_iommu(MemoryRegion *mr)
>>   *
>>   * @mr: the memory region being queried
>>   */
>> -uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr);
>> +uint64_t memory_region_iommu_get_min_page_size(IOMMUMemoryRegion *mr);
>>
>>  /**
>>   * memory_region_notify_iommu: notify a change in an IOMMU translation
>> entry.
>> @@ -664,7 +700,7 @@ uint64_t
>> memory_region_iommu_get_min_page_size(MemoryRegion *mr);
>>   *         replaces all old entries for the same virtual I/O address range.
>>   *         Deleted entries have .@perm == 0.
>>   */
>> -void memory_region_notify_iommu(MemoryRegion *mr,
>> +void memory_region_notify_iommu(IOMMUMemoryRegion *mr,
>>                                  IOMMUTLBEntry entry);
>>
>>  /**
>> @@ -689,7 +725,7 @@ void
>> memory_region_register_iommu_notifier(MemoryRegion *mr,
>>   * @is_write: Whether to treat the replay as a translate "write"
>>   *     through the iommu
>>   */
>> -void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
>> +void memory_region_iommu_replay(IOMMUMemoryRegion *mr, IOMMUNotifier *n,
>>                                  bool is_write);
>>
>>  /**
>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>> index fe645aa93a..34f1b61957 100644
>> --- a/include/hw/i386/intel_iommu.h
>> +++ b/include/hw/i386/intel_iommu.h
>> @@ -82,7 +82,7 @@ struct VTDAddressSpace {
>>      PCIBus *bus;
>>      uint8_t devfn;
>>      AddressSpace as;
>> -    MemoryRegion iommu;
>> +    IOMMUMemoryRegion iommu;
>>      MemoryRegion iommu_ir;      /* Interrupt region: 0xfeeXXXXX */
>>      IntelIOMMUState *iommu_state;
>>      VTDContextCacheEntry context_cache_entry;
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index e27de64d31..6997ed7e98 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -590,7 +590,8 @@ struct sPAPRTCETable {
>>      bool bypass;
>>      bool need_vfio;
>>      int fd;
>> -    MemoryRegion root, iommu;
>> +    MemoryRegion root;
>> +    IOMMUMemoryRegion iommu;
>>      struct VIOsPAPRDevice *vdev; /* for @bypass migration compatibility
>> only */
>>      QLIST_ENTRY(sPAPRTCETable) list;
>>  };
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index c582de18c9..7a4135ae6f 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -94,7 +94,7 @@ typedef struct VFIOContainer {
>>
>>  typedef struct VFIOGuestIOMMU {
>>      VFIOContainer *container;
>> -    MemoryRegion *iommu;
>> +    IOMMUMemoryRegion *iommu;
>>      hwaddr iommu_offset;
>>      IOMMUNotifier n;
>>      QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
>> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
>> index e95f28cfec..b45f71ec11 100644
>> --- a/include/qemu/typedefs.h
>> +++ b/include/qemu/typedefs.h
>> @@ -45,6 +45,7 @@ typedef struct MachineState MachineState;
>>  typedef struct MemoryListener MemoryListener;
>>  typedef struct MemoryMappingList MemoryMappingList;
>>  typedef struct MemoryRegion MemoryRegion;
>> +typedef struct IOMMUMemoryRegion IOMMUMemoryRegion;
>>  typedef struct MemoryRegionCache MemoryRegionCache;
>>  typedef struct MemoryRegionSection MemoryRegionSection;
>>  typedef struct MigrationIncomingState MigrationIncomingState;
>> diff --git a/exec.c b/exec.c
>> index e57a8a2178..bbd8df7a9d 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -462,20 +462,20 @@ IOMMUTLBEntry
>> address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
>>  {
>>      IOMMUTLBEntry iotlb = {0};
>>      MemoryRegionSection *section;
>> -    MemoryRegion *mr;
>> +    IOMMUMemoryRegion *iommumr;
>>
>>      for (;;) {
>>          AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
>>          section = address_space_lookup_region(d, addr, false);
>>          addr = addr - section->offset_within_address_space
>>                 + section->offset_within_region;
>> -        mr = section->mr;
>>
>> -        if (!mr->iommu_ops) {
>> +        iommumr = memory_region_get_iommu(section->mr);
>> +        if (!iommumr) {
>>              break;
>>          }
>>
>> -        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
>> +        iotlb = iommumr->iommu_ops->translate(iommumr, addr, is_write);
>>          if (!(iotlb.perm & (1 << is_write))) {
>>              iotlb.target_as = NULL;
>>              break;
>> @@ -497,17 +497,19 @@ MemoryRegion *address_space_translate(AddressSpace
>> *as, hwaddr addr,
>>      IOMMUTLBEntry iotlb;
>>      MemoryRegionSection *section;
>>      MemoryRegion *mr;
>> +    IOMMUMemoryRegion *iommumr;
>>
>>      for (;;) {
>>          AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
>>          section = address_space_translate_internal(d, addr, &addr, plen,
>> true);
>>          mr = section->mr;
>>
>> -        if (!mr->iommu_ops) {
>> +        iommumr = memory_region_get_iommu(mr);
>> +        if (!iommumr) {
>>              break;
>>          }
>>
>> -        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
>> +        iotlb = iommumr->iommu_ops->translate(iommumr, addr, is_write);
>>          addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
>>                  | (addr & iotlb.addr_mask));
>>          *plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1);
>> @@ -538,7 +540,7 @@ address_space_translate_for_iotlb(CPUState *cpu, int
>> asidx, hwaddr addr,
>>
>>      section = address_space_translate_internal(d, addr, xlat, plen, false);
>>
>> -    assert(!section->mr->iommu_ops);
>> +    assert(!memory_region_is_iommu(section->mr));
>>      return section;
>>  }
>>  #endif
>> diff --git a/hw/dma/sun4m_iommu.c b/hw/dma/sun4m_iommu.c
>> index b3cbc54c23..539115b629 100644
>> --- a/hw/dma/sun4m_iommu.c
>> +++ b/hw/dma/sun4m_iommu.c
>> @@ -134,7 +134,7 @@
>>  typedef struct IOMMUState {
>>      SysBusDevice parent_obj;
>>
>> -    MemoryRegion iomem;
>> +    IOMMUMemoryRegion iomem;
>>      uint32_t regs[IOMMU_NREGS];
>>      hwaddr iostart;
>>      qemu_irq irq;
>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>> index f86a40aa30..e76c29aec6 100644
>> --- a/hw/i386/amd_iommu.c
>> +++ b/hw/i386/amd_iommu.c
>> @@ -51,7 +51,7 @@ struct AMDVIAddressSpace {
>>      uint8_t bus_num;            /* bus number                           */
>>      uint8_t devfn;              /* device function                      */
>>      AMDVIState *iommu_state;    /* AMDVI - one per machine              */
>> -    MemoryRegion iommu;         /* Device's address translation region  */
>> +    IOMMUMemoryRegion iommu;    /* Device's address translation region  */
>>      MemoryRegion iommu_ir;      /* Device's interrupt remapping region  */
>>      AddressSpace as;            /* device's corresponding address space */
>>  };
>> @@ -986,7 +986,7 @@ static inline bool amdvi_is_interrupt_addr(hwaddr addr)
>>      return addr >= AMDVI_INT_ADDR_FIRST && addr <= AMDVI_INT_ADDR_LAST;
>>  }
>>
>> -static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
>> +static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
>>                                       bool is_write)
>>  {
>>      AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
>> @@ -1045,8 +1045,9 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus
>> *bus, void *opaque, int devfn)
>>
>>          memory_region_init_iommu(&iommu_as[devfn]->iommu, OBJECT(s),
>>                                   &s->iommu_ops, "amd-iommu", UINT64_MAX);
>> -        address_space_init(&iommu_as[devfn]->as, &iommu_as[devfn]->iommu,
>> -                           "amd-iommu");
>> +    address_space_init(&iommu_as[devfn]->as,
>> +               MEMORY_REGION(&iommu_as[devfn]->iommu),
>> +               "amd-iommu");
>>      }
>>      return &iommu_as[devfn]->as;
>>  }
>> @@ -1066,7 +1067,7 @@ static const MemoryRegionOps mmio_mem_ops = {
>>      }
>>  };
>>
>> -static void amdvi_iommu_notify_flag_changed(MemoryRegion *iommu,
>> +static void amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
>>                                              IOMMUNotifierFlag old,
>>                                              IOMMUNotifierFlag new)
>>  {
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 22d8226e43..233fa75b64 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -1457,7 +1457,8 @@ static bool
>> vtd_process_device_iotlb_desc(IntelIOMMUState *s,
>>      entry.iova = addr;
>>      entry.perm = IOMMU_NONE;
>>      entry.translated_addr = 0;
>> -    memory_region_notify_iommu(entry.target_as->root, entry);
>> +   
>> memory_region_notify_iommu(memory_region_get_iommu(entry.target_as->root),
>> +                   entry);
>>
>>  done:
>>      return true;
>> @@ -1968,7 +1969,7 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
>>      }
>>  }
>>
>> -static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
>> +static IOMMUTLBEntry vtd_iommu_translate(IOMMUMemoryRegion *iommu,
>> hwaddr addr,
>>                                           bool is_write)
>>  {
>>      VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
>> @@ -2000,7 +2001,7 @@ static IOMMUTLBEntry
>> vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
>>      return ret;
>>  }
>>
>> -static void vtd_iommu_notify_flag_changed(MemoryRegion *iommu,
>> +static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
>>                                            IOMMUNotifierFlag old,
>>                                            IOMMUNotifierFlag new)
>>  {
>> @@ -2394,10 +2395,11 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState
>> *s, PCIBus *bus, int devfn)
>>          memory_region_init_io(&vtd_dev_as->iommu_ir, OBJECT(s),
>>                                &vtd_mem_ir_ops, s, "intel_iommu_ir",
>>                                VTD_INTERRUPT_ADDR_SIZE);
>> -        memory_region_add_subregion(&vtd_dev_as->iommu,
>> VTD_INTERRUPT_ADDR_FIRST,
>> -                                    &vtd_dev_as->iommu_ir);
>> +    memory_region_add_subregion(MEMORY_REGION(&vtd_dev_as->iommu),
>> +                    VTD_INTERRUPT_ADDR_FIRST,
>> +                    &vtd_dev_as->iommu_ir);
>>          address_space_init(&vtd_dev_as->as,
>> -                           &vtd_dev_as->iommu, name);
>> +                           MEMORY_REGION(&vtd_dev_as->iommu), name);
>>      }
>>      return vtd_dev_as;
>>  }
>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> index 9e30e148d6..5051110b9d 100644
>> --- a/hw/ppc/spapr_iommu.c
>> +++ b/hw/ppc/spapr_iommu.c
>> @@ -110,7 +110,8 @@ static void spapr_tce_free_table(uint64_t *table, int
>> fd, uint32_t nb_table)
>>  }
>>
>>  /* Called from RCU critical section */
>> -static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu,
>> hwaddr addr,
>> +static IOMMUTLBEntry spapr_tce_translate_iommu(IOMMUMemoryRegion *iommu,
>> +                                               hwaddr addr,
>>                                                 bool is_write)
>>  {
>>      sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
>> @@ -150,14 +151,14 @@ static void spapr_tce_table_pre_save(void *opaque)
>>                                 tcet->bus_offset, tcet->page_shift);
>>  }
>>
>> -static uint64_t spapr_tce_get_min_page_size(MemoryRegion *iommu)
>> +static uint64_t spapr_tce_get_min_page_size(IOMMUMemoryRegion *iommu)
>>  {
>>      sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
>>
>>      return 1ULL << tcet->page_shift;
>>  }
>>
>> -static void spapr_tce_notify_flag_changed(MemoryRegion *iommu,
>> +static void spapr_tce_notify_flag_changed(IOMMUMemoryRegion *iommu,
>>                                            IOMMUNotifierFlag old,
>>                                            IOMMUNotifierFlag new)
>>  {
>> @@ -265,7 +266,9 @@ static int spapr_tce_table_realize(DeviceState *dev)
>>      memory_region_init(&tcet->root, tcetobj, tmp, UINT64_MAX);
>>
>>      snprintf(tmp, sizeof(tmp), "tce-iommu-%x", tcet->liobn);
>> -    memory_region_init_iommu(&tcet->iommu, tcetobj, &spapr_iommu_ops,
>> tmp, 0);
>> +    memory_region_init_iommu_type(TYPE_IOMMU_MEMORY_REGION,
>> +                                  &tcet->iommu, tcetobj, &spapr_iommu_ops,
>> +                                  tmp, 0);
>>
>>      QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
>>
>> @@ -341,9 +344,10 @@ void spapr_tce_table_enable(sPAPRTCETable *tcet,
>>                                          &tcet->fd,
>>                                          tcet->need_vfio);
>>
>> -    memory_region_set_size(&tcet->iommu,
>> +    memory_region_set_size(MEMORY_REGION(&tcet->iommu),
>>                             (uint64_t)tcet->nb_table << tcet->page_shift);
>> -    memory_region_add_subregion(&tcet->root, tcet->bus_offset,
>> &tcet->iommu);
>> +    memory_region_add_subregion(&tcet->root, tcet->bus_offset,
>> +                                MEMORY_REGION(&tcet->iommu));
>>  }
>>
>>  void spapr_tce_table_disable(sPAPRTCETable *tcet)
>> @@ -352,8 +356,8 @@ void spapr_tce_table_disable(sPAPRTCETable *tcet)
>>          return;
>>      }
>>
>> -    memory_region_del_subregion(&tcet->root, &tcet->iommu);
>> -    memory_region_set_size(&tcet->iommu, 0);
>> +    memory_region_del_subregion(&tcet->root, MEMORY_REGION(&tcet->iommu));
>> +    memory_region_set_size(MEMORY_REGION(&tcet->iommu), 0);
>>
>>      spapr_tce_free_table(tcet->table, tcet->fd, tcet->nb_table);
>>      tcet->fd = -1;
>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>> index 69b0291e8a..27e7336a76 100644
>> --- a/hw/s390x/s390-pci-bus.c
>> +++ b/hw/s390x/s390-pci-bus.c
>> @@ -354,7 +354,7 @@ out:
>>      return pte;
>>  }
>>
>> -static IOMMUTLBEntry s390_translate_iommu(MemoryRegion *mr, hwaddr addr,
>> +static IOMMUTLBEntry s390_translate_iommu(IOMMUMemoryRegion *mr, hwaddr
>> addr,
>>                                            bool is_write)
>>  {
>>      uint64_t pte;
>> @@ -523,14 +523,14 @@ void s390_pci_iommu_enable(S390PCIIOMMU *iommu)
>>      memory_region_init_iommu(&iommu->iommu_mr, OBJECT(&iommu->mr),
>>                               &s390_iommu_ops, name, iommu->pal + 1);
>>      iommu->enabled = true;
>> -    memory_region_add_subregion(&iommu->mr, 0, &iommu->iommu_mr);
>> +    memory_region_add_subregion(&iommu->mr, 0,
>> MEMORY_REGION(&iommu->iommu_mr));
>>      g_free(name);
>>  }
>>
>>  void s390_pci_iommu_disable(S390PCIIOMMU *iommu)
>>  {
>>      iommu->enabled = false;
>> -    memory_region_del_subregion(&iommu->mr, &iommu->iommu_mr);
>> +    memory_region_del_subregion(&iommu->mr,
>> MEMORY_REGION(&iommu->iommu_mr));
>>      object_unparent(OBJECT(&iommu->iommu_mr));
>>  }
>>
>> diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
>> index d2a8c0a083..b4fe4da798 100644
>> --- a/hw/s390x/s390-pci-inst.c
>> +++ b/hw/s390x/s390-pci-inst.c
>> @@ -561,7 +561,7 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1,
>> uint8_t r2)
>>      S390PCIIOMMU *iommu;
>>      hwaddr start, end;
>>      IOMMUTLBEntry entry;
>> -    MemoryRegion *mr;
>> +    IOMMUMemoryRegion *iommumr;
>>
>>      cpu_synchronize_state(CPU(cpu));
>>
>> @@ -620,9 +620,9 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1,
>> uint8_t r2)
>>          goto out;
>>      }
>>
>> -    mr = &iommu->iommu_mr;
>> +    iommumr = &iommu->iommu_mr;
>>      while (start < end) {
>> -        entry = mr->iommu_ops->translate(mr, start, 0);
>> +        entry = iommumr->iommu_ops->translate(iommumr, start, 0);
>>
>>          if (!entry.translated_addr) {
>>              pbdev->state = ZPCI_FS_ERROR;
>> @@ -633,7 +633,7 @@ int rpcit_service_call(S390CPU *cpu, uint8_t r1,
>> uint8_t r2)
>>              goto out;
>>          }
>>
>> -        memory_region_notify_iommu(mr, entry);
>> +        memory_region_notify_iommu(iommumr, entry);
>>          start += entry.addr_mask + 1;
>>      }
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index f3ba9b9007..ab95db689c 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -465,6 +465,7 @@ static void vfio_listener_region_add(MemoryListener
>> *listener,
>>
>>      if (memory_region_is_iommu(section->mr)) {
>>          VFIOGuestIOMMU *giommu;
>> +        IOMMUMemoryRegion *iommumr = IOMMU_MEMORY_REGION(section->mr);
>>
>>          trace_vfio_listener_region_add_iommu(iova, end);
>>          /*
>> @@ -474,7 +475,7 @@ static void vfio_listener_region_add(MemoryListener
>> *listener,
>>           * device emulation the VFIO iommu handles to use).
>>           */
>>          giommu = g_malloc0(sizeof(*giommu));
>> -        giommu->iommu = section->mr;
>> +        giommu->iommu = iommumr;
>>          giommu->iommu_offset = section->offset_within_address_space -
>>                                 section->offset_within_region;
>>          giommu->container = container;
>> @@ -482,7 +483,7 @@ static void vfio_listener_region_add(MemoryListener
>> *listener,
>>          giommu->n.notifier_flags = IOMMU_NOTIFIER_ALL;
>>          QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
>>
>> -        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
>> +        memory_region_register_iommu_notifier(section->mr, &giommu->n);
>>          memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
>>
>>          return;
>> @@ -550,8 +551,8 @@ static void vfio_listener_region_del(MemoryListener
>> *listener,
>>          VFIOGuestIOMMU *giommu;
>>
>>          QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
>> -            if (giommu->iommu == section->mr) {
>> -                memory_region_unregister_iommu_notifier(giommu->iommu,
>> +            if (MEMORY_REGION(giommu->iommu) == section->mr) {
>> +                memory_region_unregister_iommu_notifier(section->mr,
>>                                                          &giommu->n);
>>                  QLIST_REMOVE(giommu, giommu_next);
>>                  g_free(giommu);
>> @@ -1141,7 +1142,8 @@ static void vfio_disconnect_container(VFIOGroup
>> *group)
>>          QLIST_REMOVE(container, next);
>>
>>          QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next,
>> tmp) {
>> -            memory_region_unregister_iommu_notifier(giommu->iommu,
>> &giommu->n);
>> +            memory_region_unregister_iommu_notifier(
>> +                    MEMORY_REGION(giommu->iommu), &giommu->n);
>>              QLIST_REMOVE(giommu, giommu_next);
>>              g_free(giommu);
>>          }
>> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
>> index 4409bcc0d7..551870d46b 100644
>> --- a/hw/vfio/spapr.c
>> +++ b/hw/vfio/spapr.c
>> @@ -143,7 +143,8 @@ int vfio_spapr_create_window(VFIOContainer *container,
>>                               hwaddr *pgsize)
>>  {
>>      int ret;
>> -    unsigned pagesize = memory_region_iommu_get_min_page_size(section->mr);
>> +    IOMMUMemoryRegion *iommumr = IOMMU_MEMORY_REGION(section->mr);
>> +    unsigned pagesize = memory_region_iommu_get_min_page_size(iommumr);
>>      unsigned entries, pages;
>>      struct vfio_iommu_spapr_tce_create create = { .argsz =
>> sizeof(create) };
>>
>> diff --git a/memory.c b/memory.c
>> index 4c95aaf39c..62796536dc 100644
>> --- a/memory.c
>> +++ b/memory.c
>> @@ -975,12 +975,11 @@ static char *memory_region_escape_name(const char
>> *name)
>>      return escaped;
>>  }
>>
>> -void memory_region_init(MemoryRegion *mr,
>> -                        Object *owner,
>> -                        const char *name,
>> -                        uint64_t size)
>> +static void memory_region_do_init(MemoryRegion *mr,
>> +                                  Object *owner,
>> +                                  const char *name,
>> +                                  uint64_t size)
>>  {
>> -    object_initialize(mr, sizeof(*mr), TYPE_MEMORY_REGION);
>>      mr->size = int128_make64(size);
>>      if (size == UINT64_MAX) {
>>          mr->size = int128_2_64();
>> @@ -1004,6 +1003,15 @@ void memory_region_init(MemoryRegion *mr,
>>      }
>>  }
>>
>> +void memory_region_init(MemoryRegion *mr,
>> +                        Object *owner,
>> +                        const char *name,
>> +                        uint64_t size)
>> +{
>> +    object_initialize(mr, sizeof(*mr), TYPE_MEMORY_REGION);
>> +    memory_region_do_init(mr, owner, name, size);
>> +}
>> +
>>  static void memory_region_get_addr(Object *obj, Visitor *v, const char
>> *name,
>>                                     void *opaque, Error **errp)
>>  {
>> @@ -1090,6 +1098,13 @@ static void memory_region_initfn(Object *obj)
>>                          NULL, NULL, &error_abort);
>>  }
>>
>> +static void iommu_memory_region_initfn(Object *obj)
>> +{
>> +    MemoryRegion *mr = MEMORY_REGION(obj);
>> +
>> +    mr->is_iommu = true;
>> +}
>> +
>>  static uint64_t unassigned_mem_read(void *opaque, hwaddr addr,
>>                                      unsigned size)
>>  {
>> @@ -1473,17 +1488,33 @@ void memory_region_init_rom_device(MemoryRegion *mr,
>>      mr->ram_block = qemu_ram_alloc(size, mr, errp);
>>  }
>>
>> -void memory_region_init_iommu(MemoryRegion *mr,
>> -                              Object *owner,
>> -                              const MemoryRegionIOMMUOps *ops,
>> -                              const char *name,
>> -                              uint64_t size)
>> +void memory_region_init_iommu_type(const char *mrtypename,
>> +                                   IOMMUMemoryRegion *iommumr,
>> +                                   Object *owner,
>> +                                   const MemoryRegionIOMMUOps *ops,
>> +                                   const char *name,
>> +                                   uint64_t size)
>>  {
>> -    memory_region_init(mr, owner, name, size);
>> -    mr->iommu_ops = ops,
>> +    struct MemoryRegion *mr;
>> +    size_t instance_size = object_type_get_instance_size(mrtypename);
>> +
>> +    object_initialize(iommumr, instance_size, mrtypename);
>> +    mr = MEMORY_REGION(iommumr);
>> +    memory_region_do_init(mr, owner, name, size);
>> +    iommumr->iommu_ops = ops,
>>      mr->terminates = true;  /* then re-forwards */
>> -    QLIST_INIT(&mr->iommu_notify);
>> -    mr->iommu_notify_flags = IOMMU_NOTIFIER_NONE;
>> +    QLIST_INIT(&iommumr->iommu_notify);
>> +    iommumr->iommu_notify_flags = IOMMU_NOTIFIER_NONE;
>> +}
>> +
>> +void memory_region_init_iommu(IOMMUMemoryRegion *iommumr,
>> +                              Object *owner,
>> +                              const MemoryRegionIOMMUOps *ops,
>> +                              const char *name,
>> +                              uint64_t size)
>> +{
>> +    memory_region_init_iommu_type(TYPE_IOMMU_MEMORY_REGION, iommumr,
>> +                                  owner, ops, name, size);
>>  }
>>
>>  static void memory_region_finalize(Object *obj)
>> @@ -1578,57 +1609,61 @@ bool memory_region_is_logging(MemoryRegion *mr,
>> uint8_t client)
>>      return memory_region_get_dirty_log_mask(mr) & (1 << client);
>>  }
>>
>> -static void memory_region_update_iommu_notify_flags(MemoryRegion *mr)
>> +static void memory_region_update_iommu_notify_flags(IOMMUMemoryRegion
>> *iommumr)
>>  {
>>      IOMMUNotifierFlag flags = IOMMU_NOTIFIER_NONE;
>>      IOMMUNotifier *iommu_notifier;
>>
>> -    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
>> +    QLIST_FOREACH(iommu_notifier, &iommumr->iommu_notify, node) {
>>          flags |= iommu_notifier->notifier_flags;
>>      }
>>
>> -    if (flags != mr->iommu_notify_flags &&
>> -        mr->iommu_ops->notify_flag_changed) {
>> -        mr->iommu_ops->notify_flag_changed(mr, mr->iommu_notify_flags,
>> -                                           flags);
>> +    if (flags != iommumr->iommu_notify_flags &&
>> +        iommumr->iommu_ops->notify_flag_changed) {
>> +        iommumr->iommu_ops->notify_flag_changed(iommumr,
>> +                                               
>> iommumr->iommu_notify_flags,
>> +                                                flags);
>>      }
>>
>> -    mr->iommu_notify_flags = flags;
>> +    iommumr->iommu_notify_flags = flags;
>>  }
>>
>>  void memory_region_register_iommu_notifier(MemoryRegion *mr,
>>                                             IOMMUNotifier *n)
>>  {
>> +    IOMMUMemoryRegion *iommumr;
>> +
>>      if (mr->alias) {
>>          memory_region_register_iommu_notifier(mr->alias, n);
>>          return;
>>      }
>>
>>      /* We need to register for at least one bitfield */
>> +    iommumr = IOMMU_MEMORY_REGION(mr);
>>      assert(n->notifier_flags != IOMMU_NOTIFIER_NONE);
>> -    QLIST_INSERT_HEAD(&mr->iommu_notify, n, node);
>> -    memory_region_update_iommu_notify_flags(mr);
>> +    QLIST_INSERT_HEAD(&iommumr->iommu_notify, n, node);
>> +    memory_region_update_iommu_notify_flags(iommumr);
>>  }
>>
>> -uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr)
>> +uint64_t memory_region_iommu_get_min_page_size(IOMMUMemoryRegion *iommumr)
>>  {
>> -    assert(memory_region_is_iommu(mr));
>> -    if (mr->iommu_ops && mr->iommu_ops->get_min_page_size) {
>> -        return mr->iommu_ops->get_min_page_size(mr);
>> +    if (iommumr->iommu_ops && iommumr->iommu_ops->get_min_page_size) {
>> +        return iommumr->iommu_ops->get_min_page_size(iommumr);
>>      }
>>      return TARGET_PAGE_SIZE;
>>  }
>>
>> -void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
>> +void memory_region_iommu_replay(IOMMUMemoryRegion *iommumr,
>> IOMMUNotifier *n,
>>                                  bool is_write)
>>  {
>> +    MemoryRegion *mr = MEMORY_REGION(iommumr);
>>      hwaddr addr, granularity;
>>      IOMMUTLBEntry iotlb;
>>
>> -    granularity = memory_region_iommu_get_min_page_size(mr);
>> +    granularity = memory_region_iommu_get_min_page_size(iommumr);
>>
>>      for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
>> -        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
>> +        iotlb = iommumr->iommu_ops->translate(iommumr, addr, is_write);
>>          if (iotlb.perm != IOMMU_NONE) {
>>              n->notify(n, &iotlb);
>>          }
>> @@ -1644,21 +1679,24 @@ void memory_region_iommu_replay(MemoryRegion *mr,
>> IOMMUNotifier *n,
>>  void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
>>                                               IOMMUNotifier *n)
>>  {
>> +    IOMMUMemoryRegion *iommumr;
>> +
>>      if (mr->alias) {
>>          memory_region_unregister_iommu_notifier(mr->alias, n);
>>          return;
>>      }
>>      QLIST_REMOVE(n, node);
>> -    memory_region_update_iommu_notify_flags(mr);
>> +    iommumr = IOMMU_MEMORY_REGION(mr);
>> +    memory_region_update_iommu_notify_flags(iommumr);
>>  }
>>
>> -void memory_region_notify_iommu(MemoryRegion *mr,
>> +void memory_region_notify_iommu(IOMMUMemoryRegion *iommumr,
>>                                  IOMMUTLBEntry entry)
>>  {
>>      IOMMUNotifier *iommu_notifier;
>>      IOMMUNotifierFlag request_flags;
>>
>> -    assert(memory_region_is_iommu(mr));
>> +    assert(memory_region_is_iommu(MEMORY_REGION(iommumr)));
>>
>>      if (entry.perm & IOMMU_RW) {
>>          request_flags = IOMMU_NOTIFIER_MAP;
>> @@ -1666,7 +1704,7 @@ void memory_region_notify_iommu(MemoryRegion *mr,
>>          request_flags = IOMMU_NOTIFIER_UNMAP;
>>      }
>>
>> -    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
>> +    QLIST_FOREACH(iommu_notifier, &iommumr->iommu_notify, node) {
>>          if (iommu_notifier->notifier_flags & request_flags) {
>>              iommu_notifier->notify(iommu_notifier, &entry);
>>          }
>> @@ -2660,9 +2698,17 @@ static const TypeInfo memory_region_info = {
>>      .instance_finalize  = memory_region_finalize,
>>  };
>>
>> +static const TypeInfo iommu_memory_region_info = {
>> +    .parent             = TYPE_MEMORY_REGION,
>> +    .name               = TYPE_IOMMU_MEMORY_REGION,
>> +    .instance_size      = sizeof(IOMMUMemoryRegion),
>> +    .instance_init      = iommu_memory_region_initfn,
>> +};
>> +
>>  static void memory_register_types(void)
>>  {
>>      type_register_static(&memory_region_info);
>> +    type_register_static(&iommu_memory_region_info);
>>  }
>>
>>  type_init(memory_register_types)
>>


-- 
Alexey

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu v3 3/4] vfio/spapr: Add a notifier for PPC64 HV/PR KVM about new group attached to LIOBN
  2017-04-03  3:01   ` David Gibson
@ 2017-05-01 12:21     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2017-05-01 12:21 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, qemu-ppc, Alex Williamson, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 6948 bytes --]

On 03/04/17 13:01, David Gibson wrote:
> On Sat, Apr 01, 2017 at 11:37:40PM +1100, Alexey Kardashevskiy wrote:
>> This implements a notification for a new IOMMU group attached to
>> sPAPR's logical IO bus (LIOBN) to enable in-kernel TCE acceleration.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  include/hw/ppc/spapr.h        |  1 +
>>  include/hw/vfio/vfio-common.h |  2 ++
>>  hw/ppc/spapr_iommu.c          |  5 +++++
>>  hw/vfio/common.c              | 10 ++++++++++
>>  hw/vfio/spapr.c               | 31 +++++++++++++++++++++++++++++++
>>  hw/vfio/trace-events          |  1 +
>>  6 files changed, 50 insertions(+)
>>
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 6997ed7e98..8a1b32f89a 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -617,6 +617,7 @@ void spapr_tce_table_enable(sPAPRTCETable *tcet,
>>                              uint32_t page_shift, uint64_t bus_offset,
>>                              uint32_t nb_table);
>>  void spapr_tce_table_disable(sPAPRTCETable *tcet);
>> +int spapr_tce_get_fd(sPAPRTCETable *tcet);
>>  void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio);
>>  
>>  MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 7a4135ae6f..b99f4af96e 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -175,6 +175,8 @@ extern const MemoryListener vfio_prereg_listener;
>>  int vfio_spapr_create_window(VFIOContainer *container,
>>                               MemoryRegionSection *section,
>>                               hwaddr *pgsize);
>> +int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd,
>> +                          IOMMUMemoryRegion *iommumr);
>>  int vfio_spapr_remove_window(VFIOContainer *container,
>>                               hwaddr offset_within_address_space);
>>  
>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> index 5051110b9d..f7531a6408 100644
>> --- a/hw/ppc/spapr_iommu.c
>> +++ b/hw/ppc/spapr_iommu.c
>> @@ -171,6 +171,11 @@ static void spapr_tce_notify_flag_changed(IOMMUMemoryRegion *iommu,
>>      }
>>  }
>>  
>> +int spapr_tce_get_fd(sPAPRTCETable *tcet)
>> +{
>> +    return tcet->fd;
>> +}
>> +
> 
> I don't think this actually abstracts anything worthwhile.  The caller
> needs the sPAPRTCETable definition anyway to use container_of(), so it
> might as well just grab the field directly.

So far @fd has only been accesses from hw/ppc/spapr_iommu.c and I'd like to
keep it that way ("encapsulation"?).


> 
>>  static int spapr_tce_table_post_load(void *opaque, int version_id)
>>  {
>>      sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index e8188eb3d5..b94b29be15 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -440,6 +440,16 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>              goto fail;
>>          }
>>  
>> +#ifdef CONFIG_KVM
>> +        if (kvm_enabled()) {
>> +            VFIOGroup *group;
>> +
>> +            QLIST_FOREACH(group, &container->group_list, container_next) {
>> +                vfio_spapr_notify_kvm(vfio_kvm_device_fd, group->fd,
>> +                                      IOMMU_MEMORY_REGION(section->mr));
>> +            }
>> +        }
>> +#endif
>>          vfio_host_win_add(container, section->offset_within_address_space,
>>                            section->offset_within_address_space +
>>                            int128_get64(section->size) - 1, pgsize);
>> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
>> index 551870d46b..6410438e62 100644
>> --- a/hw/vfio/spapr.c
>> +++ b/hw/vfio/spapr.c
>> @@ -15,8 +15,12 @@
>>  
>>  #include "hw/vfio/vfio-common.h"
>>  #include "hw/hw.h"
>> +#include "hw/ppc/spapr.h"
>>  #include "qemu/error-report.h"
>>  #include "trace.h"
>> +#ifdef CONFIG_KVM
>> +#include "linux/kvm.h"
>> +#endif
>>  
>>  static bool vfio_prereg_listener_skipped_section(MemoryRegionSection *section)
>>  {
>> @@ -188,6 +192,33 @@ int vfio_spapr_create_window(VFIOContainer *container,
>>      return 0;
>>  }
>>  
>> +int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd,
>> +                          IOMMUMemoryRegion *iommumr)
>> +{
>> +#ifdef CONFIG_KVM
>> +    struct kvm_vfio_spapr_tce param = {
>> +        .groupfd = groupfd,
>> +    };
>> +    struct kvm_device_attr attr = {
>> +        .group = KVM_DEV_VFIO_GROUP,
>> +        .attr = KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE,
>> +        .addr = (uint64_t)(unsigned long)&param,
>> +    };
>> +    sPAPRTCETable *tcet = container_of(iommumr, sPAPRTCETable, iommu);
> 
> This isn't safe.  The caller has verified that the host backend IOMMU
> is sPAPR TCE, but you haven't verified that the *guest* IOMMU is TCE
> based.  I suspect other details would prevent a TCG x86 machine with
> VT-d running on a Power host from getting this far, but it's not good
> to rely on that.
> 
> So, you need to explicitly verify that the guest IOMMU region really
> is a PAPR TCE region.  The obvious way would be to continue your
> QOMification and make sPAPRTCETable a subtype of IOMMUMemoryRegion,
> rather than just including it by composition.

sPAPRTCETable is a device now, with 2 memory regions - one for entire 64bit
space and different sPAPRTCETable root MRs overlap, another one is the
actual IOMMU MR - may be QOM just this one, not the entire sPAPRTCETable
thingy?




> 
>> +
>> +    param.tablefd = spapr_tce_get_fd(tcet);
>> +    if (param.tablefd != -1) {
>> +        if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
>> +            error_report("vfio: failed to setup fd %d for a group with fd %d: %s",
>> +                         param.tablefd, param.groupfd, strerror(errno));
>> +            return -errno;
>> +        }
>> +    }
>> +    trace_vfio_spapr_notify_kvm(groupfd, param.tablefd);
>> +#endif
>> +    return 0;
>> +}
>> +
>>  int vfio_spapr_remove_window(VFIOContainer *container,
>>                               hwaddr offset_within_address_space)
>>  {
>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>> index 2561c6d31a..084a92f7c2 100644
>> --- a/hw/vfio/trace-events
>> +++ b/hw/vfio/trace-events
>> @@ -123,3 +123,4 @@ vfio_prereg_register(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"P
>>  vfio_prereg_unregister(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"PRIx64" ret=%d"
>>  vfio_spapr_create_window(int ps, uint64_t ws, uint64_t off) "pageshift=0x%x winsize=0x%"PRIx64" offset=0x%"PRIx64
>>  vfio_spapr_remove_window(uint64_t off) "offset=%"PRIx64
>> +vfio_spapr_notify_kvm(int groupfd, int tablefd) "Attached groupfd %d to liobn fd %d"
> 


-- 
Alexey


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 839 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-05-01 12:21 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-01 12:37 [Qemu-devel] [RFC PATCH qemu v3 0/4] vfio-pci, spapr: Allow in-kernel acceleration Alexey Kardashevskiy
2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 1/4] memory/iommu: QOM'fy IOMMU MemoryRegion Alexey Kardashevskiy
2017-04-03  2:26   ` David Gibson
2017-04-03 12:53   ` Philippe Mathieu-Daudé
2017-04-11  8:35     ` Alexey Kardashevskiy
2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 2/4] vfio-pci: Reorder group-to-container attaching Alexey Kardashevskiy
2017-04-03  2:27   ` David Gibson
2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 3/4] vfio/spapr: Add a notifier for PPC64 HV/PR KVM about new group attached to LIOBN Alexey Kardashevskiy
2017-04-03  3:01   ` David Gibson
2017-05-01 12:21     ` Alexey Kardashevskiy
2017-04-01 12:37 ` [Qemu-devel] [RFC PATCH qemu v3 4/4] spapr/iommu: Enable in-kernel TCE acceleration via VFIO KVM device Alexey Kardashevskiy
2017-04-03  3:01   ` David Gibson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.