* [Qemu-devel] [PATCH qemu 0/5] vfio: SPAPR IOMMU v2 (memory preregistration support)
@ 2015-07-10 10:43 Alexey Kardashevskiy
  2015-07-10 10:43 ` [Qemu-devel] [PATCH qemu 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask Alexey Kardashevskiy
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2015-07-10 10:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, David Gibson

Here are a few patches to prepare the existing memory listener for handling
memory preregistration for SPAPR guests running on POWER8.

This used to be part of the DDW patchset but is now separated out as requested.
I have left the version history in the changelog of 5/5 for convenience.

Regarding 1/5, there is a question: in reality DMA windows are always
a lot bigger than a single 4K page and are aligned to 32/64MB, so maybe
we should only use qemu_real_host_page_mask there?

Please comment. Thanks!


Alexey Kardashevskiy (5):
  vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask
  vfio: Skip PCI BARs in memory listener
  vfio: Store IOMMU type in container
  vfio: Refactor memory listener to accommodate more IOMMU types
  vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering)

 hw/vfio/common.c              | 164 ++++++++++++++++++++++++++++--------------
 hw/vfio/pci.c                 |  30 ++++----
 include/hw/vfio/vfio-common.h |   1 +
 trace-events                  |   2 +
 4 files changed, 129 insertions(+), 68 deletions(-)

-- 
2.4.0.rc3.8.gfb3e7d5


* [Qemu-devel] [PATCH qemu 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask
  2015-07-10 10:43 [Qemu-devel] [PATCH qemu 0/5] vfio: SPAPR IOMMU v2 (memory preregistration support) Alexey Kardashevskiy
@ 2015-07-10 10:43 ` Alexey Kardashevskiy
  2015-07-13  6:15   ` David Gibson
  2015-07-10 10:43 ` [Qemu-devel] [PATCH qemu 2/5] vfio: Skip PCI BARs in memory listener Alexey Kardashevskiy
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Alexey Kardashevskiy @ 2015-07-10 10:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, David Gibson

These commits started the switch from TARGET_PAGE_MASK (hardcoded as 4K) to
the real host page size:
4e51361d7 "cpu-all: complete "real" host page size API" and
f7ceed190 "vfio: cpu: Use "real" page size API"

This finishes the transition by:
- %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
- %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
- removing the bitfield widths for the offsets in VFIOQuirk::data, as
qemu_real_host_page_mask is a runtime value, not a macro

This keeps using TARGET_PAGE_MASK for IOMMU regions, though, as it is
the minimum page size which IOMMU regions may be using, and at the moment
memory regions do not carry the actual page size.
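
In other words, the chosen mask fully determines the alignment granularity
the listener uses; a minimal sketch of the relationship (align_to_mask() is
an illustrative name only, not an existing QEMU helper; assumes the usual
qemu/osdep.h and exec/hwaddr.h definitions):

    /* ~page_mask + 1 is the page size, e.g. ~0xfffffffffffff000ULL + 1 == 0x1000 */
    static inline hwaddr align_to_mask(hwaddr addr, hwaddr page_mask)
    {
        return ROUND_UP(addr, ~page_mask + 1);
    }

    /* align_to_mask(addr, qemu_real_host_page_mask) behaves like
     * REAL_HOST_PAGE_ALIGN(addr), and align_to_mask(addr, TARGET_PAGE_MASK)
     * behaves like TARGET_PAGE_ALIGN(addr). */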

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

In reality DMA windows are always a lot bigger than a single 4K page
and are aligned to 32/64MB, so maybe we should only use
qemu_real_host_page_mask here?

---
 hw/vfio/common.c | 26 ++++++++++++++++----------
 hw/vfio/pci.c    | 30 +++++++++++++++---------------
 2 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 85ee9b0..d115ec9 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -321,6 +321,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
     Int128 llend;
     void *vaddr;
     int ret;
+    const bool is_iommu = memory_region_is_iommu(section->mr);
+    const hwaddr page_mask =
+        is_iommu ? TARGET_PAGE_MASK : qemu_real_host_page_mask;
 
     if (vfio_listener_skipped_section(section)) {
         trace_vfio_listener_region_add_skip(
@@ -330,16 +333,16 @@ static void vfio_listener_region_add(MemoryListener *listener,
         return;
     }
 
-    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
-                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+    if (unlikely((section->offset_within_address_space & ~page_mask) !=
+                 (section->offset_within_region & ~page_mask))) {
         error_report("%s received unaligned region", __func__);
         return;
     }
 
-    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+    iova = ROUND_UP(section->offset_within_address_space, ~page_mask + 1);
     llend = int128_make64(section->offset_within_address_space);
     llend = int128_add(llend, section->size);
-    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+    llend = int128_and(llend, int128_exts64(page_mask));
 
     if (int128_ge(int128_make64(iova), llend)) {
         return;
@@ -347,7 +350,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
 
     memory_region_ref(section->mr);
 
-    if (memory_region_is_iommu(section->mr)) {
+    if (is_iommu) {
         VFIOGuestIOMMU *giommu;
 
         trace_vfio_listener_region_add_iommu(iova,
@@ -423,6 +426,9 @@ static void vfio_listener_region_del(MemoryListener *listener,
                                             iommu_data.type1.listener);
     hwaddr iova, end;
     int ret;
+    const bool is_iommu = memory_region_is_iommu(section->mr);
+    const hwaddr page_mask =
+        is_iommu ? TARGET_PAGE_MASK : qemu_real_host_page_mask;
 
     if (vfio_listener_skipped_section(section)) {
         trace_vfio_listener_region_del_skip(
@@ -432,13 +438,13 @@ static void vfio_listener_region_del(MemoryListener *listener,
         return;
     }
 
-    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
-                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+    if (unlikely((section->offset_within_address_space & ~page_mask) !=
+                 (section->offset_within_region & ~page_mask))) {
         error_report("%s received unaligned region", __func__);
         return;
     }
 
-    if (memory_region_is_iommu(section->mr)) {
+    if (is_iommu) {
         VFIOGuestIOMMU *giommu;
 
         QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
@@ -459,9 +465,9 @@ static void vfio_listener_region_del(MemoryListener *listener,
          */
     }
 
-    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
     end = (section->offset_within_address_space + int128_get64(section->size)) &
-          TARGET_PAGE_MASK;
+          page_mask;
 
     if (iova >= end) {
         return;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 2ed877f..7694afe 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -50,16 +50,16 @@ typedef struct VFIOQuirk {
     struct VFIOPCIDevice *vdev;
     QLIST_ENTRY(VFIOQuirk) next;
     struct {
-        uint32_t base_offset:TARGET_PAGE_BITS;
-        uint32_t address_offset:TARGET_PAGE_BITS;
+        uint32_t base_offset;
+        uint32_t address_offset;
         uint32_t address_size:3;
         uint32_t bar:3;
 
         uint32_t address_match;
         uint32_t address_mask;
 
-        uint32_t address_val:TARGET_PAGE_BITS;
-        uint32_t data_offset:TARGET_PAGE_BITS;
+        uint32_t address_val;
+        uint32_t data_offset;
         uint32_t data_size:3;
 
         uint8_t flags;
@@ -1319,8 +1319,8 @@ static uint64_t vfio_generic_quirk_read(void *opaque,
 {
     VFIOQuirk *quirk = opaque;
     VFIOPCIDevice *vdev = quirk->vdev;
-    hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
-    hwaddr offset = quirk->data.address_match & ~TARGET_PAGE_MASK;
+    hwaddr base = quirk->data.address_match & qemu_real_host_page_mask;
+    hwaddr offset = quirk->data.address_match & ~qemu_real_host_page_mask;
     uint64_t data;
 
     if (vfio_flags_enabled(quirk->data.flags, quirk->data.read_flags) &&
@@ -1349,8 +1349,8 @@ static void vfio_generic_quirk_write(void *opaque, hwaddr addr,
 {
     VFIOQuirk *quirk = opaque;
     VFIOPCIDevice *vdev = quirk->vdev;
-    hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
-    hwaddr offset = quirk->data.address_match & ~TARGET_PAGE_MASK;
+    hwaddr base = quirk->data.address_match & qemu_real_host_page_mask;
+    hwaddr offset = quirk->data.address_match & ~qemu_real_host_page_mask;
 
     if (vfio_flags_enabled(quirk->data.flags, quirk->data.write_flags) &&
         ranges_overlap(addr, size, offset, quirk->data.address_mask + 1)) {
@@ -1650,9 +1650,9 @@ static void vfio_probe_ati_bar2_4000_quirk(VFIOPCIDevice *vdev, int nr)
 
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
                           "vfio-ati-bar2-4000-quirk",
-                          TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
+                          REAL_HOST_PAGE_ALIGN(quirk->data.address_mask + 1));
     memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
-                          quirk->data.address_match & TARGET_PAGE_MASK,
+                          quirk->data.address_match & qemu_real_host_page_mask,
                           &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1888,7 +1888,7 @@ static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
     VFIOQuirk *quirk = opaque;
     VFIOPCIDevice *vdev = quirk->vdev;
     PCIDevice *pdev = &vdev->pdev;
-    hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
+    hwaddr base = quirk->data.address_match & qemu_real_host_page_mask;
 
     vfio_generic_quirk_write(opaque, addr, data, size);
 
@@ -1943,9 +1943,9 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIOPCIDevice *vdev, int nr)
 
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_nvidia_88000_quirk,
                           quirk, "vfio-nvidia-bar0-88000-quirk",
-                          TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
+                          REAL_HOST_PAGE_ALIGN(quirk->data.address_mask + 1));
     memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
-                          quirk->data.address_match & TARGET_PAGE_MASK,
+                          quirk->data.address_match & qemu_real_host_page_mask,
                           &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1980,9 +1980,9 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
 
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
                           "vfio-nvidia-bar0-1800-quirk",
-                          TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
+                          REAL_HOST_PAGE_ALIGN(quirk->data.address_mask + 1));
     memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
-                          quirk->data.address_match & TARGET_PAGE_MASK,
+                          quirk->data.address_match & qemu_real_host_page_mask,
                           &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
-- 
2.4.0.rc3.8.gfb3e7d5


* [Qemu-devel] [PATCH qemu 2/5] vfio: Skip PCI BARs in memory listener
  2015-07-10 10:43 [Qemu-devel] [PATCH qemu 0/5] vfio: SPAPR IOMMU v2 (memory preregistration support) Alexey Kardashevskiy
  2015-07-10 10:43 ` [Qemu-devel] [PATCH qemu 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask Alexey Kardashevskiy
@ 2015-07-10 10:43 ` Alexey Kardashevskiy
  2015-07-13 17:08   ` Alex Williamson
  2015-07-10 10:43 ` [Qemu-devel] [PATCH qemu 3/5] vfio: Store IOMMU type in container Alexey Kardashevskiy
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Alexey Kardashevskiy @ 2015-07-10 10:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, David Gibson

In some cases PCI BARs are registered as RAM via
memory_region_init_ram_ptr() and the vfio_memory_listener will be called
on them too. However DMA will not be performed to/from these regions so
just skip them.
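
For reference, a minimal sketch of how such a BAR region typically comes
about (device_fd, bar_offset and the region/name variables are placeholders;
the real code lives in hw/vfio/pci.c and uses the skip-dump setter that
pairs with the check added below):

    /* mmap the BAR from the VFIO device fd and expose it to the guest as a
     * RAM-backed MemoryRegion, then flag it so listeners can ignore it. */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                   device_fd, bar_offset);
    memory_region_init_ram_ptr(submem, OBJECT(vdev), "vfio-bar-mmap", size, p);
    memory_region_set_skip_dump(submem);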

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/vfio/common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index d115ec9..225cdc7 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -248,7 +248,8 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
             * are never accessed by the CPU and beyond the address width of
             * some IOMMU hardware.  TODO: VFIO should tell us the IOMMU width.
             */
-           section->offset_within_address_space & (1ULL << 63);
+           section->offset_within_address_space & (1ULL << 63) ||
+           memory_region_is_skip_dump(section->mr);
 }
 
 static void vfio_iommu_map_notify(Notifier *n, void *data)
-- 
2.4.0.rc3.8.gfb3e7d5


* [Qemu-devel] [PATCH qemu 3/5] vfio: Store IOMMU type in container
  2015-07-10 10:43 [Qemu-devel] [PATCH qemu 0/5] vfio: SPAPR IOMMU v2 (memory preregistration support) Alexey Kardashevskiy
  2015-07-10 10:43 ` [Qemu-devel] [PATCH qemu 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask Alexey Kardashevskiy
  2015-07-10 10:43 ` [Qemu-devel] [PATCH qemu 2/5] vfio: Skip PCI BARs in memory listener Alexey Kardashevskiy
@ 2015-07-10 10:43 ` Alexey Kardashevskiy
  2015-07-10 10:43 ` [Qemu-devel] [PATCH qemu 4/5] vfio: Refactor memory listener to accommodate more IOMMU types Alexey Kardashevskiy
  2015-07-10 10:43 ` [Qemu-devel] [PATCH qemu 5/5] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering) Alexey Kardashevskiy
  4 siblings, 0 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2015-07-10 10:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, David Gibson

So far we have managed not to store the IOMMU type anywhere, but since
we are going to implement different behavior for different IOMMU types
in the same memory listener, we need to know the IOMMU type after
initialization.

This adds an IOMMU type field to VFIOContainer and initializes it.
Since zero is not used for any IOMMU type, no additional initialization
is necessary for VFIOContainer::type.
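
With the type recorded, the listener can later branch on it; roughly
(a sketch only, the real switch is introduced by the next patches):

    switch (container->iommu_data.type) {
    case VFIO_TYPE1_IOMMU:
    case VFIO_TYPE1v2_IOMMU:
        /* map/unmap through VFIO_IOMMU_MAP_DMA as before */
        break;
    case VFIO_SPAPR_TCE_IOMMU:
        /* guest-visible DMA window handling */
        break;
    }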

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/vfio/common.c              | 7 ++++---
 include/hw/vfio/vfio-common.h | 1 +
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 225cdc7..788c87a 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -683,8 +683,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
             goto free_container_exit;
         }
 
-        ret = ioctl(fd, VFIO_SET_IOMMU,
-                    v2 ? VFIO_TYPE1v2_IOMMU : VFIO_TYPE1_IOMMU);
+        container->iommu_data.type = v2 ? VFIO_TYPE1v2_IOMMU : VFIO_TYPE1_IOMMU;
+        ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_data.type);
         if (ret) {
             error_report("vfio: failed to set iommu for container: %m");
             ret = -errno;
@@ -712,7 +712,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
             ret = -errno;
             goto free_container_exit;
         }
-        ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
+        container->iommu_data.type = VFIO_SPAPR_TCE_IOMMU;
+        ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_data.type);
         if (ret) {
             error_report("vfio: failed to set iommu for container: %m");
             ret = -errno;
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 59a321d..0e96689 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -75,6 +75,7 @@ typedef struct VFIOContainer {
     int fd; /* /dev/vfio/vfio, empowered by the attached groups */
     struct {
         /* enable abstraction to support various iommu backends */
+        unsigned type;
         union {
             VFIOType1 type1;
         };
-- 
2.4.0.rc3.8.gfb3e7d5


* [Qemu-devel] [PATCH qemu 4/5] vfio: Refactor memory listener to accommodate more IOMMU types
  2015-07-10 10:43 [Qemu-devel] [PATCH qemu 0/5] vfio: SPAPR IOMMU v2 (memory preregistration support) Alexey Kardashevskiy
                   ` (2 preceding siblings ...)
  2015-07-10 10:43 ` [Qemu-devel] [PATCH qemu 3/5] vfio: Store IOMMU type in container Alexey Kardashevskiy
@ 2015-07-10 10:43 ` Alexey Kardashevskiy
  2015-07-10 10:43 ` [Qemu-devel] [PATCH qemu 5/5] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering) Alexey Kardashevskiy
  4 siblings, 0 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2015-07-10 10:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, David Gibson

The vfio_memory_listener is registered for the PCI address space. On a
Type1 IOMMU that falls back to @address_space_memory and the listener is
called on RAM blocks. On sPAPR the IOMMU is guest visible and the listener
is called on DMA windows. Therefore the Type1 IOMMU only handled RAM
regions and the sPAPR IOMMU only handled IOMMU regions.

With memory preregistration, there is a need to handle RAM regions in the
listener for the sPAPR IOMMU too. To make the next patches simpler and
nicer, this moves the DMA mapping/unmapping code into a case of a switch
statement.

This should cause no change in behavior.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/vfio/common.c | 63 +++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 39 insertions(+), 24 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 788c87a..77b5ab0 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -397,26 +397,36 @@ static void vfio_listener_region_add(MemoryListener *listener,
             section->offset_within_region +
             (iova - section->offset_within_address_space);
 
-    trace_vfio_listener_region_add_ram(iova, end - 1, vaddr);
+    switch (container->iommu_data.type) {
+    case VFIO_TYPE1_IOMMU:
+    case VFIO_TYPE1v2_IOMMU:
+        trace_vfio_listener_region_add_ram(iova, end - 1, vaddr);
 
-    ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
-    if (ret) {
-        error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
-                     "0x%"HWADDR_PRIx", %p) = %d (%m)",
-                     container, iova, end - iova, vaddr, ret);
+        ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
+        if (ret) {
+            error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+                         "0x%"HWADDR_PRIx", %p) = %d (%m)",
+                         container, iova, end - iova, vaddr, ret);
+            goto error_exit;
+        }
+        break;
+    }
+
+    return;
+
+error_exit:
 
-        /*
-         * On the initfn path, store the first error in the container so we
-         * can gracefully fail.  Runtime, there's not much we can do other
-         * than throw a hardware error.
-         */
-        if (!container->iommu_data.type1.initialized) {
-            if (!container->iommu_data.type1.error) {
-                container->iommu_data.type1.error = ret;
-            }
-        } else {
-            hw_error("vfio: DMA mapping failed, unable to continue");
+    /*
+     * On the initfn path, store the first error in the container so we
+     * can gracefully fail.  Runtime, there's not much we can do other
+     * than throw a hardware error.
+     */
+    if (!container->iommu_data.type1.initialized) {
+        if (!container->iommu_data.type1.error) {
+            container->iommu_data.type1.error = ret;
         }
+    } else {
+        hw_error("vfio: DMA mapping failed, unable to continue");
     }
 }
 
@@ -474,14 +484,19 @@ static void vfio_listener_region_del(MemoryListener *listener,
         return;
     }
 
-    trace_vfio_listener_region_del(iova, end - 1);
+    switch (container->iommu_data.type) {
+    case VFIO_TYPE1_IOMMU:
+    case VFIO_TYPE1v2_IOMMU:
+        trace_vfio_listener_region_del(iova, end - 1);
 
-    ret = vfio_dma_unmap(container, iova, end - iova);
-    memory_region_unref(section->mr);
-    if (ret) {
-        error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-                     "0x%"HWADDR_PRIx") = %d (%m)",
-                     container, iova, end - iova, ret);
+        ret = vfio_dma_unmap(container, iova, end - iova);
+        memory_region_unref(section->mr);
+        if (ret) {
+            error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+                         "0x%"HWADDR_PRIx") = %d (%m)",
+                         container, iova, end - iova, ret);
+        }
+        break;
     }
 }
 
-- 
2.4.0.rc3.8.gfb3e7d5


* [Qemu-devel] [PATCH qemu 5/5] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering)
  2015-07-10 10:43 [Qemu-devel] [PATCH qemu 0/5] vfio: SPAPR IOMMU v2 (memory preregistration support) Alexey Kardashevskiy
                   ` (3 preceding siblings ...)
  2015-07-10 10:43 ` [Qemu-devel] [PATCH qemu 4/5] vfio: Refactor memory listener to accommodate more IOMMU types Alexey Kardashevskiy
@ 2015-07-10 10:43 ` Alexey Kardashevskiy
  4 siblings, 0 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2015-07-10 10:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc, David Gibson

This makes use of the new "memory registering" feature. The idea is
to give userspace the ability to notify the host kernel about pages
which are going to be used for DMA. Having this information, the host
kernel can pin them all once per user process, do locked-pages
accounting (once), and not spend time doing that at run time with
possible failures which cannot be handled nicely in some cases.

This adds a guest RAM memory listener which notifies a VFIO container
about memory which needs to be pinned/unpinned. VFIO MMIO regions
(i.e. "skip dump" regions) are skipped.

The feature is only enabled for SPAPR IOMMU v2; host kernel changes
are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this
does not call it when v2 is detected and enabled.

This does not change the guest-visible interface.
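
For reference, the ioctl argument used below is the SPAPR v2 "register
memory" structure from the kernel uapi; its layout, as implied by the
fields this patch fills in (see the kernel's linux/vfio.h for the
canonical definition), is:

    struct vfio_iommu_spapr_register_memory {
        __u32 argsz;   /* sizeof(struct vfio_iommu_spapr_register_memory) */
        __u32 flags;   /* no flags defined yet, must be 0 */
        __u64 vaddr;   /* userspace address of the memory to pin */
        __u64 size;    /* length of the range in bytes */
    };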

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
Changes:
v11:
* merged register_listener into vfio_memory_listener

v9:
* since there is no more SPAPR-specific data in container::iommu_data,
the memory preregistration fields are common and potentially can be used
by other architectures

v7:
* in vfio_spapr_ram_listener_region_del(), do unref() after ioctl()
* s'ramlistener'register_listener'

v6:
* fixed commit log (s/guest/userspace/), added note about no guest visible
change
* fixed error checking if ram registration failed
* added alignment check for section->offset_within_region

v5:
* simplified the patch
* added trace points
* added round_up() for the size
* SPAPR IOMMU v2 used
---
 hw/vfio/common.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++---------
 trace-events     |  2 ++
 2 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 77b5ab0..8f90c23 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -410,6 +410,19 @@ static void vfio_listener_region_add(MemoryListener *listener,
             goto error_exit;
         }
         break;
+
+    case VFIO_SPAPR_TCE_v2_IOMMU: {
+        struct vfio_iommu_spapr_register_memory reg = {
+            .argsz = sizeof(reg),
+            .flags = 0,
+            .vaddr = (uint64_t) vaddr,
+            .size = end - iova
+        };
+
+        ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_REGISTER_MEMORY, &reg);
+        trace_vfio_ram_register(reg.vaddr, reg.size, ret ? -errno : 0);
+        break;
+    }
     }
 
     return;
@@ -497,6 +510,22 @@ static void vfio_listener_region_del(MemoryListener *listener,
                          container, iova, end - iova, ret);
         }
         break;
+
+    case VFIO_SPAPR_TCE_v2_IOMMU: {
+        void *vaddr = memory_region_get_ram_ptr(section->mr) +
+                section->offset_within_region +
+                (iova - section->offset_within_address_space);
+        struct vfio_iommu_spapr_register_memory reg = {
+            .argsz = sizeof(reg),
+            .flags = 0,
+            .vaddr = (uint64_t) vaddr,
+            .size = end - iova
+        };
+
+        ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY, &reg);
+        trace_vfio_ram_unregister(reg.vaddr, reg.size, ret ? -errno : 0);
+        break;
+    }
     }
 }
 
@@ -720,14 +749,18 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 
         container->iommu_data.type1.initialized = true;
 
-    } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
+    } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) ||
+               ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU)) {
+        bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU);
+
         ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
         if (ret) {
             error_report("vfio: failed to set group container: %m");
             ret = -errno;
             goto free_container_exit;
         }
-        container->iommu_data.type = VFIO_SPAPR_TCE_IOMMU;
+        container->iommu_data.type =
+            v2 ? VFIO_SPAPR_TCE_v2_IOMMU : VFIO_SPAPR_TCE_IOMMU;
         ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_data.type);
         if (ret) {
             error_report("vfio: failed to set iommu for container: %m");
@@ -740,18 +773,25 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
          * when container fd is closed so we do not call it explicitly
          * in this file.
          */
-        ret = ioctl(fd, VFIO_IOMMU_ENABLE);
-        if (ret) {
-            error_report("vfio: failed to enable container: %m");
-            ret = -errno;
-            goto free_container_exit;
+        if (!v2) {
+            ret = ioctl(fd, VFIO_IOMMU_ENABLE);
+            if (ret) {
+                error_report("vfio: failed to enable container: %m");
+                ret = -errno;
+                goto free_container_exit;
+            }
         }
 
         container->iommu_data.type1.listener = vfio_memory_listener;
+        memory_listener_register(&container->iommu_data.type1.listener, NULL);
+
         container->iommu_data.release = vfio_listener_release;
+        if (container->iommu_data.type1.error) {
+            error_report("vfio: RAM memory listener initialization failed for container");
+            goto listener_release_exit;
+        }
 
-        memory_listener_register(&container->iommu_data.type1.listener,
-                                 container->space->as);
+        container->iommu_data.type1.initialized = true;
 
     } else {
         error_report("vfio: No available IOMMU models");
diff --git a/trace-events b/trace-events
index d24d80a..f859ad0 100644
--- a/trace-events
+++ b/trace-events
@@ -1582,6 +1582,8 @@ vfio_disconnect_container(int fd) "close container->fd=%d"
 vfio_put_group(int fd) "close group->fd=%d"
 vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
 vfio_put_base_device(int fd) "close vdev->fd=%d"
+vfio_ram_register(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"PRIx64" ret=%d"
+vfio_ram_unregister(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"PRIx64" ret=%d"
 
 # hw/vfio/platform.c
 vfio_platform_populate_regions(int region_index, unsigned long flag, unsigned long size, int fd, unsigned long offset) "- region %d flags = 0x%lx, size = 0x%lx, fd= %d, offset = 0x%lx"
-- 
2.4.0.rc3.8.gfb3e7d5


* Re: [Qemu-devel] [PATCH qemu 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask
  2015-07-10 10:43 ` [Qemu-devel] [PATCH qemu 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask Alexey Kardashevskiy
@ 2015-07-13  6:15   ` David Gibson
  2015-07-13  7:24     ` Alexey Kardashevskiy
  2015-07-13 20:32     ` Peter Crosthwaite
  0 siblings, 2 replies; 12+ messages in thread
From: David Gibson @ 2015-07-13  6:15 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Alex Williamson, qemu-ppc, qemu-devel


On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote:
> These started switching from TARGET_PAGE_MASK (hardcoded as 4K) to
> a real host page size:
> 4e51361d7 "cpu-all: complete "real" host page size API" and
> f7ceed190 "vfio: cpu: Use "real" page size API"
> 
> This finished the transition by:
> - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
> - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
> - removing bitfield length for offsets in VFIOQuirk::data as
> qemu_real_host_page_mask is not a macro

This does not make much sense to me.  f7ceed190 moved to
REAL_HOST_PAGE_SIZE because it's back end stuff that really depends
only on the host page size.

Here you're applying a blanket change to vfio code, in particular to
the DMA handling code, and for DMA both the host and target page size
can be relevant, depending on the details of the IOMMU implementation.

> This keeps using TARGET_PAGE_MASK for IOMMU regions though as it is
> the minimum page size which IOMMU regions may be using and at the moment
> memory regions do not carry the actual page size.

And this exception also doesn't make much sense to me.  Partly it's
confusing because the listener is doing different things depending on
whether we have a guest visible IOMMU or not.

In short, there doesn't seem to be a coherent explanation here of
where the page size / alignment restriction is coming from, and
therefore whether it needs to be a host page alignment, a guest page
alignment, or both.

> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> In reality DMA windows are always a lot bigger than a single 4K page
> and aligned to 32/64MB, may be only use here
> qemu_real_host_page_mask?

I don't understand this question either.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH qemu 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask
  2015-07-13  6:15   ` David Gibson
@ 2015-07-13  7:24     ` Alexey Kardashevskiy
  2015-07-14  3:46       ` David Gibson
  2015-07-13 20:32     ` Peter Crosthwaite
  1 sibling, 1 reply; 12+ messages in thread
From: Alexey Kardashevskiy @ 2015-07-13  7:24 UTC (permalink / raw)
  To: David Gibson; +Cc: Alex Williamson, qemu-ppc, qemu-devel

On 07/13/2015 04:15 PM, David Gibson wrote:
> On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote:
>> These started switching from TARGET_PAGE_MASK (hardcoded as 4K) to
>> a real host page size:
>> 4e51361d7 "cpu-all: complete "real" host page size API" and
>> f7ceed190 "vfio: cpu: Use "real" page size API"
>>
>> This finished the transition by:
>> - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
>> - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
>> - removing bitfield length for offsets in VFIOQuirk::data as
>> qemu_real_host_page_mask is not a macro
>
> This does not make much sense to me.  f7ceed190 moved to
> REAL_HOST_PAGE_SIZE because it's back end stuff that really depends
> only on the host page size.
>
> Here you're applying a blanket change to vfio code, in particular to
> the DMA handling code, and for DMA both the host and target page size
> can be relevant, depending on the details of the IOMMU implementation.


5/5 uses this listener for memory preregistration - this totally depends on 
the host page size. It was suggested to make this listener do memory 
preregistration too, not just DMA.


>> This keeps using TARGET_PAGE_MASK for IOMMU regions though as it is
>> the minimum page size which IOMMU regions may be using and at the moment
>> memory regions do not carry the actual page size.
>
> And this exception also doesn't make much sense to me.  Partly it's
> confusing because the listener is doing different things depending on
> whether we have a guest visible IOMMU or not.

Yes...

> In short, there doesn't seem to be a coherent explanation here of
> where the page size / alignment restriction is coming from, and
> therefore whether it needs to be a host page alignment, a guest page
> alignment, or both.
>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> In reality DMA windows are always a lot bigger than a single 4K page
>> and aligned to 32/64MB, may be only use here
>> qemu_real_host_page_mask?
>
> I don't understand this question either.

The listener is called on RAM regions and DMA windows. If it is a RAM 
region, then the host page size applies. If it is a DMA window, then 4K 
applies. So the same listener has to use different page sizes in different 
situations to check for alignment. But in reality everything will be 
aligned to megabytes or so, so we could enforce 64K alignment for DMA 
windows, for example, and make the code simpler.




-- 
Alexey


* Re: [Qemu-devel] [PATCH qemu 2/5] vfio: Skip PCI BARs in memory listener
  2015-07-10 10:43 ` [Qemu-devel] [PATCH qemu 2/5] vfio: Skip PCI BARs in memory listener Alexey Kardashevskiy
@ 2015-07-13 17:08   ` Alex Williamson
  0 siblings, 0 replies; 12+ messages in thread
From: Alex Williamson @ 2015-07-13 17:08 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-ppc, qemu-devel, David Gibson

On Fri, 2015-07-10 at 20:43 +1000, Alexey Kardashevskiy wrote:
> In some cases PCI BARs are registered as RAM via
> memory_region_init_ram_ptr() and the vfio_memory_listener will be called
> on them too. However DMA will not be performed to/from these regions so
> just skip them.


Who says?  What about peer-to-peer DMA?  We have all sorts of FUD about
whether the hardware handles this correctly, but mapping PCI MMIO BARs
into the IOMMU allows the possibility of supporting p2p DMA between
devices in the guest.  Thanks,

Alex


> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/vfio/common.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index d115ec9..225cdc7 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -248,7 +248,8 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
>              * are never accessed by the CPU and beyond the address width of
>              * some IOMMU hardware.  TODO: VFIO should tell us the IOMMU width.
>              */
> -           section->offset_within_address_space & (1ULL << 63);
> +           section->offset_within_address_space & (1ULL << 63) ||
> +           memory_region_is_skip_dump(section->mr);
>  }
>  
>  static void vfio_iommu_map_notify(Notifier *n, void *data)


* Re: [Qemu-devel] [PATCH qemu 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask
  2015-07-13  6:15   ` David Gibson
  2015-07-13  7:24     ` Alexey Kardashevskiy
@ 2015-07-13 20:32     ` Peter Crosthwaite
  2015-07-14  7:08       ` Alexey Kardashevskiy
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Crosthwaite @ 2015-07-13 20:32 UTC (permalink / raw)
  To: David Gibson
  Cc: Alexey Kardashevskiy, Alex Williamson, qemu-ppc Mailing List,
	qemu-devel@nongnu.org Developers

On Sun, Jul 12, 2015 at 11:15 PM, David Gibson
<david@gibson.dropbear.id.au> wrote:
> On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote:
>> These started switching from TARGET_PAGE_MASK (hardcoded as 4K) to
>> a real host page size:
>> 4e51361d7 "cpu-all: complete "real" host page size API" and
>> f7ceed190 "vfio: cpu: Use "real" page size API"
>>
>> This finished the transition by:
>> - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
>> - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
>> - removing bitfield length for offsets in VFIOQuirk::data as
>> qemu_real_host_page_mask is not a macro
>
> This does not make much sense to me.  f7ceed190 moved to
> REAL_HOST_PAGE_SIZE because it's back end stuff that really depends
> only on the host page size.
>
> Here you're applying a blanket change to vfio code, in particular to
> the DMA handling code, and for DMA both the host and target page size
> can be relevant, depending on the details of the IOMMU implementation.
>

So the multi-arch work (for which f7ceed190 preps) does have a problem
that needs something like this. TARGET_PAGE_MASK and TARGET_PAGE_ALIGN
do need to go away from common code, or this has to be promoted to
cpu-specific code. Consequently, the page size is fixed to 4K for
multi-arch and this is not a good long-term limitation. Is the IOMMU
page size really tied to the CPU implementation? In practice this is
going to be the case, but IOMMU and CPU should be decoupleable.

If vfio needs to respect a particular (or all?) CPUs/IOMMUs page
alignment then can we virtualise this as data rather than a macroified
constant?

uint64_t  page_align = 0;

CPU_FOREACH(cpu, ...) {
    CPUClass *cc = CPU_GET_CLASS(cpu);

    page_align = MAX(page_align, cc->page_size);
}

/* This is a little more made up ... */
IOMMU_FOREACH(iommu, ...) {
    page_align = MAX(page_align, iommu->page_size);
}

Regards,
Peter

>> This keeps using TARGET_PAGE_MASK for IOMMU regions though as it is
>> the minimum page size which IOMMU regions may be using and at the moment
>> memory regions do not carry the actual page size.
>
> And this exception also doesn't make much sense to me.  Partly it's
> confusing because the listener is doing different things depending on
> whether we have a guest visible IOMMU or not.
>
> In short, there doesn't seem to be a coherent explanation here of
> where the page size / alignment restriction is coming from, and
> therefore whether it needs to be a host page alignment, a guest page
> alignment, or both.
>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> In reality DMA windows are always a lot bigger than a single 4K page
>> and aligned to 32/64MB, may be only use here
>> qemu_real_host_page_mask?
>
> I don't understand this question either.
>
> --
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH qemu 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask
  2015-07-13  7:24     ` Alexey Kardashevskiy
@ 2015-07-14  3:46       ` David Gibson
  0 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2015-07-14  3:46 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Alex Williamson, qemu-ppc, qemu-devel


On Mon, Jul 13, 2015 at 05:24:17PM +1000, Alexey Kardashevskiy wrote:
> On 07/13/2015 04:15 PM, David Gibson wrote:
> >On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote:
> >>These started switching from TARGET_PAGE_MASK (hardcoded as 4K) to
> >>a real host page size:
> >>4e51361d7 "cpu-all: complete "real" host page size API" and
> >>f7ceed190 "vfio: cpu: Use "real" page size API"
> >>
> >>This finished the transition by:
> >>- %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
> >>- %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
> >>- removing bitfield length for offsets in VFIOQuirk::data as
> >>qemu_real_host_page_mask is not a macro
> >
> >This does not make much sense to me.  f7ceed190 moved to
> >REAL_HOST_PAGE_SIZE because it's back end stuff that really depends
> >only on the host page size.
> >
> >Here you're applying a blanket change to vfio code, in particular to
> >the DMA handling code, and for DMA both the host and target page size
> >can be relevant, depending on the details of the IOMMU implementation.
> 
> 
> 5/5 uses this listener for memory preregistration - this totally depends on
> the host page size. It was suggested to make this listener do memory
> preregistration too, not just DMA.
> 
> 
> >>This keeps using TARGET_PAGE_MASK for IOMMU regions though as it is
> >>the minimum page size which IOMMU regions may be using and at the moment
> >>memory regions do not carry the actual page size.
> >
> >And this exception also doesn't make much sense to me.  Partly it's
> >confusing because the listener is doing different things depending on
> >whether we have a guest visible IOMMU or not.
> 
> Yes...
> 
> >In short, there doesn't seem to be a coherent explanation here of
> >where the page size / alignment restriction is coming from, and
> >therefore whether it needs to be a host page alignment, a guest page
> >alignment, or both.
> >
> >>Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>---
> >>
> >>In reality DMA windows are always a lot bigger than a single 4K page
> >>and aligned to 32/64MB, may be only use here
> >>qemu_real_host_page_mask?
> >
> >I don't understand this question either.
> 
> The listener is called on RAM regions and DMA windows. If it is a RAM
> region, then host page size applies. If it is a DMA window - then
> 4K.

That might be true in the particular cases you're thinking about, but
you've got to think more generally if you're going to have coherent
semantics in the core code here.

For preregistration, host page size applies.

For auto-mapping of RAM regions (as on x86), host IOMMU page size
applies (which is probably the same as host page size, but it doesn't
theoretically have to be).  Guest page size kind of implicitly
applies, since added RAM regions generally have to be target page
aligned anyway.

For guest-controlled mapping, you're constrained by both the host IOMMU
page size and the guest IOMMU page size.
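
Illustratively (no such helpers or variables exist, this is just to make
the composition of constraints explicit): expressed as page masks, ANDing
masks keeps the strictest, i.e. largest-page, alignment:

    hwaddr prereg_mask   = qemu_real_host_page_mask;
    hwaddr automap_mask  = host_iommu_page_mask;   /* hypothetical */
    hwaddr guestmap_mask = host_iommu_page_mask & guest_iommu_page_mask;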

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH qemu 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask
  2015-07-13 20:32     ` Peter Crosthwaite
@ 2015-07-14  7:08       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2015-07-14  7:08 UTC (permalink / raw)
  To: Peter Crosthwaite, David Gibson
  Cc: Alex Williamson, qemu-ppc Mailing List, qemu-devel@nongnu.org Developers

On 07/14/2015 06:32 AM, Peter Crosthwaite wrote:
> On Sun, Jul 12, 2015 at 11:15 PM, David Gibson
> <david@gibson.dropbear.id.au> wrote:
>> On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote:
>>> These started switching from TARGET_PAGE_MASK (hardcoded as 4K) to
>>> a real host page size:
>>> 4e51361d7 "cpu-all: complete "real" host page size API" and
>>> f7ceed190 "vfio: cpu: Use "real" page size API"
>>>
>>> This finished the transition by:
>>> - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
>>> - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
>>> - removing bitfield length for offsets in VFIOQuirk::data as
>>> qemu_real_host_page_mask is not a macro
>>
>> This does not make much sense to me.  f7ceed190 moved to
>> REAL_HOST_PAGE_SIZE because it's back end stuff that really depends
>> only on the host page size.
>>
>> Here you're applying a blanket change to vfio code, in particular to
>> the DMA handling code, and for DMA both the host and target page size
>> can be relevant, depending on the details of the IOMMU implementation.
>>
>
> So the multi-arch work (for which f7ceed190 preps) does have a problem
> that needs something like this. TARGET_PAGE_MASK and TARGET_PAGE_ALIGN
> do need to go away from common code, or this has to be promoted to
> cpu-specific code. Consequently, the page size is fixed to 4K for
> multi-arch and this is not a good long-term limitation. Is the IOMMU
> page size really tied to the CPU implementation? In practice this is
> going to be the case, but IOMMU and CPU should be decoupleable.
>
> If vfio needs to respect a particular (or all?) CPUs/IOMMUs page
> alignment then can we virtualise this as data rather than a macroified
> constant?
>
> uint64_t  page_align = 0;
>
> CPU_FOREACH(cpu, ...) {
>      CPUClass *cc = CPU_GET_CLASS(cpu);
>
>      page_align = MAX(page_align, cc->page_size);
> }
>
> /* This is a little more made up ... */
> IOMMU_FOREACH(iommu, ...) {
>      page_align = MAX(page_align, iommu->page_size);
> }

This assumes that the IOMMU has a constant page size; is this always true?
Could it not be a (contiguous?) set of different-size chunks, for example, 
one per memory DIMM?


-- 
Alexey

