* [Qemu-devel] [RFC v3 0/8] KVM PCI/MSI passthrough with mach-virt
@ 2016-10-06  9:50 Eric Auger
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 1/8] linux-headers: Partial update for MSI IOVA handling Eric Auger
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Eric Auger @ 2016-10-06  9:50 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, pranav.sawargaonkar
  Cc: diana.craciun, christoffer.dall, drjones, Bharat.Bhushan

On ARM, MSI transactions emitted by passthrough'ed devices are translated
by the IOMMU.  So the host must allocate IOVAs and map them to the host
MSI frame physical addresses. Those IOVAs must be allocated within safe
GPA slots, unused by the guest.

The QEMU VFIO device retrieves the size of the IOVA window needed by the
host using a new VFIO IOMMU type capability chain API. This window is
allocated in guest address space within the platform bus memory container.
The latter acts as a pool of usable GPAs and comes with its own GPA allocator.
The memory region is tagged as "reserved_iova". The vfio_listener_region_add
callback is in charge of passing the window characteristics to the kernel
through an extended VFIO_IOMMU_MAP_DMA ioctl.
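
The "safe GPA slot" requirement above boils down to an interval check. The
following is only a minimal sketch under stated assumptions: the names
gpa_range, gpa_ranges_overlap and msi_window_is_safe are hypothetical and are
not part of this series, and ranges are assumed not to wrap around the 64-bit
address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest-physical range [base, base + size); not a QEMU type. */
struct gpa_range {
    uint64_t base;
    uint64_t size;
};

/* Two half-open ranges overlap when each one starts before the other ends. */
static bool gpa_ranges_overlap(struct gpa_range a, struct gpa_range b)
{
    return a.base < b.base + b.size && b.base < a.base + a.size;
}

/* A window is safe when it intersects none of the ranges used by the guest. */
static bool msi_window_is_safe(struct gpa_range window,
                               const struct gpa_range *used, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (gpa_ranges_overlap(window, used[i])) {
            return false;
        }
    }
    return true;
}
```

In the series itself this check is implicit: the window is carved out of the
platform bus container, whose allocator only hands out unused offsets.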

Best Regards

Eric

Dependencies:
The series depends on the not yet upstream kernel series:
[PATCH v13 00/15] KVM PCIe/MSI passthrough on ARM/ARM64
http://www.spinics.net/lists/arm-kernel/msg535168.html

Git:
https://github.com/eauger/qemu/tree/v2.7.0-vITS-v7-passthrough-rfc-v3

History:

RFCv2 -> RFC v3:
- IOVA aperture size is not arbitrary anymore. It is retrieved from the host
  using the VFIO IOMMU type capability chain API
- GPEX related patches removed since the warning is not seen anymore

RFC v1 -> RFC v2:
- now uses platform bus MMIO for mapping reserved IOVA region; hence the
  new patch file:
  "hw: platform-bus: enable to map any memory region onto the platform-bus"



Eric Auger (8):
  linux-headers: Partial update for MSI IOVA handling
  hw: vfio: common: vfio_get_iommu_type1_info
  hw: vfio: common: Introduce vfio_register_msi_iova
  memory: Add reserved_iova region type
  memory: memory_region_find_by_name
  hw: platform-bus: Enable to map any memory region onto the
    platform-bus
  hw: vfio: common: vfio_prepare_msi_mapping
  hw: vfio: common: Adapt vfio_listeners for reserved_iova region

 hw/core/platform-bus.c     |  27 ++++---
 hw/vfio/common.c           | 175 +++++++++++++++++++++++++++++++++++++++------
 include/exec/memory.h      |  40 +++++++++++
 include/hw/platform-bus.h  |   7 ++
 linux-headers/linux/vfio.h |  48 +++++++++++--
 memory.c                   |  27 +++++++
 6 files changed, 288 insertions(+), 36 deletions(-)

-- 
1.9.1


* [Qemu-devel] [RFC v3 1/8] linux-headers: Partial update for MSI IOVA handling
  2016-10-06  9:50 [Qemu-devel] [RFC v3 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
@ 2016-10-06  9:50 ` Eric Auger
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 2/8] hw: vfio: common: vfio_get_iommu_type1_info Eric Auger
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Eric Auger @ 2016-10-06  9:50 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, pranav.sawargaonkar
  Cc: diana.craciun, christoffer.dall, drjones, Bharat.Bhushan

This is a partial update aiming at enhancing the VFIO user API
with IOMMU info capability chain, msi_geometry reporting and
MSI IOVA window registration.
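
As a rough illustration of how userspace might interpret the reported MSI
geometry, here is a small sketch. The struct is a local mirror of the fields
proposed in this patch, not the header itself; the helper names and the
"size != 0" test are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the proposed capability fields; not the kernel header. */
#define MSI_GEOMETRY_RESERVED (1u << 0)

struct msi_geometry {
    uint32_t flags;
    uint32_t order;          /* IOMMU page order used for alignment */
    uint64_t size;           /* window size userspace must provide */
    uint64_t aperture_start; /* meaningful only when RESERVED is set */
    uint64_t aperture_end;
};

/* True when userspace has to register an MSI IOVA window itself
 * (assumption: a zero size means that nothing needs to be provided). */
static bool msi_window_required(const struct msi_geometry *g)
{
    return !(g->flags & MSI_GEOMETRY_RESERVED) && g->size != 0;
}

/* True when [iova, iova + len) intersects the host-reserved aperture,
 * whose bounds [aperture_start, aperture_end] are inclusive. */
static bool hits_reserved_aperture(const struct msi_geometry *g,
                                   uint64_t iova, uint64_t len)
{
    if (!(g->flags & MSI_GEOMETRY_RESERVED) || len == 0) {
        return false;
    }
    return iova <= g->aperture_end && iova + len - 1 >= g->aperture_start;
}
```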

The kernel code is not yet upstreamed. It is available at
https://github.com/eauger/linux/tree/generic-v7-passthrough-v13
([PATCH v13 00/15] KVM PCIe/MSI passthrough on ARM/ARM64)

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v2 -> v3:
- features VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY
---
 linux-headers/linux/vfio.h | 48 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 759b850..8dae013 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -8,8 +8,8 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
-#ifndef VFIO_H
-#define VFIO_H
+#ifndef _UAPIVFIO_H
+#define _UAPIVFIO_H
 
 #include <linux/types.h>
 #include <linux/ioctl.h>
@@ -488,7 +488,35 @@ struct vfio_iommu_type1_info {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
-	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
+#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
+	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
+	__u32	__resv;
+	__u32   cap_offset;	/* Offset within info struct of first cap */
+};
+
+#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY	1
+
+/*
+ * The MSI geometry capability allows to report the MSI IOVA geometry:
+ * - either the MSI IOVAs are constrained within a reserved IOVA aperture
+ *   whose boundaries are given by [@aperture_start, @aperture_end].
+ *   this is typically the case on x86 host. The userspace is not allowed
+ *   to map userspace memory at IOVAs intersecting this range using
+ *   VFIO_IOMMU_MAP_DMA.
+ * - or the MSI IOVAs are not requested to belong to any reserved range;
+ *   in that case the userspace must provide an IOVA window characterized by
+ *   @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
+ */
+struct vfio_iommu_type1_info_cap_msi_geometry {
+	struct vfio_info_cap_header header;
+	__u32 flags;
+#define VFIO_IOMMU_MSI_GEOMETRY_RESERVED (1 << 0) /* reserved geometry */
+	/* not reserved */
+	__u32 order; /* iommu page order used for aperture alignment*/
+	__u64 size; /* IOVA aperture size (bytes) the userspace must provide */
+	/* reserved */
+	__u64 aperture_start;
+	__u64 aperture_end;
 };
 
 #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
@@ -498,12 +526,21 @@ struct vfio_iommu_type1_info {
  *
  * Map process virtual addresses to IO virtual addresses using the
  * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
+ *
+ * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
+ * IOVA region that will be used on some platforms to map the host MSI frames.
+ * In that specific case, vaddr is ignored. Once registered, an MSI reserved
+ * IOVA region stays until the container is closed.
+ * The requirement for provisioning such reserved IOVA range can be checked by
+ * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
  */
 struct vfio_iommu_type1_dma_map {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
 #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
+/* reserved iova for MSI vectors*/
+#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
 	__u64	vaddr;				/* Process virtual address */
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */
@@ -519,7 +556,8 @@ struct vfio_iommu_type1_dma_map {
  * Caller sets argsz.  The actual unmapped size is returned in the size
  * field.  No guarantee is made to the user that arbitrary unmaps of iova
  * or size different from those used in the original mapping call will
- * succeed.
+ * succeed. Once registered, an MSI region cannot be unmapped and stays
+ * until the container is closed.
  */
 struct vfio_iommu_type1_dma_unmap {
 	__u32	argsz;
@@ -688,4 +726,4 @@ struct vfio_iommu_spapr_tce_remove {
 
 /* ***************************************************************** */
 
-#endif /* VFIO_H */
+#endif /* _UAPIVFIO_H */
-- 
1.9.1


* [Qemu-devel] [RFC v3 2/8] hw: vfio: common: vfio_get_iommu_type1_info
  2016-10-06  9:50 [Qemu-devel] [RFC v3 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 1/8] linux-headers: Partial update for MSI IOVA handling Eric Auger
@ 2016-10-06  9:50 ` Eric Auger
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 3/8] hw: vfio: common: Introduce vfio_register_msi_iova Eric Auger
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Eric Auger @ 2016-10-06  9:50 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, pranav.sawargaonkar
  Cc: diana.craciun, christoffer.dall, drjones, Bharat.Bhushan

Introduce the vfio_get_iommu_type1_info helper, which handles the
variable-size allocation of vfio_iommu_type1_info needed for capability
chain support.

Also fix a checkpatch warning on the vfio_host_win_add call.
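
The grow-and-retry pattern behind such a helper can be sketched without a
real VFIO container. In the sketch below, fake_query stands in for the
VFIO_IOMMU_GET_INFO ioctl and always asks for 64 bytes; info_hdr, query_fn,
fake_query and get_info are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-size info header: the callee writes the size it
 * needs into argsz, the caller reallocates and asks again. */
struct info_hdr {
    uint32_t argsz;
    uint32_t flags;
};

typedef int (*query_fn)(struct info_hdr *info);

/* Stand-in for the kernel side: needs 64 bytes, fills the payload with
 * 0xab only when the caller's buffer is large enough. */
static int fake_query(struct info_hdr *info)
{
    const uint32_t needed = 64;

    if (info->argsz >= needed) {
        memset((char *)info + sizeof(*info), 0xab, needed - sizeof(*info));
    }
    info->argsz = needed;
    return 0;
}

/* Query with the minimal size first, then grow until the buffer fits. */
static int get_info(query_fn query, struct info_hdr **pinfo)
{
    uint32_t argsz = sizeof(struct info_hdr);
    struct info_hdr *info = calloc(1, argsz);

    *pinfo = NULL;
    if (!info) {
        return -1;
    }
    for (;;) {
        struct info_hdr *bigger;
        int ret;

        info->argsz = argsz;
        ret = query(info);
        if (ret) {
            free(info);
            return ret;
        }
        if (info->argsz <= argsz) {
            break;              /* buffer was large enough */
        }
        argsz = info->argsz;
        bigger = realloc(info, argsz);
        if (!bigger) {
            free(info);
            return -1;
        }
        info = bigger;
    }
    *pinfo = info;
    return 0;
}
```

Unlike the helper in this patch, the sketch frees the buffer on failure, so
the caller only owns *pinfo when 0 is returned.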

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/vfio/common.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 29188a1..4f4014e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -900,6 +900,27 @@ static void vfio_put_address_space(VFIOAddressSpace *space)
     }
 }
 
+static int vfio_get_iommu_type1_info(int fd,
+                                     struct vfio_iommu_type1_info **pinfo)
+{
+    size_t argsz = sizeof(struct vfio_iommu_type1_info);
+
+    *pinfo = g_malloc0(argsz);
+retry:
+    (*pinfo)->argsz =  argsz;
+
+    if (ioctl(fd, VFIO_IOMMU_GET_INFO, *pinfo)) {
+        return -errno;
+    }
+    if ((*pinfo)->argsz > argsz) {
+        argsz = (*pinfo)->argsz;
+        *pinfo = g_realloc(*pinfo, argsz);
+        goto retry;
+    }
+    return 0;
+}
+
+
 static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 {
     VFIOContainer *container;
@@ -937,7 +958,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
     if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
         ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
         bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU);
-        struct vfio_iommu_type1_info info;
+        struct vfio_iommu_type1_info *pinfo;
 
         ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
         if (ret) {
@@ -961,14 +982,14 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
          * existing Type1 IOMMUs generally support any IOVA we're
          * going to actually try in practice.
          */
-        info.argsz = sizeof(info);
-        ret = ioctl(fd, VFIO_IOMMU_GET_INFO, &info);
+        vfio_get_iommu_type1_info(fd, &pinfo);
         /* Ignore errors */
-        if (ret || !(info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
+        if (ret || !(pinfo->flags & VFIO_IOMMU_INFO_PGSIZES)) {
             /* Assume 4k IOVA page size */
-            info.iova_pgsizes = 4096;
+            pinfo->iova_pgsizes = 4096;
         }
-        vfio_host_win_add(container, 0, (hwaddr)-1, info.iova_pgsizes);
+        vfio_host_win_add(container, 0, (hwaddr)(-1), pinfo->iova_pgsizes);
+        g_free(pinfo);
     } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) ||
                ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU)) {
         struct vfio_iommu_spapr_tce_info info;
-- 
1.9.1


* [Qemu-devel] [RFC v3 3/8] hw: vfio: common: Introduce vfio_register_msi_iova
  2016-10-06  9:50 [Qemu-devel] [RFC v3 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 1/8] linux-headers: Partial update for MSI IOVA handling Eric Auger
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 2/8] hw: vfio: common: vfio_get_iommu_type1_info Eric Auger
@ 2016-10-06  9:50 ` Eric Auger
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 4/8] memory: Add reserved_iova region type Eric Auger
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Eric Auger @ 2016-10-06  9:50 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, pranav.sawargaonkar
  Cc: diana.craciun, christoffer.dall, drjones, Bharat.Bhushan

vfio_register_msi_iova registers the MSI IOVA region.
This IOVA window will be used by the kernel to map MSI doorbells.

The function will become static in a subsequent patch. Since it has no
caller yet, a static definition would make the compiler complain, so the
function is currently non-static and a dummy declaration is added.
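
For illustration, the request passed to the map call could be built as
below. This is a sketch only: dma_map_req is a local mirror of the fields of
struct vfio_iommu_type1_dma_map, the MAP_FLAG_* names are shortened stand-ins
for the proposed flags, and make_msi_iova_req is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the map request fields; not the kernel header. */
struct dma_map_req {
    uint32_t argsz;
    uint32_t flags;
    uint64_t vaddr; /* ignored for an MSI IOVA registration */
    uint64_t iova;
    uint64_t size;
};

#define MAP_FLAG_READ              (1u << 0)
#define MAP_FLAG_WRITE             (1u << 1)
#define MAP_FLAG_RESERVED_MSI_IOVA (1u << 2)

/* Only the window is described: no vaddr and no READ/WRITE permission,
 * since the host fills the window with its own MSI doorbell mappings. */
static struct dma_map_req make_msi_iova_req(uint64_t iova, uint64_t size)
{
    struct dma_map_req req = {
        .argsz = sizeof(req),
        .flags = MAP_FLAG_RESERVED_MSI_IOVA,
        .iova = iova,
        .size = size,
    };

    return req;
}
```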

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v2 -> v3:
- rename vfio_register_reserved_iova into vfio_register_msi_iova
- VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA renamed into
  VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
---
 hw/vfio/common.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4f4014e..fe8a855 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -212,6 +212,34 @@ static int vfio_dma_unmap(VFIOContainer *container,
     return 0;
 }
 
+/**
+ * vfio_register_msi_iova: registers the MSI iova region
+ *
+ * @container: container handle
+ * @iova: base IOVA of the MSI region
+ * @size: size of the MSI IOVA region
+ */
+int vfio_register_msi_iova(VFIOContainer *container, hwaddr iova,
+                           ram_addr_t size);
+int vfio_register_msi_iova(VFIOContainer *container, hwaddr iova,
+                           ram_addr_t size)
+{
+    int ret;
+    struct vfio_iommu_type1_dma_map map = {
+        .argsz = sizeof(map),
+        .flags = VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA,
+        .iova = iova,
+        .size = size,
+    };
+
+    ret = ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map);
+
+    if (ret) {
+        error_report("VFIO_MAP_DMA/RESERVED_MSI_IOVA: %m");
+    }
+    return ret;
+}
+
 static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
                         ram_addr_t size, void *vaddr, bool readonly)
 {
-- 
1.9.1


* [Qemu-devel] [RFC v3 4/8] memory: Add reserved_iova region type
  2016-10-06  9:50 [Qemu-devel] [RFC v3 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
                   ` (2 preceding siblings ...)
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 3/8] hw: vfio: common: Introduce vfio_register_msi_iova Eric Auger
@ 2016-10-06  9:50 ` Eric Auger
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 5/8] memory: memory_region_find_by_name Eric Auger
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Eric Auger @ 2016-10-06  9:50 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, pranav.sawargaonkar
  Cc: diana.craciun, christoffer.dall, drjones, Bharat.Bhushan

Introduce a new reserved_iova region type. This type of IOVA region
is meant to be used by the kernel to map some host physical addresses
(typically MSI frames).

A new initializer, memory_region_init_reserved_iova, is introduced, as
well as a test function, memory_region_is_reserved_iova.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/exec/memory.h | 29 +++++++++++++++++++++++++++++
 memory.c              | 11 +++++++++++
 2 files changed, 40 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 10d7eac..f97b1f4 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -191,6 +191,7 @@ struct MemoryRegion {
     /* The following fields should fit in a cache line */
     bool romd_mode;
     bool ram;
+    bool reserved_iova;
     bool subpage;
     bool readonly; /* For RAM regions */
     bool rom_device;
@@ -385,6 +386,21 @@ void memory_region_init_ram(MemoryRegion *mr,
                             Error **errp);
 
 /**
+ * memory_region_init_reserved_iova:  Initialize reserved iova memory region
+ *
+ * @mr: the #MemoryRegion to be initialized.
+ * @owner: the object that tracks the region's reference count
+ * @name: the name of the region.
+ * @size: size of the region.
+ * @errp: pointer to Error*, to store an error if it happens.
+ */
+void memory_region_init_reserved_iova(MemoryRegion *mr,
+                                      struct Object *owner,
+                                      const char *name,
+                                      uint64_t size,
+                                      Error **errp);
+
+/**
  * memory_region_init_resizeable_ram:  Initialize memory region with resizeable
  *                                     RAM.  Accesses into the region will
  *                                     modify memory directly.  Only an initial
@@ -573,6 +589,19 @@ static inline bool memory_region_is_ram(MemoryRegion *mr)
 }
 
 /**
+ * memory_region_is_reserved_iova: check whether a memory region corresponds to
+ *                                 reserved iova
+ *
+ * Returns %true if a memory region is reserved iova
+ *
+ * @mr: the memory region being queried
+ */
+static inline bool memory_region_is_reserved_iova(MemoryRegion *mr)
+{
+    return mr->reserved_iova;
+}
+
+/**
  * memory_region_is_skip_dump: check whether a memory region should not be
  *                             dumped
  *
diff --git a/memory.c b/memory.c
index 58f9269..00a0ebe 100644
--- a/memory.c
+++ b/memory.c
@@ -1309,6 +1309,17 @@ void memory_region_init_ram(MemoryRegion *mr,
     mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
 }
 
+void memory_region_init_reserved_iova(MemoryRegion *mr,
+                                      Object *owner,
+                                      const char *name,
+                                      uint64_t size,
+                                      Error **errp)
+{
+    memory_region_init(mr, owner, name, size);
+    mr->reserved_iova = true;
+    mr->terminates = true;
+}
+
 void memory_region_init_resizeable_ram(MemoryRegion *mr,
                                        Object *owner,
                                        const char *name,
-- 
1.9.1


* [Qemu-devel] [RFC v3 5/8] memory: memory_region_find_by_name
  2016-10-06  9:50 [Qemu-devel] [RFC v3 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
                   ` (3 preceding siblings ...)
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 4/8] memory: Add reserved_iova region type Eric Auger
@ 2016-10-06  9:50 ` Eric Auger
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 6/8] hw: platform-bus: Enable to map any memory region onto the platform-bus Eric Auger
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Eric Auger @ 2016-10-06  9:50 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, pranav.sawargaonkar
  Cc: diana.craciun, christoffer.dall, drjones, Bharat.Bhushan

This new helper makes it possible to search for a MemoryRegion matching
a given name within a root MemoryRegion.
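
A depth-first lookup of this kind has to hand a match found in a nested
container back up through every level of the recursion. The sketch below
shows that shape on a simplified tree; struct region and find_by_name are
hypothetical stand-ins, not the QEMU MemoryRegion API, and no reference
counting is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a region tree. */
struct region {
    const char *name;
    struct region *children; /* first child, or NULL */
    struct region *next;     /* next sibling, or NULL */
};

/* Depth-first search below root; returns the first match or NULL. */
static struct region *find_by_name(struct region *root, const char *name)
{
    for (struct region *child = root->children; child; child = child->next) {
        struct region *found;

        if (strcmp(child->name, name) == 0) {
            return child;
        }
        /* Propagate a match found deeper in the tree to the caller. */
        found = find_by_name(child, name);
        if (found) {
            return found;
        }
    }
    return NULL;
}
```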

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/exec/memory.h | 11 +++++++++++
 memory.c              | 16 ++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index f97b1f4..f62e5b5 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1217,6 +1217,17 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr,
                                        hwaddr addr, uint64_t size);
 
 /**
+ * memory_region_find_by_name: Locates the first #MemoryRegion within @mr
+ * whose name matches @name
+ *
+ * @mr: the root MemoryRegion
+ * @name: name of the target MemoryRegion
+ *
+ * Returns the matched memory region or NULL
+ */
+MemoryRegion *memory_region_find_by_name(MemoryRegion *mr, const char *name);
+
+/**
  * memory_global_dirty_log_sync: synchronize the dirty log for all memory
  *
  * Synchronizes the dirty page log for all address spaces.
diff --git a/memory.c b/memory.c
index 00a0ebe..3701b4f 100644
--- a/memory.c
+++ b/memory.c
@@ -2166,6 +2166,22 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr,
     return ret;
 }
 
+MemoryRegion *memory_region_find_by_name(MemoryRegion *root,
+                                         const char *name)
+{
+    MemoryRegion *other;
+
+    QTAILQ_FOREACH(other, &root->subregions, subregions_link) {
+        if (!strcmp(other->name, name)) {
+            memory_region_ref(other);
+            return other;
+        } else {
+            memory_region_find_by_name(other, name);
+        }
+    }
+    return NULL;
+}
+
 bool memory_region_present(MemoryRegion *container, hwaddr addr)
 {
     MemoryRegion *mr;
-- 
1.9.1


* [Qemu-devel] [RFC v3 6/8] hw: platform-bus: Enable to map any memory region onto the platform-bus
  2016-10-06  9:50 [Qemu-devel] [RFC v3 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
                   ` (4 preceding siblings ...)
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 5/8] memory: memory_region_find_by_name Eric Auger
@ 2016-10-06  9:50 ` Eric Auger
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 7/8] hw: vfio: common: vfio_prepare_msi_mapping Eric Auger
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 8/8] hw: vfio: common: Adapt vfio_listeners for reserved_iova region Eric Auger
  7 siblings, 0 replies; 10+ messages in thread
From: Eric Auger @ 2016-10-06  9:50 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, pranav.sawargaonkar
  Cc: diana.craciun, christoffer.dall, drjones, Bharat.Bhushan

The platform bus is currently used to map dynamically instantiable
platform device MMIO regions. The platform bus can also be seen as a
pool of free guest physical addresses. We would like to use that pool
to allocate a contiguous reserved IOVA region usable for MSI message
address IOMMU mapping.

This patch introduces platform_bus_map_region, which allows mapping any
memory region onto the platform bus.
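
The allocation policy can be pictured as a naturally aligned first-fit
search. The sketch below is an approximation under stated assumptions: the
alignment is the region size rounded up to a power of two (which is what the
clz64 expression in the diff computes), and used_range, natural_alignment and
find_slot are hypothetical names, not platform bus code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of an already mapped range inside the pool. */
struct used_range {
    uint64_t off;
    uint64_t size;
};

/* Round size up to the next power of two (capped at 2^63). */
static uint64_t natural_alignment(uint64_t size)
{
    uint64_t align = 1;

    while (align < size && align < (1ULL << 63)) {
        align <<= 1;
    }
    return align;
}

/* First-fit search at naturally aligned offsets within [0, pool_size). */
static bool find_slot(uint64_t pool_size, const struct used_range *used,
                      size_t n, uint64_t size, uint64_t *out)
{
    uint64_t align = natural_alignment(size);

    for (uint64_t off = 0; off < pool_size && size <= pool_size - off;
         off += align) {
        bool busy = false;

        for (size_t i = 0; i < n; i++) {
            if (off < used[i].off + used[i].size &&
                used[i].off < off + size) {
                busy = true;
                break;
            }
        }
        if (!busy) {
            *out = off;
            return true;
        }
    }
    return false;
}
```

A region of 0x1800 bytes is therefore placed on a 0x2000 boundary, which
keeps every mapped region aligned on its own rounded-up size.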

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

---

v2 -> v3:
include qapi/error.h
---
 hw/core/platform-bus.c    | 27 +++++++++++++++++----------
 include/hw/platform-bus.h |  7 +++++++
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/hw/core/platform-bus.c b/hw/core/platform-bus.c
index 329ac67..3fb6f6f 100644
--- a/hw/core/platform-bus.c
+++ b/hw/core/platform-bus.c
@@ -24,6 +24,7 @@
 #include "exec/address-spaces.h"
 #include "qemu/error-report.h"
 #include "sysemu/sysemu.h"
+#include "qapi/error.h"
 
 
 /*
@@ -127,16 +128,14 @@ static void platform_bus_map_irq(PlatformBusDevice *pbus, SysBusDevice *sbdev,
     sysbus_connect_irq(sbdev, n, pbus->irqs[irqn]);
 }
 
-static void platform_bus_map_mmio(PlatformBusDevice *pbus, SysBusDevice *sbdev,
-                                  int n)
+void platform_bus_map_region(PlatformBusDevice *pbus, MemoryRegion *mr)
 {
-    MemoryRegion *sbdev_mr = sysbus_mmio_get_region(sbdev, n);
-    uint64_t size = memory_region_size(sbdev_mr);
+    uint64_t size = memory_region_size(mr);
     uint64_t alignment = (1ULL << (63 - clz64(size + size - 1)));
     uint64_t off;
     bool found_region = false;
 
-    if (memory_region_is_mapped(sbdev_mr)) {
+    if (memory_region_is_mapped(mr)) {
         /* Region is already mapped, nothing to do */
         return;
     }
@@ -153,13 +152,21 @@ static void platform_bus_map_mmio(PlatformBusDevice *pbus, SysBusDevice *sbdev,
     }
 
     if (!found_region) {
-        error_report("Platform Bus: Can not fit MMIO region of size %"PRIx64,
-                     size);
-        exit(1);
+        error_setg(&error_fatal,
+                   "Platform Bus: Can not fit region %s of size %"PRIx64,
+                   mr->name, size);
     }
 
-    /* Map the device's region into our Platform Bus MMIO space */
-    memory_region_add_subregion(&pbus->mmio, off, sbdev_mr);
+    /* Map the region into our Platform Bus MMIO space */
+    memory_region_add_subregion(&pbus->mmio, off, mr);
+}
+
+static void platform_bus_map_mmio(PlatformBusDevice *pbus, SysBusDevice *sbdev,
+                                  int n)
+{
+    MemoryRegion *sbdev_mr = sysbus_mmio_get_region(sbdev, n);
+
+    platform_bus_map_region(pbus, sbdev_mr);
 }
 
 /*
diff --git a/include/hw/platform-bus.h b/include/hw/platform-bus.h
index a00775c..6d3a664 100644
--- a/include/hw/platform-bus.h
+++ b/include/hw/platform-bus.h
@@ -54,4 +54,11 @@ int platform_bus_get_irqn(PlatformBusDevice *platform_bus, SysBusDevice *sbdev,
 hwaddr platform_bus_get_mmio_addr(PlatformBusDevice *pbus, SysBusDevice *sbdev,
                                   int n);
 
+/**
+ * platform_bus_map_region: map a MemoryRegion into the platform bus
+ * @pbus: platform bus handle
+ * @mr: memory region handle
+ */
+void platform_bus_map_region(PlatformBusDevice *pbus, MemoryRegion *mr);
+
 #endif /* HW_PLATFORM_BUS_H */
-- 
1.9.1


* [Qemu-devel] [RFC v3 7/8] hw: vfio: common: vfio_prepare_msi_mapping
  2016-10-06  9:50 [Qemu-devel] [RFC v3 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
                   ` (5 preceding siblings ...)
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 6/8] hw: platform-bus: Enable to map any memory region onto the platform-bus Eric Auger
@ 2016-10-06  9:50 ` Eric Auger
  2016-10-06 10:51   ` Auger Eric
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 8/8] hw: vfio: common: Adapt vfio_listeners for reserved_iova region Eric Auger
  7 siblings, 1 reply; 10+ messages in thread
From: Eric Auger @ 2016-10-06  9:50 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, pranav.sawargaonkar
  Cc: diana.craciun, christoffer.dall, drjones, Bharat.Bhushan

Introduce a helper function to retrieve the iommu type1 capability
chain info.

The first capability ready to be exploited is the msi geometry
capability. vfio_prepare_msi_mapping allocates a MemoryRegion
dedicated to host MSI IOVA mapping. Its size matches the host needs.
This region is mapped, on the guest side, into the platform bus memory
container.
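
Walking a capability chain amounts to following "next" offsets, measured
from the start of the info buffer, until an offset of 0 is reached. The
sketch below works on a plain byte buffer; cap_header mirrors the
id/version/next layout of struct vfio_info_cap_header, while find_cap and its
bounds and loop guards are assumptions added for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the capability header layout. */
struct cap_header {
    uint16_t id;
    uint16_t version;
    uint32_t next; /* offset of the next capability, 0 ends the chain */
};

/* Return the offset of the capability with the given id, or 0 if absent. */
static uint32_t find_cap(const unsigned char *buf, size_t len,
                         uint32_t first, uint16_t id)
{
    uint32_t off = first;
    size_t steps = 0;

    while (off != 0 && len >= sizeof(struct cap_header) &&
           off <= len - sizeof(struct cap_header)) {
        struct cap_header hdr;

        /* Copy out the header so that alignment does not matter. */
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.id == id) {
            return off;
        }
        if (++steps > len / sizeof(hdr)) {
            break; /* guard against a malformed, cyclic chain */
        }
        off = hdr.next;
    }
    return 0;
}
```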

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v3: creation
---
 hw/vfio/common.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index fe8a855..7d20c33 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -34,6 +34,8 @@
 #include "qemu/range.h"
 #include "sysemu/kvm.h"
 #include "trace.h"
+#include "hw/platform-bus.h"
+#include "qapi/error.h"
 
 struct vfio_group_head vfio_group_list =
     QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -948,12 +950,76 @@ retry:
     return 0;
 }
 
+static struct vfio_info_cap_header *
+vfio_get_iommu_type1_info_cap(struct vfio_iommu_type1_info *info, uint16_t id)
+{
+    struct vfio_info_cap_header *hdr;
+    void *ptr = info;
+
+    if (!(info->flags & VFIO_IOMMU_INFO_CAPS)) {
+        return NULL;
+    }
+
+    for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) {
+        if (hdr->id == id) {
+            return hdr;
+        }
+    }
+    return NULL;
+}
+
+static void vfio_prepare_msi_mapping(struct vfio_iommu_type1_info *info,
+                                     AddressSpace *as, Error **errp)
+{
+    struct vfio_iommu_type1_info_cap_msi_geometry *msi_geometry;
+    MemoryRegion *pbus_region, *reserved_reg;
+    struct vfio_info_cap_header *hdr;
+    PlatformBusDevice *pbus;
+
+    hdr = vfio_get_iommu_type1_info_cap(info,
+                                        VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY);
+    if (!hdr) {
+        return;
+    }
+
+    msi_geometry = container_of(hdr,
+                                struct vfio_iommu_type1_info_cap_msi_geometry,
+                                header);
+
+    if (msi_geometry->flags & VFIO_IOMMU_MSI_GEOMETRY_RESERVED) {
+        return;
+    }
+
+    /*
+     * MSI must be iommu mapped: allocate a GPA region located on the
+     * platform bus that the host will be able to use for MSI IOVA allocation
+     */
+    reserved_reg = memory_region_find_by_name(as->root, "reserved-iova");
+    if (reserved_reg) {
+        memory_region_unref(reserved_reg);
+        return;
+    }
+
+    pbus_region = memory_region_find_by_name(as->root, "platform bus");
+    if (!pbus_region) {
+        error_setg(errp, "no platform bus memory container found");
+        return;
+    }
+    pbus = container_of(pbus_region, PlatformBusDevice, mmio);
+    reserved_reg = g_new0(MemoryRegion, 1);
+    memory_region_init_reserved_iova(reserved_reg, OBJECT(pbus),
+                                     "reserved-iova",
+                                     msi_geometry->size, &error_fatal);
+    platform_bus_map_region(pbus, reserved_reg);
+    memory_region_unref(pbus_region);
+}
 
 static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 {
     VFIOContainer *container;
     int ret, fd;
     VFIOAddressSpace *space;
+    Error *err = NULL;
 
     space = vfio_get_address_space(as);
 
@@ -1011,6 +1077,14 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
          * going to actually try in practice.
          */
         vfio_get_iommu_type1_info(fd, &pinfo);
+        vfio_prepare_msi_mapping(pinfo, as, &err);
+        if (err) {
+            error_append_hint(&err,
+                      "Make sure your machine instantiates a platform bus\n");
+            error_report_err(err);
+            goto free_container_exit;
+        }
+
         /* Ignore errors */
         if (ret || !(pinfo->flags & VFIO_IOMMU_INFO_PGSIZES)) {
             /* Assume 4k IOVA page size */
-- 
1.9.1


* [Qemu-devel] [RFC v3 8/8] hw: vfio: common: Adapt vfio_listeners for reserved_iova region
  2016-10-06  9:50 [Qemu-devel] [RFC v3 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
                   ` (6 preceding siblings ...)
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 7/8] hw: vfio: common: vfio_prepare_msi_mapping Eric Auger
@ 2016-10-06  9:50 ` Eric Auger
  7 siblings, 0 replies; 10+ messages in thread
From: Eric Auger @ 2016-10-06  9:50 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, pranav.sawargaonkar
  Cc: diana.craciun, christoffer.dall, drjones, Bharat.Bhushan

For a reserved IOVA region, declare the region to the kernel so that
it can use it for IOVA/HPA bindings.
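
The resulting listener behaviour can be summarised as a dispatch on the
region kind. The sketch below is a simplified model, not the listener code:
section_kind, listener_action and classify_section are hypothetical names,
and the IOMMU case stands for the existing IOMMU handling, which this patch
leaves untouched.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified view of the properties the listener looks at. */
struct section_kind {
    bool ram;
    bool reserved_iova;
    bool iommu;
};

enum listener_action {
    ACT_SKIP,              /* neither RAM, reserved IOVA nor IOMMU */
    ACT_IOMMU,             /* handled by the existing IOMMU path */
    ACT_DMA_MAP,           /* regular VFIO_IOMMU_MAP_DMA of guest RAM */
    ACT_REGISTER_MSI_IOVA, /* map call with the reserved MSI IOVA flag */
};

static enum listener_action classify_section(struct section_kind k)
{
    if (!k.ram && !k.reserved_iova && !k.iommu) {
        return ACT_SKIP;
    }
    if (k.iommu) {
        return ACT_IOMMU;
    }
    if (k.ram) {
        return ACT_DMA_MAP;
    }
    return ACT_REGISTER_MSI_IOVA;
}
```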

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/vfio/common.c | 48 +++++++++++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7d20c33..0018538 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -221,10 +221,8 @@ static int vfio_dma_unmap(VFIOContainer *container,
  * @iova: base IOVA of the MSI region
  * @size: size of the MSI IOVA region
  */
-int vfio_register_msi_iova(VFIOContainer *container, hwaddr iova,
-                           ram_addr_t size);
-int vfio_register_msi_iova(VFIOContainer *container, hwaddr iova,
-                           ram_addr_t size)
+static int vfio_register_msi_iova(VFIOContainer *container, hwaddr iova,
+                                  ram_addr_t size)
 {
     int ret;
     struct vfio_iommu_type1_dma_map map = {
@@ -313,6 +311,7 @@ static int vfio_host_win_del(VFIOContainer *container, hwaddr min_iova,
 static bool vfio_listener_skipped_section(MemoryRegionSection *section)
 {
     return (!memory_region_is_ram(section->mr) &&
+            !memory_region_is_reserved_iova(section->mr) &&
             !memory_region_is_iommu(section->mr)) ||
            /*
             * Sizing an enabled 64-bit BAR can cause spurious mappings to
@@ -396,7 +395,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
     hwaddr iova, end;
     Int128 llend, llsize;
     void *vaddr;
-    int ret;
+    int ret = -1;
     VFIOHostDMAWindow *hostwin;
     bool hostwin_found;
 
@@ -492,27 +491,38 @@ static void vfio_listener_region_add(MemoryListener *listener,
         return;
     }
 
-    /* Here we assume that memory_region_is_ram(section->mr)==true */
+    /* Here we assume that the memory region is ram or reserved iova */
 
-    vaddr = memory_region_get_ram_ptr(section->mr) +
-            section->offset_within_region +
-            (iova - section->offset_within_address_space);
+    if (memory_region_is_ram(section->mr)) {
+        vaddr = memory_region_get_ram_ptr(section->mr) +
+                section->offset_within_region +
+                (iova - section->offset_within_address_space);
 
-    trace_vfio_listener_region_add_ram(iova, end, vaddr);
+        trace_vfio_listener_region_add_ram(iova, end, vaddr);
 
-    llsize = int128_sub(llend, int128_make64(iova));
+        llsize = int128_sub(llend, int128_make64(iova));
 
-    ret = vfio_dma_map(container, iova, int128_get64(llsize),
+        ret = vfio_dma_map(container, iova, int128_get64(llsize),
                        vaddr, section->readonly);
-    if (ret) {
-        error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
-                     "0x%"HWADDR_PRIx", %p) = %d (%m)",
-                     container, iova, int128_get64(llsize), vaddr, ret);
-        goto fail;
+        if (ret) {
+            error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+                         "0x%"HWADDR_PRIx", %p) = %d (%m)",
+                         container, iova, int128_get64(llsize), vaddr, ret);
+            goto fail;
+        }
+        return;
+    } else if (memory_region_is_reserved_iova(section->mr)) {
+        llsize = int128_sub(llend, int128_make64(iova));
+        ret = vfio_register_msi_iova(container, iova, int128_get64(llsize));
+        if (ret) {
+            error_report("vfio_register_msi_iova(%p, 0x%"HWADDR_PRIx", "
+                         "0x%"HWADDR_PRIx") = %d (%m)",
+                         container, iova, int128_get64(llsize), ret);
+            goto fail;
+        }
+        return;
     }
 
-    return;
-
 fail:
     /*
      * On the initfn path, store the first error in the container so we
-- 
1.9.1
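
For readers following the control flow, here is a minimal sketch of the
dispatch this patch introduces in vfio_listener_region_add. It is not the
QEMU code: every demo_* name is a hypothetical stand-in, and the extra
64-bit BAR sizing condition of the real skip predicate is not modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the region types the listener distinguishes. */
enum demo_region_kind {
    DEMO_REGION_RAM,
    DEMO_REGION_RESERVED_IOVA,
    DEMO_REGION_IOMMU,
    DEMO_REGION_OTHER,              /* e.g. plain MMIO */
};

/* What the listener ends up doing with a section (illustrative only). */
enum demo_action {
    DEMO_SKIP,                      /* section ignored by the listener */
    DEMO_IOMMU_PATH,                /* handled earlier, via the IOMMU branch */
    DEMO_DMA_MAP,                   /* vfio_dma_map() with a host vaddr */
    DEMO_REGISTER_MSI_IOVA,         /* vfio_register_msi_iova(), no vaddr */
};

/* Mirrors the patched predicate: only "other" regions are skipped. */
static bool demo_section_skipped(enum demo_region_kind kind)
{
    return kind != DEMO_REGION_RAM &&
           kind != DEMO_REGION_RESERVED_IOVA &&
           kind != DEMO_REGION_IOMMU;
}

/* Mirrors the order of the checks in the patched region_add callback. */
static enum demo_action demo_region_add(enum demo_region_kind kind)
{
    if (demo_section_skipped(kind)) {
        return DEMO_SKIP;
    }
    if (kind == DEMO_REGION_IOMMU) {
        return DEMO_IOMMU_PATH;
    }
    if (kind == DEMO_REGION_RAM) {
        return DEMO_DMA_MAP;
    }
    /* Only the reserved IOVA case remains. */
    return DEMO_REGISTER_MSI_IOVA;
}
```

The point of the sketch is that a reserved IOVA region has no host
virtual address to map, so it takes its own branch instead of the RAM one.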

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [RFC v3 7/8] hw: vfio: common: vfio_prepare_msi_mapping
  2016-10-06  9:50 ` [Qemu-devel] [RFC v3 7/8] hw: vfio: common: vfio_prepare_msi_mapping Eric Auger
@ 2016-10-06 10:51   ` Auger Eric
  0 siblings, 0 replies; 10+ messages in thread
From: Auger Eric @ 2016-10-06 10:51 UTC (permalink / raw)
  To: eric.auger.pro, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, pranav.sawargaonkar
  Cc: diana.craciun, Bharat.Bhushan, drjones, christoffer.dall

Hi,

On 06/10/2016 11:50, Eric Auger wrote:
> Introduce a helper function to retrieve the iommu type1 capability
> chain info.
> 
> The first capability ready to be exploited is the msi geometry
> capability. vfio_prepare_msi_mapping allocates a MemoryRegion
> dedicated to host MSI IOVA mapping. Its size matches the host needs.
> This region is mapped on guest side on the platform bus memory container.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> 
> v3: creation
> ---
>  hw/vfio/common.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 74 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index fe8a855..7d20c33 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -34,6 +34,8 @@
>  #include "qemu/range.h"
>  #include "sysemu/kvm.h"
>  #include "trace.h"
> +#include "hw/platform-bus.h"
> +#include "qapi/error.h"
>  
>  struct vfio_group_head vfio_group_list =
>      QLIST_HEAD_INITIALIZER(vfio_group_list);
> @@ -948,12 +950,76 @@ retry:
>      return 0;
>  }
>  
> +static struct vfio_info_cap_header *
> +vfio_get_iommu_type1_info_cap(struct vfio_iommu_type1_info *info, uint16_t id)
> +{
> +    struct vfio_info_cap_header *hdr;
> +    void *ptr = info;
> +
> +    if (!(info->flags & VFIO_IOMMU_INFO_CAPS)) {
> +        return NULL;
> +    }
> +
> +    for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) {
> +        if (hdr->id == id) {
> +            return hdr;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static void vfio_prepare_msi_mapping(struct vfio_iommu_type1_info *info,
> +                                     AddressSpace *as, Error **errp)
> +{
> +    struct vfio_iommu_type1_info_cap_msi_geometry *msi_geometry;
> +    MemoryRegion *pbus_region, *reserved_reg;
> +    struct vfio_info_cap_header *hdr;
> +    PlatformBusDevice *pbus;
> +
> +    hdr = vfio_get_iommu_type1_info_cap(info,
> +                                        VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY);
> +    if (!hdr) {
> +        return;
> +    }
> +
> +    msi_geometry = container_of(hdr,
> +                                struct vfio_iommu_type1_info_cap_msi_geometry,
> +                                header);
> +
> +    if (msi_geometry->flags & VFIO_IOMMU_MSI_GEOMETRY_RESERVED) {
> +        return;
> +    }
> +
> +    /*
> +     * MSI must be iommu mapped: allocate a GPA region located on the
> +     * platform bus that the host will be able to use for MSI IOVA allocation
> +     */
> +    reserved_reg = memory_region_find_by_name(as->root, "reserved-iova");
> +    if (reserved_reg) {
> +        memory_region_unref(reserved_reg);
> +        return;
> +    }
> +
> +    pbus_region = memory_region_find_by_name(as->root, "platform bus");
> +    if (!pbus_region) {
> +        error_setg(errp, "no platform bus memory container found");
> +        return;
> +    }
> +    pbus = container_of(pbus_region, PlatformBusDevice, mmio);
> +    reserved_reg = g_new0(MemoryRegion, 1);
> +    memory_region_init_reserved_iova(reserved_reg, OBJECT(pbus),
> +                                     "reserved-iova",
> +                                     msi_geometry->size, &error_fatal);
> +    platform_bus_map_region(pbus, reserved_reg);
> +    memory_region_unref(pbus_region);
> +}
>  
>  static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>  {
>      VFIOContainer *container;
>      int ret, fd;
>      VFIOAddressSpace *space;
> +    Error *err;
A last minute change related to future vfio realize migration that I
forgot to test. err needs to be initialized to NULL here.
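
To illustrate why the initialization matters, here is a minimal sketch of
the error-pointer pattern. It assumes a simplified stand-in for QEMU's
Error API: DemoError and the demo_* functions are hypothetical. The callee
writes through errp only on failure, so on success the caller's "if (err)"
check reads whatever err held before the call.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for QEMU's Error object. */
typedef struct DemoError {
    char msg[64];
} DemoError;

/* Allocate an error and store it in *errp; does nothing if errp is NULL. */
static void demo_error_set(DemoError **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    *errp = calloc(1, sizeof(**errp));
    if (*errp) {
        strncpy((*errp)->msg, msg, sizeof((*errp)->msg) - 1);
    }
}

/* Touches *errp only on failure, like vfio_prepare_msi_mapping. */
static void demo_prepare(bool have_platform_bus, DemoError **errp)
{
    if (!have_platform_bus) {
        demo_error_set(errp, "no platform bus memory container found");
    }
}

/* Caller: err must start as NULL so the check below is meaningful. */
static int demo_connect(bool have_platform_bus)
{
    DemoError *err = NULL;

    demo_prepare(have_platform_bus, &err);
    if (err) {
        free(err);
        return -1;
    }
    return 0;
}
```

Without the "= NULL", the success path would test an indeterminate
pointer and could take the failure branch at random.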

Will re-post. Sorry for the inconvenience.

Thanks

Eric
>  
>      space = vfio_get_address_space(as);
>  
> @@ -1011,6 +1077,14 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
>           * going to actually try in practice.
>           */
>          vfio_get_iommu_type1_info(fd, &pinfo);
> +        vfio_prepare_msi_mapping(pinfo, as, &err);
> +        if (err) {
> +            error_append_hint(&err,
> +                      "Make sure your machine instantiates a platform bus\n");
> +            error_report_err(err);
> +            goto free_container_exit;
> +        }
> +
>          /* Ignore errors */
>          if (ret || !(pinfo->flags & VFIO_IOMMU_INFO_PGSIZES)) {
>              /* Assume 4k IOVA page size */
> 
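
The capability lookup in the quoted patch walks a chain in which each
header stores the offset of the next one, measured from the start of the
info structure, and an offset of 0 ends the chain. Below is a minimal,
self-contained sketch of that walk. The demo_* types, IDs, buffer layout
and the 0x10000 size are all hypothetical and are not the kernel UAPI
definitions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a capability header: id, version, next. */
struct demo_cap_header {
    uint16_t id;
    uint16_t version;
    uint32_t next;          /* offset from start of info, 0 = end of chain */
};

/* Hypothetical info block: only the fields the walk needs. */
struct demo_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t cap_offset;    /* offset of the first capability */
};

/* Hypothetical capability carrying a window size after its header. */
struct demo_geometry_cap {
    struct demo_cap_header header;
    uint32_t flags;
    uint64_t size;
};

#define DEMO_INFO_CAPS    (1u << 0)   /* "a capability chain is present" */
#define DEMO_CAP_OTHER    1
#define DEMO_CAP_GEOMETRY 2

/* Walk the chain; return the header with the requested id, or NULL. */
static struct demo_cap_header *demo_find_cap(void *info_buf, uint16_t id)
{
    struct demo_info *info = info_buf;
    uint8_t *base = info_buf;
    struct demo_cap_header *hdr;

    if (!(info->flags & DEMO_INFO_CAPS)) {
        return NULL;
    }
    for (hdr = (struct demo_cap_header *)(base + info->cap_offset);
         (uint8_t *)hdr != base;
         hdr = (struct demo_cap_header *)(base + hdr->next)) {
        if (hdr->id == id) {
            return hdr;
        }
    }
    return NULL;
}

/* Build a 64-byte info buffer holding two chained capabilities. */
static void *demo_build_info(void)
{
    uint8_t *base = calloc(1, 64);
    struct demo_info *info;
    struct demo_cap_header *first;
    struct demo_geometry_cap *geo;

    if (!base) {
        return NULL;
    }
    info = (struct demo_info *)base;
    info->argsz = 64;
    info->flags = DEMO_INFO_CAPS;
    info->cap_offset = 16;

    first = (struct demo_cap_header *)(base + 16);
    first->id = DEMO_CAP_OTHER;
    first->version = 1;
    first->next = 32;

    geo = (struct demo_geometry_cap *)(base + 32);
    geo->header.id = DEMO_CAP_GEOMETRY;
    geo->header.version = 1;
    geo->header.next = 0;       /* last capability in the chain */
    geo->size = 0x10000;
    return base;
}
```

Because the header is the first member of the capability structure, the
pointer returned by demo_find_cap can be converted back to the enclosing
structure, which is what container_of does in the patch.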

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-10-06 10:52 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
2016-10-06  9:50 [Qemu-devel] [RFC v3 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
2016-10-06  9:50 ` [Qemu-devel] [RFC v3 1/8] linux-headers: Partial update for MSI IOVA handling Eric Auger
2016-10-06  9:50 ` [Qemu-devel] [RFC v3 2/8] hw: vfio: common: vfio_get_iommu_type1_info Eric Auger
2016-10-06  9:50 ` [Qemu-devel] [RFC v3 3/8] hw: vfio: common: Introduce vfio_register_msi_iova Eric Auger
2016-10-06  9:50 ` [Qemu-devel] [RFC v3 4/8] memory: Add reserved_iova region type Eric Auger
2016-10-06  9:50 ` [Qemu-devel] [RFC v3 5/8] memory: memory_region_find_by_name Eric Auger
2016-10-06  9:50 ` [Qemu-devel] [RFC v3 6/8] hw: platform-bus: Enable to map any memory region onto the platform-bus Eric Auger
2016-10-06  9:50 ` [Qemu-devel] [RFC v3 7/8] hw: vfio: common: vfio_prepare_msi_mapping Eric Auger
2016-10-06 10:51   ` Auger Eric
2016-10-06  9:50 ` [Qemu-devel] [RFC v3 8/8] hw: vfio: common: Adapt vfio_listeners for reserved_iova region Eric Auger
