qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt
@ 2016-01-29 16:53 Eric Auger
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 1/8] linux-headers: partial update for VFIO reserved IOVA registration Eric Auger
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Eric Auger @ 2016-01-29 16:53 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

This series enables KVM PCI/MSI passthrough with mach-virt.

A new memory region type is introduced (reserved iova). On
vfio_listener_region_add this IOVA region is registered to the kernel with
VFIO_IOMMU_MAP_DMA (using the new VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA flag).

The host VFIO PCI driver then can use this IOVA window to map some host
physical addresses, accessed by passthrough'ed PCI devices, through the IOMMU.
The first goal is to map host MSI controller frames (GICv2M, GITS_TRANSLATER).

mach-virt currently instantiates a 16x64kB reserved IOVA window. This
provisions for future usage. Most probably this exceeds MSI binding needs.
To avoid wasting guest PA, we now map the reserved region onto the
platform bus MMIO.

The series includes Pranav/Tushar' series:
QEMU, [v2 0/2] Generic PCIe host bridge INTx determination for INTx routing
((https://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg04361.html))

Those patches are not mandated for PCI/MSI passthrough to work but without
those, the following warning is observed and can puzzle the end-user:
"qemu-system-aarch64: PCI: Bug - unimplemented PCI INTx routing (gpex-pcihost)"

Best Regards

Eric

Dependencies:
The series depends on kernel series: "[PATCH 00/10] KVM PCIe/MSI passthrough on
ARM/ARM64", (https://lkml.org/lkml/2016/1/26/371)

Git:
QEMU:
https://git.linaro.org/people/eric.auger/qemu.git/shortlog/refs/heads/v2.5.0-pci-passthrough-rfc-v2

Kernel:
https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.5-rc1-pcie-passthrough-v1

Testing:
- on ARM64 AMD Overdrive HW with one e1000e PCIe card.

History:
RFC v1 -> RFC v2:
- now uses platform bus MMIO for mapping reserved IOVA region; hence the
  new patch file:
  "hw: platform-bus: enable to map any memory region onto the platform-bus"

Eric Auger (8):
  linux-headers: partial update for VFIO reserved IOVA registration
  Add a function to determine interrupt number for INTx routing
  Generic PCIe host bridge INTx determination for INTx routing
  hw: vfio: common: introduce vfio_register_reserved_iova
  memory: add reserved_iova region type
  hw: platform-bus: enable to map any memory region onto the
    platform-bus
  hw: arm: virt: register reserved IOVA region
  hw: vfio: common: adapt vfio_listeners for reserved_iova region

 hw/arm/virt.c              | 23 ++++++++++++----
 hw/core/platform-bus.c     | 26 +++++++++++-------
 hw/pci-host/gpex.c         | 12 ++++++++
 hw/vfio/common.c           | 68 ++++++++++++++++++++++++++++++++++++----------
 include/exec/memory.h      | 29 ++++++++++++++++++++
 include/hw/pci-host/gpex.h |  1 +
 include/hw/platform-bus.h  |  7 +++++
 linux-headers/linux/vfio.h | 15 ++++++++--
 memory.c                   | 11 ++++++++
 9 files changed, 160 insertions(+), 32 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Qemu-devel] [RFC v2 1/8] linux-headers: partial update for VFIO reserved IOVA registration
  2016-01-29 16:53 [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
@ 2016-01-29 16:53 ` Eric Auger
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 2/8] Add a function to determine interrupt number for INTx routing Eric Auger
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Eric Auger @ 2016-01-29 16:53 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

This is a partial update aiming at enhancing the VFIO user API
according to not yet upstreamed kernel developments available at:

https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.5-rc1-pcie-passthrough-v1

See https://lkml.org/lkml/2016/1/26/371 for more details.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 linux-headers/linux/vfio.h | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index aa276bc..ac6032e 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -8,8 +8,8 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
-#ifndef VFIO_H
-#define VFIO_H
+#ifndef _UAPIVFIO_H
+#define _UAPIVFIO_H
 
 #include <linux/types.h>
 #include <linux/ioctl.h>
@@ -393,6 +393,7 @@ struct vfio_iommu_type1_info {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
+#define VFIO_IOMMU_INFO_REQUIRE_MSI_MAP (1 << 1)/* MSI must be mapped */
 	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
 };
 
@@ -403,12 +404,20 @@ struct vfio_iommu_type1_info {
  *
  * Map process virtual addresses to IO virtual addresses using the
  * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
+ *
+ * In case MSI_RESERVED_IOVA is set, the API only aims at registering an IOVA
+ * region which will be used on some platforms to map the host MSI frame.
+ * in that specific case, vaddr is ignored. The requirement for provisioning
+ * such IOVA range can be checked by calling VFIO_IOMMU_GET_INFO with the
+ * VFIO_IOMMU_INFO_REQUIRE_MSI_MAP attribute.
  */
 struct vfio_iommu_type1_dma_map {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
 #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
+/* reserved iova for MSI vectors*/
+#define VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA (1 << 2)
 	__u64	vaddr;				/* Process virtual address */
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */
@@ -591,4 +600,4 @@ struct vfio_iommu_spapr_tce_remove {
 
 /* ***************************************************************** */
 
-#endif /* VFIO_H */
+#endif /* _UAPIVFIO_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [RFC v2 2/8] Add a function to determine interrupt number for INTx routing
  2016-01-29 16:53 [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 1/8] linux-headers: partial update for VFIO reserved IOVA registration Eric Auger
@ 2016-01-29 16:53 ` Eric Auger
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 3/8] Generic PCIe host bridge INTx determination " Eric Auger
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Eric Auger @ 2016-01-29 16:53 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

This patch adds a PCI bus specific function pointer "route_intx_to_irq"
for GPEX. This is used in detemining PCI INTx number from pin.

Signed-off-by: Pranavkumar Sawargaonkar <address@hidden>
Signed-off-by: Tushar Jagad <address@hidden>
---
 hw/pci-host/gpex.c         | 12 ++++++++++++
 include/hw/pci-host/gpex.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/hw/pci-host/gpex.c b/hw/pci-host/gpex.c
index 9d8fb5a..d0d1250 100644
--- a/hw/pci-host/gpex.c
+++ b/hw/pci-host/gpex.c
@@ -42,6 +42,17 @@ static void gpex_set_irq(void *opaque, int irq_num, int level)
     qemu_set_irq(s->irq[irq_num], level);
 }
 
+static PCIINTxRoute gpex_route_intx_pin_to_irq(void *opaque, int pin)
+{
+    PCIINTxRoute route;
+    GPEXHost *s = opaque;
+
+    route.mode = PCI_INTX_ENABLED;
+    route.irq = (int)s->irq_num[pin];
+
+    return route;
+}
+
 static void gpex_host_realize(DeviceState *dev, Error **errp)
 {
     PCIHostState *pci = PCI_HOST_BRIDGE(dev);
@@ -66,6 +77,7 @@ static void gpex_host_realize(DeviceState *dev, Error **errp)
                                 &s->io_ioport, 0, 4, TYPE_PCIE_BUS);
 
     qdev_set_parent_bus(DEVICE(&s->gpex_root), BUS(pci->bus));
+    pci_bus_set_route_irq_fn(pci->bus, gpex_route_intx_pin_to_irq);
     qdev_init_nofail(DEVICE(&s->gpex_root));
 }
 
diff --git a/include/hw/pci-host/gpex.h b/include/hw/pci-host/gpex.h
index 68c9348..7df1c16 100644
--- a/include/hw/pci-host/gpex.h
+++ b/include/hw/pci-host/gpex.h
@@ -51,6 +51,7 @@ typedef struct GPEXHost {
     MemoryRegion io_ioport;
     MemoryRegion io_mmio;
     qemu_irq irq[GPEX_NUM_IRQS];
+    uint32_t irq_num[GPEX_NUM_IRQS];
 } GPEXHost;
 
 #endif /* HW_GPEX_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [RFC v2 3/8] Generic PCIe host bridge INTx determination for INTx routing
  2016-01-29 16:53 [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 1/8] linux-headers: partial update for VFIO reserved IOVA registration Eric Auger
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 2/8] Add a function to determine interrupt number for INTx routing Eric Auger
@ 2016-01-29 16:53 ` Eric Auger
  2016-02-16 18:14   ` Peter Maydell
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 4/8] hw: vfio: common: introduce vfio_register_reserved_iova Eric Auger
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Eric Auger @ 2016-01-29 16:53 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

This patch stores information about assigned legacy interrupt numbers in
GPEX host structure.
This is used during GPEX INTx number determination from a pin during
INTx routing.

Signed-off-by: Pranavkumar Sawargaonkar <address@hidden>
Signed-off-by: Tushar Jagad <address@hidden>
---
 hw/arm/virt.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 15658f4..3839c68 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -826,6 +826,7 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
     char *nodename;
     int i;
     PCIHostState *pci;
+    GPEXHost *s;
 
     dev = qdev_create(NULL, TYPE_GPEX_HOST);
     qdev_init_nofail(dev);
@@ -861,8 +862,11 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
     /* Map IO port space */
     sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_pio);
 
+    s = GPEX_HOST(dev);
+
     for (i = 0; i < GPEX_NUM_IRQS; i++) {
         sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, pic[irq + i]);
+        s->irq_num[i] = irq + i;
     }
 
     pci = PCI_HOST_BRIDGE(dev);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [RFC v2 4/8] hw: vfio: common: introduce vfio_register_reserved_iova
  2016-01-29 16:53 [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
                   ` (2 preceding siblings ...)
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 3/8] Generic PCIe host bridge INTx determination " Eric Auger
@ 2016-01-29 16:53 ` Eric Auger
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 5/8] memory: add reserved_iova region type Eric Auger
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Eric Auger @ 2016-01-29 16:53 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

vfio_register_reserved_iova allows to register the reserved IOVA region,
typically for MSI frame binding purpose. The kernel allows registering
a single reserved IOVA region.

Unregistration is handled through legacy vfio_dma_unmap.

The function will become static in subsequent patches. However, since
there is no user yet, the compiler argues; the function is currently
not static and a dummy declaration needs to be added.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/vfio/common.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6797208..247c87b 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -208,6 +208,36 @@ static int vfio_dma_unmap(VFIOContainer *container,
     return 0;
 }
 
+/**
+ * vfio_register_reserved_iova: registers the iova reserved region
+ *
+ * @container: container handle
+ * @iova: base iova of the reserved region
+ * @size: reserved region size
+ *
+ * unregistration is handled using vfio_dma_unmap
+ */
+int vfio_register_reserved_iova(VFIOContainer *container, hwaddr iova,
+                                ram_addr_t size);
+int vfio_register_reserved_iova(VFIOContainer *container, hwaddr iova,
+                                ram_addr_t size)
+{
+    struct vfio_iommu_type1_dma_map map = {
+        .argsz = sizeof(map),
+        .flags = VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA,
+        .iova = iova,
+        .size = size,
+    };
+
+    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0) {
+        return 0;
+    }
+
+    error_report("VFIO_MAP_DMA/MSI_RESERVED_IOVA: %d", -errno);
+    return -errno;
+
+}
+
 static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
                         ram_addr_t size, void *vaddr, bool readonly)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [RFC v2 5/8] memory: add reserved_iova region type
  2016-01-29 16:53 [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
                   ` (3 preceding siblings ...)
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 4/8] hw: vfio: common: introduce vfio_register_reserved_iova Eric Auger
@ 2016-01-29 16:53 ` Eric Auger
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 6/8] hw: platform-bus: enable to map any memory region onto the platform-bus Eric Auger
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Eric Auger @ 2016-01-29 16:53 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

Introduce a new reserved_iova region type. This type of iova region
is bound to be used by the kernel to map some host physical addresses.

A new initializer, memory_region_init_reserved_iova is introduced, as
well as a test function, memory_region_is_reserved_iova.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 include/exec/memory.h | 29 +++++++++++++++++++++++++++++
 memory.c              | 11 +++++++++++
 2 files changed, 40 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index c92734a..616cb86 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -165,6 +165,7 @@ struct MemoryRegion {
     /* The following fields should fit in a cache line */
     bool romd_mode;
     bool ram;
+    bool reserved_iova;
     bool subpage;
     bool readonly; /* For RAM regions */
     bool rom_device;
@@ -359,6 +360,21 @@ void memory_region_init_ram(MemoryRegion *mr,
                             Error **errp);
 
 /**
+ * memory_region_init_reserved_iova:  Initialize reserved iova memory region
+ *
+ * @mr: the #MemoryRegion to be initialized.
+ * @owner: the object that tracks the region's reference count
+ * @name: the name of the region.
+ * @size: size of the region.
+ * @errp: pointer to Error*, to store an error if it happens.
+ */
+void memory_region_init_reserved_iova(MemoryRegion *mr,
+                                      struct Object *owner,
+                                      const char *name,
+                                      uint64_t size,
+                                      Error **errp);
+
+/**
  * memory_region_init_resizeable_ram:  Initialize memory region with resizeable
  *                                     RAM.  Accesses into the region will
  *                                     modify memory directly.  Only an initial
@@ -531,6 +547,19 @@ static inline bool memory_region_is_ram(MemoryRegion *mr)
 }
 
 /**
+ * memory_region_is_reserved_iova: check whether a memory region corresponds to
+   reserved iova
+ *
+ * Returns %true is a memory region is reserved iova
+ *
+ * @mr: the memory region being queried
+ */
+static inline bool memory_region_is_reserved_iova(MemoryRegion *mr)
+{
+    return mr->reserved_iova;
+}
+
+/**
  * memory_region_is_skip_dump: check whether a memory region should not be
  *                             dumped
  *
diff --git a/memory.c b/memory.c
index d2d0a92..d9ff1b7 100644
--- a/memory.c
+++ b/memory.c
@@ -1231,6 +1231,17 @@ void memory_region_init_ram(MemoryRegion *mr,
     mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
 }
 
+void memory_region_init_reserved_iova(MemoryRegion *mr,
+                                      Object *owner,
+                                      const char *name,
+                                      uint64_t size,
+                                      Error **errp)
+{
+    memory_region_init(mr, owner, name, size);
+    mr->reserved_iova = true;
+    mr->terminates = true;
+}
+
 void memory_region_init_resizeable_ram(MemoryRegion *mr,
                                        Object *owner,
                                        const char *name,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [RFC v2 6/8] hw: platform-bus: enable to map any memory region onto the platform-bus
  2016-01-29 16:53 [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
                   ` (4 preceding siblings ...)
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 5/8] memory: add reserved_iova region type Eric Auger
@ 2016-01-29 16:53 ` Eric Auger
  2016-02-16 18:16   ` Peter Maydell
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 7/8] hw: arm: virt: register reserved IOVA region Eric Auger
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Eric Auger @ 2016-01-29 16:53 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

The platform bus currently is used to map dynamically instantiable
platform device MMIO regions. The platform bus also can be seen as a
pool of free guest physical addresses. We would like to use that pool
to allocate a contiguous reserved IOVA region usable for MSI message
address IOMMU mapping.

This patch introduces platform_bus_map_region which enables to map any
memory region onto the platform bus.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/core/platform-bus.c    | 26 ++++++++++++++++----------
 include/hw/platform-bus.h |  7 +++++++
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/hw/core/platform-bus.c b/hw/core/platform-bus.c
index aa55d01..7d0f5e0 100644
--- a/hw/core/platform-bus.c
+++ b/hw/core/platform-bus.c
@@ -128,16 +128,14 @@ static void platform_bus_map_irq(PlatformBusDevice *pbus, SysBusDevice *sbdev,
     sysbus_connect_irq(sbdev, n, pbus->irqs[irqn]);
 }
 
-static void platform_bus_map_mmio(PlatformBusDevice *pbus, SysBusDevice *sbdev,
-                                  int n)
+void platform_bus_map_region(PlatformBusDevice *pbus, MemoryRegion *mr)
 {
-    MemoryRegion *sbdev_mr = sysbus_mmio_get_region(sbdev, n);
-    uint64_t size = memory_region_size(sbdev_mr);
+    uint64_t size = memory_region_size(mr);
     uint64_t alignment = (1ULL << (63 - clz64(size + size - 1)));
     uint64_t off;
     bool found_region = false;
 
-    if (memory_region_is_mapped(sbdev_mr)) {
+    if (memory_region_is_mapped(mr)) {
         /* Region is already mapped, nothing to do */
         return;
     }
@@ -154,13 +152,21 @@ static void platform_bus_map_mmio(PlatformBusDevice *pbus, SysBusDevice *sbdev,
     }
 
     if (!found_region) {
-        error_report("Platform Bus: Can not fit MMIO region of size %"PRIx64,
-                     size);
-        exit(1);
+        error_setg(&error_fatal,
+                   "Platform Bus: Can not fit region %s of size %"PRIx64,
+                   mr->name, size);
     }
 
-    /* Map the device's region into our Platform Bus MMIO space */
-    memory_region_add_subregion(&pbus->mmio, off, sbdev_mr);
+    /* Map the region into our Platform Bus MMIO space */
+    memory_region_add_subregion(&pbus->mmio, off, mr);
+}
+
+static void platform_bus_map_mmio(PlatformBusDevice *pbus, SysBusDevice *sbdev,
+                                  int n)
+{
+    MemoryRegion *sbdev_mr = sysbus_mmio_get_region(sbdev, n);
+
+    platform_bus_map_region(pbus, sbdev_mr);
 }
 
 /*
diff --git a/include/hw/platform-bus.h b/include/hw/platform-bus.h
index bd42b83..ee19674 100644
--- a/include/hw/platform-bus.h
+++ b/include/hw/platform-bus.h
@@ -54,4 +54,11 @@ int platform_bus_get_irqn(PlatformBusDevice *platform_bus, SysBusDevice *sbdev,
 hwaddr platform_bus_get_mmio_addr(PlatformBusDevice *pbus, SysBusDevice *sbdev,
                                   int n);
 
+/**
+ * platform_bus_map_region: map a region into the platform bus
+ * @pbus: platform bus handle
+ * @mr: memory region handle
+ */
+void platform_bus_map_region(PlatformBusDevice *pbus, MemoryRegion *mr);
+
 #endif /* !HW_PLATFORM_BUS_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [RFC v2 7/8] hw: arm: virt: register reserved IOVA region
  2016-01-29 16:53 [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
                   ` (5 preceding siblings ...)
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 6/8] hw: platform-bus: enable to map any memory region onto the platform-bus Eric Auger
@ 2016-01-29 16:53 ` Eric Auger
  2016-02-16 18:21   ` Peter Maydell
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 8/8] hw: vfio: common: adapt vfio_listeners for reserved_iova region Eric Auger
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Eric Auger @ 2016-01-29 16:53 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

Registers a 16x64kB reserved iova region. Currently this iova
region is used by the host kernel to map host MSI controller frames
(GICv2m, GITS_TRANSLATER). The host kernel needs this iova window
since it cannot program the PCIe device with MSI frame physical
address (as opposed to x86) since the MSI write transactions go
through the IOMMU.

The reserved region is mapped on the platform bus.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

RFC v1 -> RFC v2:
- use the platform bus to map the reserved iova region
---
 hw/arm/virt.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 3839c68..4b2a891 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -805,7 +805,7 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, uint32_t gic_phandle,
 }
 
 static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
-                        bool use_highmem)
+                        bool use_highmem, MemoryRegion **reserved_reg)
 {
     hwaddr base_mmio = vbi->memmap[VIRT_PCIE_MMIO].base;
     hwaddr size_mmio = vbi->memmap[VIRT_PCIE_MMIO].size;
@@ -920,10 +920,16 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
     qemu_fdt_setprop_cell(vbi->fdt, nodename, "#interrupt-cells", 1);
     create_pcie_irq_map(vbi, vbi->gic_phandle, irq, nodename);
 
+    /* initialize the reserved iova region for MSI binding (16 x 64kb) */
+    *reserved_reg = g_new0(MemoryRegion, 1);
+    memory_region_init_reserved_iova(*reserved_reg, OBJECT(dev),
+                                     "reserved-iova",
+                                     0x100000, &error_fatal);
+
     g_free(nodename);
 }
 
-static void create_platform_bus(VirtBoardInfo *vbi, qemu_irq *pic)
+static PlatformBusDevice *create_platform_bus(VirtBoardInfo *vbi, qemu_irq *pic)
 {
     DeviceState *dev;
     SysBusDevice *s;
@@ -962,6 +968,7 @@ static void create_platform_bus(VirtBoardInfo *vbi, qemu_irq *pic)
     memory_region_add_subregion(sysmem,
                                 platform_bus_params.platform_bus_base,
                                 sysbus_mmio_get_region(s, 0));
+    return PLATFORM_BUS_DEVICE(dev);
 }
 
 static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
@@ -1015,7 +1022,7 @@ static void machvirt_init(MachineState *machine)
     VirtMachineState *vms = VIRT_MACHINE(machine);
     qemu_irq pic[NUM_IRQS];
     MemoryRegion *sysmem = get_system_memory();
-    MemoryRegion *secure_sysmem = NULL;
+    MemoryRegion *secure_sysmem = NULL, *reserved_reg;
     int gic_version = vms->gic_version;
     int n, max_cpus;
     MemoryRegion *ram = g_new(MemoryRegion, 1);
@@ -1024,6 +1031,7 @@ static void machvirt_init(MachineState *machine)
     VirtGuestInfoState *guest_info_state = g_malloc0(sizeof *guest_info_state);
     VirtGuestInfo *guest_info = &guest_info_state->info;
     char **cpustr;
+    PlatformBusDevice *pbus;
 
     if (!cpu_model) {
         cpu_model = "cortex-a15";
@@ -1161,7 +1169,7 @@ static void machvirt_init(MachineState *machine)
 
     create_rtc(vbi, pic);
 
-    create_pcie(vbi, pic, vms->highmem);
+    create_pcie(vbi, pic, vms->highmem, &reserved_reg);
 
     create_gpio(vbi, pic);
 
@@ -1200,7 +1208,8 @@ static void machvirt_init(MachineState *machine)
      * another notifier is registered which adds platform bus nodes.
      * Notifiers are executed in registration reverse order.
      */
-    create_platform_bus(vbi, pic);
+    pbus = create_platform_bus(vbi, pic);
+    platform_bus_map_region(pbus, reserved_reg);
 }
 
 static bool virt_get_secure(Object *obj, Error **errp)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [RFC v2 8/8] hw: vfio: common: adapt vfio_listeners for reserved_iova region
  2016-01-29 16:53 [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
                   ` (6 preceding siblings ...)
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 7/8] hw: arm: virt: register reserved IOVA region Eric Auger
@ 2016-01-29 16:53 ` Eric Auger
  2016-02-16 18:23 ` [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt Peter Maydell
  2016-05-04 11:18 ` Yehuda Yitschak
  9 siblings, 0 replies; 16+ messages in thread
From: Eric Auger @ 2016-01-29 16:53 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

In case of reserved iova region, let's declare this region to the
kernel so that it can use it for IOVA/PA bindings.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/vfio/common.c | 46 ++++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 247c87b..ee957ba 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -217,10 +217,8 @@ static int vfio_dma_unmap(VFIOContainer *container,
  *
  * unregistration is handled using vfio_dma_unmap
  */
-int vfio_register_reserved_iova(VFIOContainer *container, hwaddr iova,
-                                ram_addr_t size);
-int vfio_register_reserved_iova(VFIOContainer *container, hwaddr iova,
-                                ram_addr_t size)
+static int vfio_register_reserved_iova(VFIOContainer *container, hwaddr iova,
+                                       ram_addr_t size)
 {
     struct vfio_iommu_type1_dma_map map = {
         .argsz = sizeof(map),
@@ -271,6 +269,7 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
 static bool vfio_listener_skipped_section(MemoryRegionSection *section)
 {
     return (!memory_region_is_ram(section->mr) &&
+            !memory_region_is_reserved_iova(section->mr) &&
             !memory_region_is_iommu(section->mr)) ||
            /*
             * Sizing an enabled 64-bit BAR can cause spurious mappings to
@@ -354,7 +353,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
     hwaddr iova, end;
     Int128 llend;
     void *vaddr;
-    int ret;
+    int ret = -1;
 
     if (vfio_listener_skipped_section(section)) {
         trace_vfio_listener_region_add_skip(
@@ -418,24 +417,35 @@ static void vfio_listener_region_add(MemoryListener *listener,
         return;
     }
 
-    /* Here we assume that memory_region_is_ram(section->mr)==true */
+    /* Here we assume that the memory region is ram or reserved iova */
 
-    vaddr = memory_region_get_ram_ptr(section->mr) +
-            section->offset_within_region +
-            (iova - section->offset_within_address_space);
+    if (memory_region_is_ram(section->mr)) {
+        vaddr = memory_region_get_ram_ptr(section->mr) +
+                section->offset_within_region +
+                (iova - section->offset_within_address_space);
 
-    trace_vfio_listener_region_add_ram(iova, end - 1, vaddr);
+        trace_vfio_listener_region_add_ram(iova, end - 1, vaddr);
 
-    ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
-    if (ret) {
-        error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
-                     "0x%"HWADDR_PRIx", %p) = %d (%m)",
-                     container, iova, end - iova, vaddr, ret);
-        goto fail;
+        ret = vfio_dma_map(container, iova, end - iova, vaddr,
+                           section->readonly);
+        if (ret) {
+            error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+                         "0x%"HWADDR_PRIx", %p) = %d (%m)",
+                         container, iova, end - iova, vaddr, ret);
+            goto fail;
+        }
+        return;
+    } else if (memory_region_is_reserved_iova(section->mr)) {
+        ret = vfio_register_reserved_iova(container, iova, end - iova);
+        if (ret) {
+            error_report("vfio_register_reserved_iova(%p, 0x%"HWADDR_PRIx", "
+                         "0x%"HWADDR_PRIx") = %d (%m)",
+                         container, iova, end - iova, ret);
+            goto fail;
+        }
+        return;
     }
 
-    return;
-
 fail:
     /*
      * On the initfn path, store the first error in the container so we
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 3/8] Generic PCIe host bridge INTx determination for INTx routing
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 3/8] Generic PCIe host bridge INTx determination " Eric Auger
@ 2016-02-16 18:14   ` Peter Maydell
  2016-02-18 18:15     ` Eric Auger
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Maydell @ 2016-02-16 18:14 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, Pranav Sawargaonkar, Pavel Fedin, QEMU Developers,
	Alexander Graf, Bharat Bhushan, Alex Williamson, qemu-arm,
	Suravee Suthikulpanit, Paolo Bonzini, Christoffer Dall

On 29 January 2016 at 16:53, Eric Auger <eric.auger@linaro.org> wrote:
> This patch stores information about assigned legacy interrupt numbers in
> GPEX host structure.
> This is used during GPEX INTx number determination from a pin during
> INTx routing.
>
> Signed-off-by: Pranavkumar Sawargaonkar <address@hidden>
> Signed-off-by: Tushar Jagad <address@hidden>
> ---
>  hw/arm/virt.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 15658f4..3839c68 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -826,6 +826,7 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
>      char *nodename;
>      int i;
>      PCIHostState *pci;
> +    GPEXHost *s;
>
>      dev = qdev_create(NULL, TYPE_GPEX_HOST);
>      qdev_init_nofail(dev);
> @@ -861,8 +862,11 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
>      /* Map IO port space */
>      sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_pio);
>
> +    s = GPEX_HOST(dev);
> +
>      for (i = 0; i < GPEX_NUM_IRQS; i++) {
>          sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, pic[irq + i]);
> +        s->irq_num[i] = irq + i;
>      }

I don't think that the board code should be prodding stuff in the GPEXHost
struct like this -- device structs are supposed to be private to the
device implementation. If you need the information in the device then
you need to come up with a better API for this.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 6/8] hw: platform-bus: enable to map any memory region onto the platform-bus
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 6/8] hw: platform-bus: enable to map any memory region onto the platform-bus Eric Auger
@ 2016-02-16 18:16   ` Peter Maydell
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Maydell @ 2016-02-16 18:16 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, Pranav Sawargaonkar, Pavel Fedin, QEMU Developers,
	Alexander Graf, Bharat Bhushan, Alex Williamson, qemu-arm,
	Suravee Suthikulpanit, Paolo Bonzini, Christoffer Dall

On 29 January 2016 at 16:53, Eric Auger <eric.auger@linaro.org> wrote:
> The platform bus currently is used to map dynamically instantiable
> platform device MMIO regions. The platform bus also can be seen as a
> pool of free guest physical addresses. We would like to use that pool
> to allocate a contiguous reserved IOVA region usable for MSI message
> address IOMMU mapping.
>
> This patch introduces platform_bus_map_region which enables to map any
> memory region onto the platform bus.
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> ---
>  hw/core/platform-bus.c    | 26 ++++++++++++++++----------
>  include/hw/platform-bus.h |  7 +++++++
>  2 files changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/hw/core/platform-bus.c b/hw/core/platform-bus.c
> index aa55d01..7d0f5e0 100644
> --- a/hw/core/platform-bus.c
> +++ b/hw/core/platform-bus.c
> @@ -128,16 +128,14 @@ static void platform_bus_map_irq(PlatformBusDevice *pbus, SysBusDevice *sbdev,
>      sysbus_connect_irq(sbdev, n, pbus->irqs[irqn]);
>  }
>
> -static void platform_bus_map_mmio(PlatformBusDevice *pbus, SysBusDevice *sbdev,
> -                                  int n)
> +void platform_bus_map_region(PlatformBusDevice *pbus, MemoryRegion *mr)
>  {
> -    MemoryRegion *sbdev_mr = sysbus_mmio_get_region(sbdev, n);
> -    uint64_t size = memory_region_size(sbdev_mr);
> +    uint64_t size = memory_region_size(mr);
>      uint64_t alignment = (1ULL << (63 - clz64(size + size - 1)));
>      uint64_t off;
>      bool found_region = false;
>
> -    if (memory_region_is_mapped(sbdev_mr)) {
> +    if (memory_region_is_mapped(mr)) {
>          /* Region is already mapped, nothing to do */
>          return;
>      }
> @@ -154,13 +152,21 @@ static void platform_bus_map_mmio(PlatformBusDevice *pbus, SysBusDevice *sbdev,
>      }
>
>      if (!found_region) {
> -        error_report("Platform Bus: Can not fit MMIO region of size %"PRIx64,
> -                     size);
> -        exit(1);
> +        error_setg(&error_fatal,
> +                   "Platform Bus: Can not fit region %s of size %"PRIx64,
> +                   mr->name, size);
>      }
>
> -    /* Map the device's region into our Platform Bus MMIO space */
> -    memory_region_add_subregion(&pbus->mmio, off, sbdev_mr);
> +    /* Map the region into our Platform Bus MMIO space */
> +    memory_region_add_subregion(&pbus->mmio, off, mr);
> +}
> +
> +static void platform_bus_map_mmio(PlatformBusDevice *pbus, SysBusDevice *sbdev,
> +                                  int n)
> +{
> +    MemoryRegion *sbdev_mr = sysbus_mmio_get_region(sbdev, n);
> +
> +    platform_bus_map_region(pbus, sbdev_mr);
>  }
>
>  /*
> diff --git a/include/hw/platform-bus.h b/include/hw/platform-bus.h
> index bd42b83..ee19674 100644
> --- a/include/hw/platform-bus.h
> +++ b/include/hw/platform-bus.h
> @@ -54,4 +54,11 @@ int platform_bus_get_irqn(PlatformBusDevice *platform_bus, SysBusDevice *sbdev,
>  hwaddr platform_bus_get_mmio_addr(PlatformBusDevice *pbus, SysBusDevice *sbdev,
>                                    int n);
>
> +/**
> + * platform_bus_map_region: map a region into the platform bus

s/region/MemoryRegion/

> + * @pbus: platform bus handle
> + * @mr: memory region handle
> + */
> +void platform_bus_map_region(PlatformBusDevice *pbus, MemoryRegion *mr);
> +
>  #endif /* !HW_PLATFORM_BUS_H */
> --
> 1.9.1

otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 7/8] hw: arm: virt: register reserved IOVA region
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 7/8] hw: arm: virt: register reserved IOVA region Eric Auger
@ 2016-02-16 18:21   ` Peter Maydell
  2016-02-18 18:48     ` Eric Auger
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Maydell @ 2016-02-16 18:21 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, Pranav Sawargaonkar, Pavel Fedin, QEMU Developers,
	Alexander Graf, Bharat Bhushan, Alex Williamson, qemu-arm,
	Suravee Suthikulpanit, Paolo Bonzini, Christoffer Dall

On 29 January 2016 at 16:53, Eric Auger <eric.auger@linaro.org> wrote:
> Registers a 16x64kB reserved iova region. Currently this iova
> region is used by the host kernel to map host MSI controller frames
> (GICv2m, GITS_TRANSLATER). The host kernel needs this iova window
> since it cannot program the PCIe device with MSI frame physical
> address (as opposed to x86) since the MSI write transactions go
> through the IOMMU.
>
> The reserved region is mapped on the platform bus.

I guess that keeps it neatly out of the way of everybody else :-)

> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>
> ---
>
> RFC v1 -> RFC v2:
> - use the platform bus to map the reserved iova region
> ---
>  hw/arm/virt.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 3839c68..4b2a891 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -805,7 +805,7 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, uint32_t gic_phandle,
>  }
>
>  static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
> -                        bool use_highmem)
> +                        bool use_highmem, MemoryRegion **reserved_reg)
>  {
>      hwaddr base_mmio = vbi->memmap[VIRT_PCIE_MMIO].base;
>      hwaddr size_mmio = vbi->memmap[VIRT_PCIE_MMIO].size;
> @@ -920,10 +920,16 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
>      qemu_fdt_setprop_cell(vbi->fdt, nodename, "#interrupt-cells", 1);
>      create_pcie_irq_map(vbi, vbi->gic_phandle, irq, nodename);
>
> +    /* initialize the reserved iova region for MSI binding (16 x 64kb) */
> +    *reserved_reg = g_new0(MemoryRegion, 1);
> +    memory_region_init_reserved_iova(*reserved_reg, OBJECT(dev),
> +                                     "reserved-iova",
> +                                     0x100000, &error_fatal);

So the only reason this is here is because we need to have a pointer to
the PCIe controller DeviceState, right? I think it would be better to
make create_pcie() return the DeviceState* instead of NULL. Then you
can either (a) pass the pcie controller pointer into create_platform_bus()
and have that create and map the reserved iova region, or (b) have a
separate function to create the reserved iova region. In any case I
think it fits more naturally with the rest of the platform bus code
rather than in the PCIe controller creation function.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt
  2016-01-29 16:53 [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
                   ` (7 preceding siblings ...)
  2016-01-29 16:53 ` [Qemu-devel] [RFC v2 8/8] hw: vfio: common: adapt vfio_listeners for reserved_iova region Eric Auger
@ 2016-02-16 18:23 ` Peter Maydell
  2016-05-04 11:18 ` Yehuda Yitschak
  9 siblings, 0 replies; 16+ messages in thread
From: Peter Maydell @ 2016-02-16 18:23 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, Pranav Sawargaonkar, Pavel Fedin, QEMU Developers,
	Alexander Graf, Bharat Bhushan, Alex Williamson, qemu-arm,
	Suravee Suthikulpanit, Paolo Bonzini, Christoffer Dall

On 29 January 2016 at 16:53, Eric Auger <eric.auger@linaro.org> wrote:
> This series enables KVM PCI/MSI passthrough with mach-virt.
>
> A new memory region type is introduced (reserved iova). On
> vfio_listener_region_add this IOVA region is registered to the kernel with
> VFIO_IOMMU_MAP_DMA (using the new VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA flag).
>
> The host VFIO PCI driver then can use this IOVA window to map some host
> physical addresses, accessed by passthrough'ed PCI devices, through the IOMMU.
> The first goal is to map host MSI controller frames (GICv2M, GITS_TRANSLATER).
>
> mach-virt currently instantiates a 16x64kB reserved IOVA window. This
> provisions for future usage. Most probably this exceeds MSI binding needs.
> To avoid wasting guest PA, we now map the reserved region onto the
> platform bus MMIO.
>
> The series includes Pranav/Tushar' series:
> QEMU, [v2 0/2] Generic PCIe host bridge INTx determination for INTx routing
> ((https://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg04361.html))
>
> Those patches are not mandated for PCI/MSI passthrough to work but without
> those, the following warning is observed and can puzzle the end-user:
> "qemu-system-aarch64: PCI: Bug - unimplemented PCI INTx routing (gpex-pcihost)"

I've replied with comments about the parts I care about; I'm leaving
the rest of the review to others (VFIO related, etc).

thanks
-- PMM

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 3/8] Generic PCIe host bridge INTx determination for INTx routing
  2016-02-16 18:14   ` Peter Maydell
@ 2016-02-18 18:15     ` Eric Auger
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Auger @ 2016-02-18 18:15 UTC (permalink / raw)
  To: Peter Maydell
  Cc: eric.auger, Pranav Sawargaonkar, Pavel Fedin, QEMU Developers,
	Alexander Graf, Bharat Bhushan, Alex Williamson, qemu-arm,
	Suravee Suthikulpanit, Paolo Bonzini, Christoffer Dall

Hi Peter,
On 02/16/2016 07:14 PM, Peter Maydell wrote:
> On 29 January 2016 at 16:53, Eric Auger <eric.auger@linaro.org> wrote:
>> This patch stores information about assigned legacy interrupt numbers in
>> GPEX host structure.
>> This is used during GPEX INTx number determination from a pin during
>> INTx routing.
>>
>> Signed-off-by: Pranavkumar Sawargaonkar <address@hidden>
>> Signed-off-by: Tushar Jagad <address@hidden>
>> ---
>>  hw/arm/virt.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 15658f4..3839c68 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -826,6 +826,7 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
>>      char *nodename;
>>      int i;
>>      PCIHostState *pci;
>> +    GPEXHost *s;
>>
>>      dev = qdev_create(NULL, TYPE_GPEX_HOST);
>>      qdev_init_nofail(dev);
>> @@ -861,8 +862,11 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
>>      /* Map IO port space */
>>      sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_pio);
>>
>> +    s = GPEX_HOST(dev);
>> +
>>      for (i = 0; i < GPEX_NUM_IRQS; i++) {
>>          sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, pic[irq + i]);
>> +        s->irq_num[i] = irq + i;
>>      }
> 
> I don't think that the board code should be prodding stuff in the GPEXHost
> struct like this -- device structs are supposed to be private to the
> device implementation. If you need the information in the device then
> you need to come up with a better API for this.

Sure, Thanks

Eric
> 
> thanks
> -- PMM
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 7/8] hw: arm: virt: register reserved IOVA region
  2016-02-16 18:21   ` Peter Maydell
@ 2016-02-18 18:48     ` Eric Auger
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Auger @ 2016-02-18 18:48 UTC (permalink / raw)
  To: Peter Maydell
  Cc: eric.auger, Pranav Sawargaonkar, Pavel Fedin, QEMU Developers,
	Alexander Graf, Bharat Bhushan, Alex Williamson, qemu-arm,
	Suravee Suthikulpanit, Paolo Bonzini, Christoffer Dall

Hi Peter,
On 02/16/2016 07:21 PM, Peter Maydell wrote:
> On 29 January 2016 at 16:53, Eric Auger <eric.auger@linaro.org> wrote:
>> Registers a 16x64kB reserved iova region. Currently this iova
>> region is used by the host kernel to map host MSI controller frames
>> (GICv2m, GITS_TRANSLATER). The host kernel needs this iova window
>> since it cannot program the PCIe device with MSI frame physical
>> address (as opposed to x86) since the MSI write transactions go
>> through the IOMMU.
>>
>> The reserved region is mapped on the platform bus.
> 
> I guess that keeps it neatly out of the way of everybody else :-)
Yes hopefully. The platform bus has its own MMIO allocation scheme.
> 
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> RFC v1 -> RFC v2:
>> - use the platform bus to map the reserved iova region
>> ---
>>  hw/arm/virt.c | 19 ++++++++++++++-----
>>  1 file changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 3839c68..4b2a891 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -805,7 +805,7 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, uint32_t gic_phandle,
>>  }
>>
>>  static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
>> -                        bool use_highmem)
>> +                        bool use_highmem, MemoryRegion **reserved_reg)
>>  {
>>      hwaddr base_mmio = vbi->memmap[VIRT_PCIE_MMIO].base;
>>      hwaddr size_mmio = vbi->memmap[VIRT_PCIE_MMIO].size;
>> @@ -920,10 +920,16 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
>>      qemu_fdt_setprop_cell(vbi->fdt, nodename, "#interrupt-cells", 1);
>>      create_pcie_irq_map(vbi, vbi->gic_phandle, irq, nodename);
>>
>> +    /* initialize the reserved iova region for MSI binding (16 x 64kb) */
>> +    *reserved_reg = g_new0(MemoryRegion, 1);
>> +    memory_region_init_reserved_iova(*reserved_reg, OBJECT(dev),
>> +                                     "reserved-iova",
>> +                                     0x100000, &error_fatal);
> 
> So the only reason this is here is because we need to have a pointer to
> the PCIe controller DeviceState, right?
yes that's correct. currently the PCIe controller is the object that
tracks the reserved region's ref count. I proceeded that way because the
reserved IOVA provision currently is related to PCIe/MSI functionality.

 I think it would be better to
> make create_pcie() return the DeviceState* instead of NULL. Then you
> can either (a) pass the pcie controller pointer into create_platform_bus()
> and have that create and map the reserved iova region, or (b) have a
> separate function to create the reserved iova region. In any case I
> think it fits more naturally with the rest of the platform bus code
> rather than in the PCIe controller creation function.
OK

Another issue is that we currently book an arbitrary 1MB IOVA window
whatever the needs. We are also thinking about extending the VFIO user
API to return the number of reserved iova pages that are requested and
possibly some alignment constraints. Then it becomes more complex since
VFIO devices are instantiated after the machine creation, all the needs
must be collected and consolidated and eventually the reserved iova
region can be created. So I think that code will end up somewhere in a
machine init done notifier ...

Thanks for the review!

Best Regards

Eric
> 
> thanks
> -- PMM
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt
  2016-01-29 16:53 [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
                   ` (8 preceding siblings ...)
  2016-02-16 18:23 ` [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt Peter Maydell
@ 2016-05-04 11:18 ` Yehuda Yitschak
  9 siblings, 0 replies; 16+ messages in thread
From: Yehuda Yitschak @ 2016-05-04 11:18 UTC (permalink / raw)
  To: Eric Auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall


Tested-by: Yehuda Yitschak <yehuday@marvell.com>

Tested on Armada-7040 using an intel IXGBE (82599ES). 


> -----Original Message-----
> From: qemu-devel-bounces+yehuday=marvell.com@nongnu.org
> [mailto:qemu-devel-bounces+yehuday=marvell.com@nongnu.org] On
> Behalf Of Eric Auger
> Sent: Friday, January 29, 2016 18:54
> To: eric.auger@st.com; eric.auger@linaro.org; qemu-devel@nongnu.org;
> qemu-arm@nongnu.org; peter.maydell@linaro.org;
> alex.williamson@redhat.com; pranav.sawargaonkar@gmail.com;
> p.fedin@samsung.com; pbonzini@redhat.com; agraf@suse.de
> Cc: Bharat.Bhushan@freescale.com; suravee.suthikulpanit@amd.com;
> christoffer.dall@linaro.org
> Subject: [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-
> virt
> 
> This series enables KVM PCI/MSI passthrough with mach-virt.
> 
> A new memory region type is introduced (reserved iova). On
> vfio_listener_region_add this IOVA region is registered to the kernel with
> VFIO_IOMMU_MAP_DMA (using the new
> VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA flag).
> 
> The host VFIO PCI driver then can use this IOVA window to map some host
> physical addresses, accessed by passthrough'ed PCI devices, through the
> IOMMU.
> The first goal is to map host MSI controller frames (GICv2M,
> GITS_TRANSLATER).
> 
> mach-virt currently instantiates a 16x64kB reserved IOVA window. This
> provisions for future usage. Most probably this exceeds MSI binding needs.
> To avoid wasting guest PA, we now map the reserved region onto the
> platform bus MMIO.
> 
> The series includes Pranav/Tushar' series:
> QEMU, [v2 0/2] Generic PCIe host bridge INTx determination for INTx routing
> ((https://lists.nongnu.org/archive/html/qemu-devel/2015-
> 04/msg04361.html))
> 
> Those patches are not mandated for PCI/MSI passthrough to work but
> without those, the following warning is observed and can puzzle the end-
> user:
> "qemu-system-aarch64: PCI: Bug - unimplemented PCI INTx routing (gpex-
> pcihost)"
> 
> Best Regards
> 
> Eric
> 
> Dependencies:
> The series depends on kernel series: "[PATCH 00/10] KVM PCIe/MSI
> passthrough on ARM/ARM64", (https://lkml.org/lkml/2016/1/26/371)
> 
> Git:
> QEMU:
> https://git.linaro.org/people/eric.auger/qemu.git/shortlog/refs/heads/v2.5.
> 0-pci-passthrough-rfc-v2
> 
> Kernel:
> https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.5-
> rc1-pcie-passthrough-v1
> 
> Testing:
> - on ARM64 AMD Overdrive HW with one e1000e PCIe card.
> 
> History:
> RFC v1 -> RFC v2:
> - now uses platform bus MMIO for mapping reserved IOVA region; hence
> the
>   new patch file:
>   "hw: platform-bus: enable to map any memory region onto the platform-
> bus"
> 
> Eric Auger (8):
>   linux-headers: partial update for VFIO reserved IOVA registration
>   Add a function to determine interrupt number for INTx routing
>   Generic PCIe host bridge INTx determination for INTx routing
>   hw: vfio: common: introduce vfio_register_reserved_iova
>   memory: add reserved_iova region type
>   hw: platform-bus: enable to map any memory region onto the
>     platform-bus
>   hw: arm: virt: register reserved IOVA region
>   hw: vfio: common: adapt vfio_listeners for reserved_iova region
> 
>  hw/arm/virt.c              | 23 ++++++++++++----
>  hw/core/platform-bus.c     | 26 +++++++++++-------
>  hw/pci-host/gpex.c         | 12 ++++++++
>  hw/vfio/common.c           | 68
> ++++++++++++++++++++++++++++++++++++----------
>  include/exec/memory.h      | 29 ++++++++++++++++++++
>  include/hw/pci-host/gpex.h |  1 +
>  include/hw/platform-bus.h  |  7 +++++
>  linux-headers/linux/vfio.h | 15 ++++++++--
>  memory.c                   | 11 ++++++++
>  9 files changed, 160 insertions(+), 32 deletions(-)
> 
> --
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-05-04 11:21 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-29 16:53 [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt Eric Auger
2016-01-29 16:53 ` [Qemu-devel] [RFC v2 1/8] linux-headers: partial update for VFIO reserved IOVA registration Eric Auger
2016-01-29 16:53 ` [Qemu-devel] [RFC v2 2/8] Add a function to determine interrupt number for INTx routing Eric Auger
2016-01-29 16:53 ` [Qemu-devel] [RFC v2 3/8] Generic PCIe host bridge INTx determination " Eric Auger
2016-02-16 18:14   ` Peter Maydell
2016-02-18 18:15     ` Eric Auger
2016-01-29 16:53 ` [Qemu-devel] [RFC v2 4/8] hw: vfio: common: introduce vfio_register_reserved_iova Eric Auger
2016-01-29 16:53 ` [Qemu-devel] [RFC v2 5/8] memory: add reserved_iova region type Eric Auger
2016-01-29 16:53 ` [Qemu-devel] [RFC v2 6/8] hw: platform-bus: enable to map any memory region onto the platform-bus Eric Auger
2016-02-16 18:16   ` Peter Maydell
2016-01-29 16:53 ` [Qemu-devel] [RFC v2 7/8] hw: arm: virt: register reserved IOVA region Eric Auger
2016-02-16 18:21   ` Peter Maydell
2016-02-18 18:48     ` Eric Auger
2016-01-29 16:53 ` [Qemu-devel] [RFC v2 8/8] hw: vfio: common: adapt vfio_listeners for reserved_iova region Eric Auger
2016-02-16 18:23 ` [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt Peter Maydell
2016-05-04 11:18 ` Yehuda Yitschak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).