* [Qemu-devel] [RFC 0/7] KVM PCI/MSI passthrough with mach-virt
@ 2016-01-27 13:51 Eric Auger
  2016-01-27 13:51 ` [Qemu-devel] [RFC 1/7] linux-headers: partial update for VFIO reserved IOVA registration Eric Auger
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Eric Auger @ 2016-01-27 13:51 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

This series enables KVM PCI/MSI passthrough with mach-virt.

A new memory region type is introduced (reserved iova). On
vfio_listener_region_add this IOVA region is registered to the kernel with
VFIO_IOMMU_MAP_DMA (using the new VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA flag).
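
For illustration only, the registration request described above could be
filled in roughly as follows. This is a sketch under stated assumptions: the
struct and flag values are local mirrors of the definitions added in patch
1/7 (they are not assumed to be present in any installed linux/vfio.h), the
helper name build_reserved_iova_request is hypothetical, and no ioctl is
actually issued.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Local mirror of struct vfio_iommu_type1_dma_map as extended by patch 1/7.
 * Assumption: field layout and flag values are copied from this series.
 */
struct dma_map_req {
    uint32_t argsz;
    uint32_t flags;
    uint64_t vaddr;   /* ignored when MSI_RESERVED_IOVA is set */
    uint64_t iova;    /* base of the reserved IOVA window */
    uint64_t size;    /* size of the reserved IOVA window, in bytes */
};

#define DMA_MAP_FLAG_READ              (1u << 0)
#define DMA_MAP_FLAG_WRITE             (1u << 1)
#define DMA_MAP_FLAG_MSI_RESERVED_IOVA (1u << 2)

/* Hypothetical helper: build the request that registers a reserved window. */
static struct dma_map_req build_reserved_iova_request(uint64_t iova,
                                                      uint64_t size)
{
    struct dma_map_req req;

    memset(&req, 0, sizeof(req));
    req.argsz = sizeof(req);
    /* Only the reserved-IOVA flag is set; READ/WRITE are left clear. */
    req.flags = DMA_MAP_FLAG_MSI_RESERVED_IOVA;
    req.iova = iova;
    req.size = size;
    return req;
}
```

In the series itself the equivalent structure is passed to
ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) by
vfio_register_reserved_iova (patch 4/7).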

The host VFIO PCI driver then can use this IOVA window to map some host
physical addresses, accessed by passthrough'ed PCI devices, through the IOMMU.
The first goal is to map host MSI controller frames (GICv2M, GITS_TRANSLATER).

mach-virt currently instantiates a 16x64kB reserved IOVA window. This leaves
headroom for future usage and most probably exceeds what MSI binding needs.
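
As a quick check of the arithmetic, 16 slots of 64kB add up to 0x00100000
bytes (1MB), which is the size given to the VIRT_RESERVED entry in patch 6/7.
The sketch below spells this out. The macro and function names are
hypothetical, and the idea that each mapped MSI frame consumes one 64kB slot
is an assumption made for the example, not something this series states.

```c
#include <assert.h>
#include <stdint.h>

#define RESERVED_IOVA_SLOTS     16u
#define RESERVED_IOVA_SLOT_SIZE (64u * 1024u)   /* 64kB per slot */

/* Total size of the reserved IOVA window, in bytes. */
static uint64_t reserved_iova_window_size(void)
{
    return (uint64_t)RESERVED_IOVA_SLOTS * RESERVED_IOVA_SLOT_SIZE;
}

/*
 * Slots left over once frames_in_use frames are mapped.
 * Assumption: one slot per mapped MSI frame.
 */
static unsigned reserved_iova_spare_slots(unsigned frames_in_use)
{
    if (frames_in_use >= RESERVED_IOVA_SLOTS) {
        return 0u;
    }
    return RESERVED_IOVA_SLOTS - frames_in_use;
}
```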

The series includes Pranav/Tushar's series:
QEMU, [v2 0/2] Generic PCIe host bridge INTx determination for INTx routing
(https://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg04361.html)

Those patches are not required for PCI/MSI passthrough to work, but without
them the following warning is observed and can puzzle the end-user:
"qemu-system-aarch64: PCI: Bug - unimplemented PCI INTx routing (gpex-pcihost)"

If preferred, that series can be maintained separately. I just included its
patches here for consistency.

Best Regards

Eric

Dependencies:
The series depends on kernel series: "[PATCH 00/10] KVM PCIe/MSI passthrough on
ARM/ARM64", (https://lkml.org/lkml/2016/1/26/371)

Git:
QEMU:
https://git.linaro.org/people/eric.auger/qemu.git/shortlog/refs/heads/v2.5.0-pci-passthrough-rfc

Kernel:
https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.5-rc1-pcie-passthrough-v1

Testing:
- on ARM64 AMD Overdrive HW with one e1000e PCIe card.

Eric Auger (7):
  linux-headers: partial update for VFIO reserved IOVA registration
  Add a function to determine interrupt number for INTx routing
  Generic PCIe host bridge INTx determination for INTx routing
  hw: vfio: common: introduce vfio_register_reserved_iova
  memory: add reserved_iova region type
  hw: arm: virt: register reserved IOVA region
  hw: vfio: common: adapt vfio_listeners for reserved_iova region

 hw/arm/virt.c              | 14 ++++++++++
 hw/pci-host/gpex.c         | 12 ++++++++
 hw/vfio/common.c           | 68 ++++++++++++++++++++++++++++++++++++----------
 include/exec/memory.h      | 29 ++++++++++++++++++++
 include/hw/arm/virt.h      |  1 +
 include/hw/pci-host/gpex.h |  1 +
 linux-headers/linux/vfio.h | 15 ++++++++--
 memory.c                   | 11 ++++++++
 8 files changed, 134 insertions(+), 17 deletions(-)

-- 
1.9.1


* [Qemu-devel] [RFC 1/7] linux-headers: partial update for VFIO reserved IOVA registration
  2016-01-27 13:51 [Qemu-devel] [RFC 0/7] KVM PCI/MSI passthrough with mach-virt Eric Auger
@ 2016-01-27 13:51 ` Eric Auger
  2016-01-27 13:51 ` [Qemu-devel] [RFC 2/7] Add a function to determine interrupt number for INTx routing Eric Auger
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Eric Auger @ 2016-01-27 13:51 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

This is a partial update aiming at enhancing the VFIO user API
according to not yet upstreamed kernel developments available at:

https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.5-rc1-pcie-passthrough-v1

See https://lkml.org/lkml/2016/1/26/371 for more details.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 linux-headers/linux/vfio.h | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index aa276bc..ac6032e 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -8,8 +8,8 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
-#ifndef VFIO_H
-#define VFIO_H
+#ifndef _UAPIVFIO_H
+#define _UAPIVFIO_H
 
 #include <linux/types.h>
 #include <linux/ioctl.h>
@@ -393,6 +393,7 @@ struct vfio_iommu_type1_info {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
+#define VFIO_IOMMU_INFO_REQUIRE_MSI_MAP (1 << 1)/* MSI must be mapped */
 	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
 };
 
@@ -403,12 +404,20 @@ struct vfio_iommu_type1_info {
  *
  * Map process virtual addresses to IO virtual addresses using the
  * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
+ *
+ * In case MSI_RESERVED_IOVA is set, the API only aims at registering an IOVA
+ * region which will be used on some platforms to map the host MSI frame.
+ * in that specific case, vaddr is ignored. The requirement for provisioning
+ * such IOVA range can be checked by calling VFIO_IOMMU_GET_INFO with the
+ * VFIO_IOMMU_INFO_REQUIRE_MSI_MAP attribute.
  */
 struct vfio_iommu_type1_dma_map {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
 #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
+/* reserved iova for MSI vectors*/
+#define VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA (1 << 2)
 	__u64	vaddr;				/* Process virtual address */
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */
@@ -591,4 +600,4 @@ struct vfio_iommu_spapr_tce_remove {
 
 /* ***************************************************************** */
 
-#endif /* VFIO_H */
+#endif /* _UAPIVFIO_H */
-- 
1.9.1


* [Qemu-devel] [RFC 2/7] Add a function to determine interrupt number for INTx routing
  2016-01-27 13:51 [Qemu-devel] [RFC 0/7] KVM PCI/MSI passthrough with mach-virt Eric Auger
  2016-01-27 13:51 ` [Qemu-devel] [RFC 1/7] linux-headers: partial update for VFIO reserved IOVA registration Eric Auger
@ 2016-01-27 13:51 ` Eric Auger
  2016-01-27 13:51 ` [Qemu-devel] [RFC 3/7] Generic PCIe host bridge INTx determination " Eric Auger
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Eric Auger @ 2016-01-27 13:51 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

This patch adds a PCI bus specific function pointer "route_intx_to_irq"
for GPEX. This is used in determining the PCI INTx number from the pin.

Signed-off-by: Pranavkumar Sawargaonkar <address@hidden>
Signed-off-by: Tushar Jagad <address@hidden>
---
 hw/pci-host/gpex.c         | 12 ++++++++++++
 include/hw/pci-host/gpex.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/hw/pci-host/gpex.c b/hw/pci-host/gpex.c
index 9d8fb5a..d0d1250 100644
--- a/hw/pci-host/gpex.c
+++ b/hw/pci-host/gpex.c
@@ -42,6 +42,17 @@ static void gpex_set_irq(void *opaque, int irq_num, int level)
     qemu_set_irq(s->irq[irq_num], level);
 }
 
+static PCIINTxRoute gpex_route_intx_pin_to_irq(void *opaque, int pin)
+{
+    PCIINTxRoute route;
+    GPEXHost *s = opaque;
+
+    route.mode = PCI_INTX_ENABLED;
+    route.irq = (int)s->irq_num[pin];
+
+    return route;
+}
+
 static void gpex_host_realize(DeviceState *dev, Error **errp)
 {
     PCIHostState *pci = PCI_HOST_BRIDGE(dev);
@@ -66,6 +77,7 @@ static void gpex_host_realize(DeviceState *dev, Error **errp)
                                 &s->io_ioport, 0, 4, TYPE_PCIE_BUS);
 
     qdev_set_parent_bus(DEVICE(&s->gpex_root), BUS(pci->bus));
+    pci_bus_set_route_irq_fn(pci->bus, gpex_route_intx_pin_to_irq);
     qdev_init_nofail(DEVICE(&s->gpex_root));
 }
 
diff --git a/include/hw/pci-host/gpex.h b/include/hw/pci-host/gpex.h
index 68c9348..7df1c16 100644
--- a/include/hw/pci-host/gpex.h
+++ b/include/hw/pci-host/gpex.h
@@ -51,6 +51,7 @@ typedef struct GPEXHost {
     MemoryRegion io_ioport;
     MemoryRegion io_mmio;
     qemu_irq irq[GPEX_NUM_IRQS];
+    uint32_t irq_num[GPEX_NUM_IRQS];
 } GPEXHost;
 
 #endif /* HW_GPEX_H */
-- 
1.9.1


* [Qemu-devel] [RFC 3/7] Generic PCIe host bridge INTx determination for INTx routing
  2016-01-27 13:51 [Qemu-devel] [RFC 0/7] KVM PCI/MSI passthrough with mach-virt Eric Auger
  2016-01-27 13:51 ` [Qemu-devel] [RFC 1/7] linux-headers: partial update for VFIO reserved IOVA registration Eric Auger
  2016-01-27 13:51 ` [Qemu-devel] [RFC 2/7] Add a function to determine interrupt number for INTx routing Eric Auger
@ 2016-01-27 13:51 ` Eric Auger
  2016-01-27 13:51 ` [Qemu-devel] [RFC 4/7] hw: vfio: common: introduce vfio_register_reserved_iova Eric Auger
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Eric Auger @ 2016-01-27 13:51 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

This patch stores information about assigned legacy interrupt numbers in
GPEX host structure.
This is used during GPEX INTx number determination from a pin during
INTx routing.
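
As a rough model of the bookkeeping described above: the board code records
one interrupt number per INTx pin, and the routing callback later looks the
number up from the pin. The sketch below is standalone and uses hypothetical
toy_* names. The value 4 for the number of pins (INTA..INTD) is an assumption
standing in for GPEX_NUM_IRQS, and the bounds check is an addition that the
patch itself does not perform.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NUM_INTX 4   /* assumption: INTA..INTD, stands in for GPEX_NUM_IRQS */

struct toy_host {
    uint32_t irq_num[TOY_NUM_INTX];   /* interrupt number assigned to each pin */
};

/* Record consecutive interrupt numbers, starting at first_irq. */
static void toy_host_assign_irqs(struct toy_host *h, uint32_t first_irq)
{
    for (int i = 0; i < TOY_NUM_INTX; i++) {
        h->irq_num[i] = first_irq + (uint32_t)i;
    }
}

/* Return the interrupt number for a pin, or -1 if the pin is out of range. */
static int toy_route_intx_pin_to_irq(const struct toy_host *h, int pin)
{
    if (pin < 0 || pin >= TOY_NUM_INTX) {
        return -1;
    }
    return (int)h->irq_num[pin];
}
```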

Signed-off-by: Pranavkumar Sawargaonkar <address@hidden>
Signed-off-by: Tushar Jagad <address@hidden>
---
 hw/arm/virt.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 15658f4..3839c68 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -826,6 +826,7 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
     char *nodename;
     int i;
     PCIHostState *pci;
+    GPEXHost *s;
 
     dev = qdev_create(NULL, TYPE_GPEX_HOST);
     qdev_init_nofail(dev);
@@ -861,8 +862,11 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
     /* Map IO port space */
     sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_pio);
 
+    s = GPEX_HOST(dev);
+
     for (i = 0; i < GPEX_NUM_IRQS; i++) {
         sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, pic[irq + i]);
+        s->irq_num[i] = irq + i;
     }
 
     pci = PCI_HOST_BRIDGE(dev);
-- 
1.9.1


* [Qemu-devel] [RFC 4/7] hw: vfio: common: introduce vfio_register_reserved_iova
  2016-01-27 13:51 [Qemu-devel] [RFC 0/7] KVM PCI/MSI passthrough with mach-virt Eric Auger
                   ` (2 preceding siblings ...)
  2016-01-27 13:51 ` [Qemu-devel] [RFC 3/7] Generic PCIe host bridge INTx determination " Eric Auger
@ 2016-01-27 13:51 ` Eric Auger
  2016-01-27 13:51 ` [Qemu-devel] [RFC 5/7] memory: add reserved_iova region type Eric Auger
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Eric Auger @ 2016-01-27 13:51 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

vfio_register_reserved_iova registers the reserved IOVA region, typically
for the purpose of binding MSI frames. The kernel allows registering
a single reserved IOVA region.

Unregistration is handled through legacy vfio_dma_unmap.

The function will become static in subsequent patches. However, since
there is no user yet, the compiler would complain about an unused static
function; the function is therefore currently not static and a dummy
declaration is added.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/vfio/common.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6797208..247c87b 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -208,6 +208,36 @@ static int vfio_dma_unmap(VFIOContainer *container,
     return 0;
 }
 
+/**
+ * vfio_register_reserved_iova: registers the iova reserved region
+ *
+ * @container: container handle
+ * @iova: base iova of the reserved region
+ * @size: reserved region size
+ *
+ * unregistration is handled using vfio_dma_unmap
+ */
+int vfio_register_reserved_iova(VFIOContainer *container, hwaddr iova,
+                                ram_addr_t size);
+int vfio_register_reserved_iova(VFIOContainer *container, hwaddr iova,
+                                ram_addr_t size)
+{
+    struct vfio_iommu_type1_dma_map map = {
+        .argsz = sizeof(map),
+        .flags = VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA,
+        .iova = iova,
+        .size = size,
+    };
+
+    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0) {
+        return 0;
+    }
+
+    error_report("VFIO_MAP_DMA/MSI_RESERVED_IOVA: %d", -errno);
+    return -errno;
+
+}
+
 static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
                         ram_addr_t size, void *vaddr, bool readonly)
 {
-- 
1.9.1


* [Qemu-devel] [RFC 5/7] memory: add reserved_iova region type
  2016-01-27 13:51 [Qemu-devel] [RFC 0/7] KVM PCI/MSI passthrough with mach-virt Eric Auger
                   ` (3 preceding siblings ...)
  2016-01-27 13:51 ` [Qemu-devel] [RFC 4/7] hw: vfio: common: introduce vfio_register_reserved_iova Eric Auger
@ 2016-01-27 13:51 ` Eric Auger
  2016-01-27 13:51 ` [Qemu-devel] [RFC 6/7] hw: arm: virt: register reserved IOVA region Eric Auger
  2016-01-27 13:51 ` [Qemu-devel] [RFC 7/7] hw: vfio: common: adapt vfio_listeners for reserved_iova region Eric Auger
  6 siblings, 0 replies; 12+ messages in thread
From: Eric Auger @ 2016-01-27 13:51 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

Introduce a new reserved_iova region type. This type of iova region
is bound to be used by the kernel to map some host physical addresses.

A new initializer, memory_region_init_reserved_iova is introduced, as
well as a test function, memory_region_is_reserved_iova.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 include/exec/memory.h | 29 +++++++++++++++++++++++++++++
 memory.c              | 11 +++++++++++
 2 files changed, 40 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index c92734a..616cb86 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -165,6 +165,7 @@ struct MemoryRegion {
     /* The following fields should fit in a cache line */
     bool romd_mode;
     bool ram;
+    bool reserved_iova;
     bool subpage;
     bool readonly; /* For RAM regions */
     bool rom_device;
@@ -359,6 +360,21 @@ void memory_region_init_ram(MemoryRegion *mr,
                             Error **errp);
 
 /**
+ * memory_region_init_reserved_iova:  Initialize reserved iova memory region
+ *
+ * @mr: the #MemoryRegion to be initialized.
+ * @owner: the object that tracks the region's reference count
+ * @name: the name of the region.
+ * @size: size of the region.
+ * @errp: pointer to Error*, to store an error if it happens.
+ */
+void memory_region_init_reserved_iova(MemoryRegion *mr,
+                                      struct Object *owner,
+                                      const char *name,
+                                      uint64_t size,
+                                      Error **errp);
+
+/**
  * memory_region_init_resizeable_ram:  Initialize memory region with resizeable
  *                                     RAM.  Accesses into the region will
  *                                     modify memory directly.  Only an initial
@@ -531,6 +547,19 @@ static inline bool memory_region_is_ram(MemoryRegion *mr)
 }
 
 /**
+ * memory_region_is_reserved_iova: check whether a memory region corresponds to
+ *                                 reserved iova
+ *
+ * Returns %true if a memory region is reserved iova
+ *
+ * @mr: the memory region being queried
+ */
+static inline bool memory_region_is_reserved_iova(MemoryRegion *mr)
+{
+    return mr->reserved_iova;
+}
+
+/**
  * memory_region_is_skip_dump: check whether a memory region should not be
  *                             dumped
  *
diff --git a/memory.c b/memory.c
index d2d0a92..d9ff1b7 100644
--- a/memory.c
+++ b/memory.c
@@ -1231,6 +1231,17 @@ void memory_region_init_ram(MemoryRegion *mr,
     mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
 }
 
+void memory_region_init_reserved_iova(MemoryRegion *mr,
+                                      Object *owner,
+                                      const char *name,
+                                      uint64_t size,
+                                      Error **errp)
+{
+    memory_region_init(mr, owner, name, size);
+    mr->reserved_iova = true;
+    mr->terminates = true;
+}
+
 void memory_region_init_resizeable_ram(MemoryRegion *mr,
                                        Object *owner,
                                        const char *name,
-- 
1.9.1


* [Qemu-devel] [RFC 6/7] hw: arm: virt: register reserved IOVA region
  2016-01-27 13:51 [Qemu-devel] [RFC 0/7] KVM PCI/MSI passthrough with mach-virt Eric Auger
                   ` (4 preceding siblings ...)
  2016-01-27 13:51 ` [Qemu-devel] [RFC 5/7] memory: add reserved_iova region type Eric Auger
@ 2016-01-27 13:51 ` Eric Auger
  2016-01-28  7:10   ` Pavel Fedin
  2016-01-28 10:10   ` Peter Maydell
  2016-01-27 13:51 ` [Qemu-devel] [RFC 7/7] hw: vfio: common: adapt vfio_listeners for reserved_iova region Eric Auger
  6 siblings, 2 replies; 12+ messages in thread
From: Eric Auger @ 2016-01-27 13:51 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

Registers a 16x64kB reserved iova region. Currently this iova
region is used by the kernel to map host MSI controller frames
(GICv2m, GITS_TRANSLATER).

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/arm/virt.c         | 10 ++++++++++
 include/hw/arm/virt.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 3839c68..7eaf8be 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -125,6 +125,7 @@ static const MemMapEntry a15memmap[] = {
     [VIRT_GPIO] =               { 0x09030000, 0x00001000 },
     [VIRT_SECURE_UART] =        { 0x09040000, 0x00001000 },
     [VIRT_MMIO] =               { 0x0a000000, 0x00000200 },
+    [VIRT_RESERVED] =           { 0x0be00000, 0x00100000 },
     /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
     [VIRT_PLATFORM_BUS] =       { 0x0c000000, 0x02000000 },
     [VIRT_PCIE_MMIO] =          { 0x10000000, 0x2eff0000 },
@@ -815,6 +816,8 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
     hwaddr size_pio = vbi->memmap[VIRT_PCIE_PIO].size;
     hwaddr base_ecam = vbi->memmap[VIRT_PCIE_ECAM].base;
     hwaddr size_ecam = vbi->memmap[VIRT_PCIE_ECAM].size;
+    hwaddr base_reserved = vbi->memmap[VIRT_RESERVED].base;
+    hwaddr size_reserved = vbi->memmap[VIRT_RESERVED].size;
     hwaddr base = base_mmio;
     int nr_pcie_buses = size_ecam / PCIE_MMCFG_SIZE_MIN;
     int irq = vbi->irqmap[VIRT_PCIE];
@@ -822,6 +825,7 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
     MemoryRegion *mmio_reg;
     MemoryRegion *ecam_alias;
     MemoryRegion *ecam_reg;
+    MemoryRegion *reserved_reg;
     DeviceState *dev;
     char *nodename;
     int i;
@@ -838,6 +842,12 @@ static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
                              ecam_reg, 0, size_ecam);
     memory_region_add_subregion(get_system_memory(), base_ecam, ecam_alias);
 
+    reserved_reg = g_new0(MemoryRegion, 1);
+    memory_region_init_reserved_iova(reserved_reg, OBJECT(dev), "reserved-iova",
+                       size_reserved, &error_fatal);
+    memory_region_add_subregion(get_system_memory(), base_reserved,
+                                reserved_reg);
+
     /* Map the MMIO window into system address space so as to expose
      * the section of PCI MMIO space which starts at the same base address
      * (ie 1:1 mapping for that part of PCI MMIO space visible through
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 1ce7847..194871b 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -61,6 +61,7 @@ enum {
     VIRT_PCIE_MMIO_HIGH,
     VIRT_GPIO,
     VIRT_SECURE_UART,
+    VIRT_RESERVED,
 };
 
 typedef struct MemMapEntry {
-- 
1.9.1


* [Qemu-devel] [RFC 7/7] hw: vfio: common: adapt vfio_listeners for reserved_iova region
  2016-01-27 13:51 [Qemu-devel] [RFC 0/7] KVM PCI/MSI passthrough with mach-virt Eric Auger
                   ` (5 preceding siblings ...)
  2016-01-27 13:51 ` [Qemu-devel] [RFC 6/7] hw: arm: virt: register reserved IOVA region Eric Auger
@ 2016-01-27 13:51 ` Eric Auger
  6 siblings, 0 replies; 12+ messages in thread
From: Eric Auger @ 2016-01-27 13:51 UTC (permalink / raw)
  To: eric.auger, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, p.fedin, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

In case of reserved iova region, let's declare this region to the
kernel so that it can use it for IOVA/PA bindings.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/vfio/common.c | 46 ++++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 247c87b..ee957ba 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -217,10 +217,8 @@ static int vfio_dma_unmap(VFIOContainer *container,
  *
  * unregistration is handled using vfio_dma_unmap
  */
-int vfio_register_reserved_iova(VFIOContainer *container, hwaddr iova,
-                                ram_addr_t size);
-int vfio_register_reserved_iova(VFIOContainer *container, hwaddr iova,
-                                ram_addr_t size)
+static int vfio_register_reserved_iova(VFIOContainer *container, hwaddr iova,
+                                       ram_addr_t size)
 {
     struct vfio_iommu_type1_dma_map map = {
         .argsz = sizeof(map),
@@ -271,6 +269,7 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
 static bool vfio_listener_skipped_section(MemoryRegionSection *section)
 {
     return (!memory_region_is_ram(section->mr) &&
+            !memory_region_is_reserved_iova(section->mr) &&
             !memory_region_is_iommu(section->mr)) ||
            /*
             * Sizing an enabled 64-bit BAR can cause spurious mappings to
@@ -354,7 +353,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
     hwaddr iova, end;
     Int128 llend;
     void *vaddr;
-    int ret;
+    int ret = -1;
 
     if (vfio_listener_skipped_section(section)) {
         trace_vfio_listener_region_add_skip(
@@ -418,24 +417,35 @@ static void vfio_listener_region_add(MemoryListener *listener,
         return;
     }
 
-    /* Here we assume that memory_region_is_ram(section->mr)==true */
+    /* Here we assume that the memory region is ram or reserved iova */
 
-    vaddr = memory_region_get_ram_ptr(section->mr) +
-            section->offset_within_region +
-            (iova - section->offset_within_address_space);
+    if (memory_region_is_ram(section->mr)) {
+        vaddr = memory_region_get_ram_ptr(section->mr) +
+                section->offset_within_region +
+                (iova - section->offset_within_address_space);
 
-    trace_vfio_listener_region_add_ram(iova, end - 1, vaddr);
+        trace_vfio_listener_region_add_ram(iova, end - 1, vaddr);
 
-    ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
-    if (ret) {
-        error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
-                     "0x%"HWADDR_PRIx", %p) = %d (%m)",
-                     container, iova, end - iova, vaddr, ret);
-        goto fail;
+        ret = vfio_dma_map(container, iova, end - iova, vaddr,
+                           section->readonly);
+        if (ret) {
+            error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+                         "0x%"HWADDR_PRIx", %p) = %d (%m)",
+                         container, iova, end - iova, vaddr, ret);
+            goto fail;
+        }
+        return;
+    } else if (memory_region_is_reserved_iova(section->mr)) {
+        ret = vfio_register_reserved_iova(container, iova, end - iova);
+        if (ret) {
+            error_report("vfio_register_reserved_iova(%p, 0x%"HWADDR_PRIx", "
+                         "0x%"HWADDR_PRIx") = %d (%m)",
+                         container, iova, end - iova, ret);
+            goto fail;
+        }
+        return;
     }
 
-    return;
-
 fail:
     /*
      * On the initfn path, store the first error in the container so we
-- 
1.9.1


* Re: [Qemu-devel] [RFC 6/7] hw: arm: virt: register reserved IOVA region
  2016-01-27 13:51 ` [Qemu-devel] [RFC 6/7] hw: arm: virt: register reserved IOVA region Eric Auger
@ 2016-01-28  7:10   ` Pavel Fedin
  2016-01-28  9:39     ` Eric Auger
  2016-01-28 10:10   ` Peter Maydell
  1 sibling, 1 reply; 12+ messages in thread
From: Pavel Fedin @ 2016-01-28  7:10 UTC (permalink / raw)
  To: 'Eric Auger',
	eric.auger, qemu-devel, qemu-arm, peter.maydell, alex.williamson,
	pranav.sawargaonkar, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

 Hello!

> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 3839c68..7eaf8be 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -125,6 +125,7 @@ static const MemMapEntry a15memmap[] = {
>      [VIRT_GPIO] =               { 0x09030000, 0x00001000 },
>      [VIRT_SECURE_UART] =        { 0x09040000, 0x00001000 },
>      [VIRT_MMIO] =               { 0x0a000000, 0x00000200 },
> +    [VIRT_RESERVED] =           { 0x0be00000, 0x00100000 },

 Looks like with this approach we would need to add this to all machine models
which make use of PCI. But is it a good idea? As far as I understand, the only
requirement for this region is not to clash with guest RAM addresses. So, can
we instead have some code which automatically finds some place, based on the
size? For now we hardcode the size to 0x00100000, but in the future we could
query the host for the size, because it's still the host's MSI controller.
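
To make the suggestion concrete, here is a minimal sketch of such a search:
a first-fit scan over a memory map sorted by base address. Everything in it
is hypothetical (struct range, find_free_window, the search bounds and the
alignment parameter); it is not code from QEMU, and it assumes the map
entries are sorted and do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {
    uint64_t base;
    uint64_t size;
};

/* Round v up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t v, uint64_t align)
{
    return (v + align - 1) & ~(align - 1);
}

/*
 * First-fit search: return the lowest align-aligned base in [lo, hi) where a
 * window of `size` bytes fits without overlapping any entry of used[].
 * used[] must be sorted by base and non-overlapping.
 * Returns UINT64_MAX when no such window exists.
 */
static uint64_t find_free_window(const struct range *used, size_t n,
                                 uint64_t lo, uint64_t hi,
                                 uint64_t size, uint64_t align)
{
    uint64_t cand = align_up(lo, align);

    for (size_t i = 0; i < n; i++) {
        uint64_t ubase = used[i].base;
        uint64_t uend = used[i].base + used[i].size;

        if (uend <= cand) {
            continue;               /* entry lies entirely below the candidate */
        }
        if (ubase >= cand && ubase - cand >= size) {
            break;                  /* the gap before this entry is big enough */
        }
        cand = align_up(uend, align);   /* skip past this entry and retry */
    }

    if (cand <= hi && hi - cand >= size) {
        return cand;
    }
    return UINT64_MAX;
}
```

A caller would pass the machine's memory map (including guest RAM) as used[]
and register the returned base as the reserved IOVA window.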

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia


* Re: [Qemu-devel] [RFC 6/7] hw: arm: virt: register reserved IOVA region
  2016-01-28  7:10   ` Pavel Fedin
@ 2016-01-28  9:39     ` Eric Auger
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Auger @ 2016-01-28  9:39 UTC (permalink / raw)
  To: Pavel Fedin, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	alex.williamson, pranav.sawargaonkar, pbonzini, agraf
  Cc: Bharat.Bhushan, suravee.suthikulpanit, christoffer.dall

Hi Pavel,
On 01/28/2016 08:10 AM, Pavel Fedin wrote:
>  Hello!
> 
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 3839c68..7eaf8be 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -125,6 +125,7 @@ static const MemMapEntry a15memmap[] = {
>>      [VIRT_GPIO] =               { 0x09030000, 0x00001000 },
>>      [VIRT_SECURE_UART] =        { 0x09040000, 0x00001000 },
>>      [VIRT_MMIO] =               { 0x0a000000, 0x00000200 },
>> +    [VIRT_RESERVED] =           { 0x0be00000, 0x00100000 },
> 
>  Looks like with this approach we would need to add this to all machine models which make use of PCI. But is it a good idea?
Yes, that's correct. On the other hand, not all platforms need that
feature.
> As far
> as i understand, the only requirement for this region is not to clash with guest RAM addresses. So, can we instead have some code,
> which automatically finds some place, based on the size?
Maybe we could use the IOVA region already reserved for platform bus
which is currently sparsely used (by vfio platform devices).
> For now we hardcode the size to 0x00100000, but in future we could query
> the host for the size, because it's still host's MSI controller.
Yes, I agree, this is something we currently lack. I need to study how we
can go through all domains/groups and get the max number of MSI frames
used by any of them.

Thanks for the follow-up!

Regards

Eric
> 
> Kind regards,
> Pavel Fedin
> Senior Engineer
> Samsung Electronics Research center Russia
> 
> 


* Re: [Qemu-devel] [RFC 6/7] hw: arm: virt: register reserved IOVA region
  2016-01-27 13:51 ` [Qemu-devel] [RFC 6/7] hw: arm: virt: register reserved IOVA region Eric Auger
  2016-01-28  7:10   ` Pavel Fedin
@ 2016-01-28 10:10   ` Peter Maydell
  2016-01-28 10:20     ` Eric Auger
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Maydell @ 2016-01-28 10:10 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, pranav.sawargaonkar, Pavel Fedin, QEMU Developers,
	Alexander Graf, Bharat Bhushan, Alex Williamson, qemu-arm,
	Suravee Suthikulpanit, Paolo Bonzini, Christoffer Dall

On 27 January 2016 at 13:51, Eric Auger <eric.auger@linaro.org> wrote:
> Registers a 16x64kB reserved iova region. Currently this iova
> region is used by the kernel to map host MSI controller frames
> (GICv2m, GITS_TRANSLATER).

Do you mean the host kernel or the guest kernel? The host
kernel should be keeping its paws out of the guest's
address space, and the guest kernel can manage the memory
and the address space any way it likes, I would have thought.
It's not clear to me what this is for.

> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> ---
>  hw/arm/virt.c         | 10 ++++++++++
>  include/hw/arm/virt.h |  1 +
>  2 files changed, 11 insertions(+)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 3839c68..7eaf8be 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -125,6 +125,7 @@ static const MemMapEntry a15memmap[] = {
>      [VIRT_GPIO] =               { 0x09030000, 0x00001000 },
>      [VIRT_SECURE_UART] =        { 0x09040000, 0x00001000 },
>      [VIRT_MMIO] =               { 0x0a000000, 0x00000200 },
> +    [VIRT_RESERVED] =           { 0x0be00000, 0x00100000 },
>      /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */

You've put the new entry between the VIRT_MMIO line and the comment that
is associated with it.

thanks
-- PMM


* Re: [Qemu-devel] [RFC 6/7] hw: arm: virt: register reserved IOVA region
  2016-01-28 10:10   ` Peter Maydell
@ 2016-01-28 10:20     ` Eric Auger
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Auger @ 2016-01-28 10:20 UTC (permalink / raw)
  To: Peter Maydell
  Cc: eric.auger, pranav.sawargaonkar, Pavel Fedin, QEMU Developers,
	Alexander Graf, Bharat Bhushan, Alex Williamson, qemu-arm,
	Suravee Suthikulpanit, Paolo Bonzini, Christoffer Dall

On 01/28/2016 11:10 AM, Peter Maydell wrote:
> On 27 January 2016 at 13:51, Eric Auger <eric.auger@linaro.org> wrote:
>> Registers a 16x64kB reserved iova region. Currently this iova
>> region is used by the kernel to map host MSI controller frames
>> (GICv2m, GITS_TRANSLATER).
> 
> Do you mean the host kernel or the guest kernel? The host
> kernel should be keeping its paws out of the guest's
> address space, and the guest kernel can manage the memory
> and the address space any way it likes, I would have thought.
> It's not clear to me what this is for.
I meant the host kernel.

If we do not do anything, the host VFIO-PCI driver programs the PCI
device with the host physical address of the host GICv2m MSI frame (as an
example). Since the device's writes go through the SMMU and no mapping is
defined for that address, this faults. So the idea of this series is that
the guest provides some unused guest PA = IOVA. This window can be used by
the host VFIO-PCI driver to transparently create a mapping from an IOVA to
the host GICv2m MSI frame. That way the PCI device is programmed with this
IOVA, and its writes eventually reach the physical page of the host GICv2m
MSI frame.

Hope it clarifies.
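
A toy model may help picture this. It is only a sketch: the toy_* names, the
64kB page size and the addresses used with it are hypothetical, and a real
SMMU walks page tables rather than a flat array. The point it shows is that
a device write to an address with no mapping faults, while a write to a
mapped IOVA reaches the physical page it was bound to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT   16                       /* assumption: 64kB pages */
#define TOY_PAGE_SIZE    (1ULL << TOY_PAGE_SHIFT)
#define TOY_MAX_MAPPINGS 16

struct toy_mapping {
    uint64_t iova;   /* page-aligned IO virtual address */
    uint64_t pa;     /* page-aligned host physical address */
};

struct toy_iommu {
    struct toy_mapping map[TOY_MAX_MAPPINGS];
    size_t count;
};

/* Bind one IOVA page to one host physical page. */
static bool toy_iommu_map(struct toy_iommu *mmu, uint64_t iova, uint64_t pa)
{
    if (mmu->count == TOY_MAX_MAPPINGS) {
        return false;
    }
    mmu->map[mmu->count].iova = iova & ~(TOY_PAGE_SIZE - 1);
    mmu->map[mmu->count].pa = pa & ~(TOY_PAGE_SIZE - 1);
    mmu->count++;
    return true;
}

/* Translate a device access; returning false models an SMMU fault. */
static bool toy_iommu_translate(const struct toy_iommu *mmu, uint64_t iova,
                                uint64_t *pa)
{
    uint64_t page = iova & ~(TOY_PAGE_SIZE - 1);

    for (size_t i = 0; i < mmu->count; i++) {
        if (mmu->map[i].iova == page) {
            /* Keep the offset within the page. */
            *pa = mmu->map[i].pa | (iova & (TOY_PAGE_SIZE - 1));
            return true;
        }
    }
    return false;
}
```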

> 
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> ---
>>  hw/arm/virt.c         | 10 ++++++++++
>>  include/hw/arm/virt.h |  1 +
>>  2 files changed, 11 insertions(+)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 3839c68..7eaf8be 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -125,6 +125,7 @@ static const MemMapEntry a15memmap[] = {
>>      [VIRT_GPIO] =               { 0x09030000, 0x00001000 },
>>      [VIRT_SECURE_UART] =        { 0x09040000, 0x00001000 },
>>      [VIRT_MMIO] =               { 0x0a000000, 0x00000200 },
>> +    [VIRT_RESERVED] =           { 0x0be00000, 0x00100000 },
>>      /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
> 
> You've put the new entry between the VIRT_MMIO line and the comment that
> is associated with it.
sure thanks

Eric
> 
> thanks
> -- PMM
> 

